From maj at fortinbras.us Sat Aug 1 00:35:04 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 1 Aug 2009 00:35:04 -0400
Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl
In-Reply-To:
References:
Message-ID: <99E27D08408340B9B0611751A17DF266@NewLife>

Sorry, I cut off the last script. The entire thing follows:

/usr/local/bin/conv-ASMake.sh :

#!/usr/bin/sed -f
#converting an ActiveState PERL Makefile to run under cygwin make:
s/^DIRFILESEP = ^\\/DIRFILESEP = \//
s/^NOOP = rem/NOOP = :/
# -or- NOOP = echo -n
# byebye volume
s/C:/\/cygdrive\/c/
# sed to convert directory \ to /
s/\([\)0-9a-zA-Z.]\)\\\([\(0-9a-zA-Z]\)/\1\/\2/g
# convert full perl
s/\/usr\/bin\/perl/\/cygdrive\/c\/Perl\/bin\/perl/
# a key conversion for DOC_INSTALL action
/^DESTINSTALLVENDORHTMLDIR/ a\
DECYGDESTINSTALLARCHLIB = $(subst /cygdrive/c,c:,$(DESTINSTALLARCHLIB))
# --- MakeMaker tools_other section:
# let cygwin do native linux commands
/^MAKE/ c\
MAKE = make
/^CHMOD/ c\
CHMOD = chmod
/^CP/ c\
CP = cp
/^MV/ c\
MV = mv
/^NOOP/ c\
NOOP = :
/^RM_F/ c\
RM_F = rm -f
/^RM_RF/ c\
RM_RF = rm -rf
/^TEST_F[^I]/ c\
TEST_F = test -f
/^TOUCH/ c\
TOUCH = touch
/^TEST_S/ c\
TEST_S = test -s
/^DEV_NULL/ c\
DEV_NULL = > /dev/null 2>&1
/^ECHO[^_]/ c\
ECHO = echo
/^ECHO_N/ c\
ECHO_N = echo -n
# override OS-specific File::Spec
/^MOD_INSTALL/ c\
MOD_INSTALL = $(ABSPERLRUN) -MExtUtils::Install -e "use File::Spec::Cygwin;@File::Spec::ISA=('File::Spec::Cygwin');" -e "map { s[/cygdrive/c][] } @ARGV;install({@ARGV}, '$(VERBINST)', 0, '$(UNINST)');" --
/^FIXIN/ c\
FIXIN = $(PERLRUN) "-MExtUtils::MY" -e "MY->fixin(shift)"
# remove cygwin volume prefix for doc installs
/Appending installation info to/ s/DESTIN/DECYGDESTIN/
/perllocal\.pod/ s/DESTIN/DECYGDESTIN/
/NOECHO) \$(MKPATH/ s/DESTIN/DECYGDESTIN/
#end conv-ASMake.sh

----- Original Message -----
From: "Jonathan Cline"
To:
Cc:
Sent: Friday, July 31, 2009 11:24 PM
Subject: [Bioperl-l] Module issue with cygwin-perl vs.
Activestate Perl

>I recently mentioned working on Bio::Robotics for Tecan. Vendors
> being MS-Win specific, the vendor software allows third-party software
> communication through a named pipe (the literal filename is
> "\\\\.\\pipe\\gemini" where the multiple backslashes are MS specific
> and this pseudo-pipe is opened with sysopen() ). This is broken under
> cygwin-perl due to cygwin's method of handling paths -- the sysopen
> fails. However it works under ActiveState Perl and communication
> through the named pipe (to the robot hardware) is OK. The standard
> workaround is usually to use cygwin bash, and force the PATH to use
> ActiveState perl. (Typical MS Windows incompatibility problem.) The
> issue is: Perl module libraries for CPAN work under cygwin-perl
> (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN
> module use, or "make test", result in a bad list of incompatibility
> problems. Yet ActiveState Perl is required for communicating to the
> vendor application (unless there is some workaround to raw filesystem
> access in cygwin-perl that I haven't found in 2 days of working this).
> The stand-alone scripts I have work fine to access the named pipe
> (using ActiveState Perl) since the standalone scripts have no module
> INC dependencies, no CPAN module test harness, etc etc.
>
> This isn't specifically a Bio:: issue, though if anyone has
> suggestions please email. I could try msys and see if it handles the
> named-pipe-special-file better, if msys has an msys-perl distribution.
>
> --
> ## Jonathan Cline
> ## jcline at ieee.org
> ## Mobile: +1-805-617-0223
> ########################
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jncline at gmail.com Sun Aug 2 23:32:20 2009
From: jncline at gmail.com (Jonathan Cline)
Date: Sun, 02 Aug 2009 22:32:20 -0500
Subject: [Bioperl-l] Bio::Robotics namespace discussion
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz>
References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz>
Message-ID: <4A765A44.7030902@gmail.com>

Smithies, Russell wrote:
> I "acquired" an old Biomek 1000 that I'm thinking of modernising. It was originally controlled by a monstrously large but slow pc (IBM Value Point Model 466DX2 computer with Microsoft Windows* Version 3.1)
> My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) and use software like mach3 www.machsupport.com along with G-code to control it.
> I come from an engineering background so it seemed like the easy way to me :-)
>
> Now I just need a bit of free time to get it working...
>
> --Russell
>

I agree, that's probably the best way to go. It's hard to know what
amount of s/w processing was done on the host PC vs. the embedded
controller. If you were able to connect directly to the robot hardware
with serial port(s) or whatever it's using, it would be tough to find
out the comm protocol unless someone has already reverse engineered it
(which is doubtful). Also from what I have seen online, attempting to
run the old software under virtual machine is unpredictable due to
timing differences in the serial port communication. So removal of the
old electronics is probably the best bet. If it has one arm, then it's
much easier.

As for robots with working workstation software, it seems the annoyance
factor is that while the scripting languages are powerful (for GUI
scripting that is), they are still relatively low level. Bio types with
a bit of CS seem to immediately turn to visual basic, labview, or even
excel spreadsheets and macros, in order to provide a higher level
abstraction for the workstation software. To me, it seems natural that
there should be a "protocol compiler" which takes biology protocols as
input, and gives robot instructions as output (google "protolexer").
The huge bottleneck of course is that everyone's robotics work tables
and equipment are somewhat unique to their needs.

## Jonathan Cline
## jcline at ieee.org
## Mobile: +1-805-617-0223
########################

>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jonathan Cline
>> Sent: Thursday, 30 July 2009 2:07 p.m.
>> To: bioperl-l at lists.open-bio.org
>> Cc: Jonathan Cline
>> Subject: [Bioperl-l] Bio::Robotics namespace discussion
>>
>> I am writing a module for communication with biology robotics, as
>> discussed recently on #bioperl, and I invite your comments.
>>
>> Currently this module talks to a Tecan genesis workstation robot (
>> http://images.google.com/images?q=tecan genesis ). Other vendors are
>> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the
>> 'net with the exception of some visual basic and labview scripts which I
>> have found. There are some computational biologists who program for
>> robots via high level s/w, but these scripts are not distributed as OSS.
>>
>> With Tecan, there is a datapipe interface for hardware communication, as
>> an added $$ option from the vendor. I haven't checked other vendors to
>> see if they likewise have an open communication path for third party
>> software.
>> By allowing third-party communication, then naturally the
>> next step is to create a socket client-server; especially as the robot
>> vendor only supports MS Win and using the local machine has typical
>> Microsoft issues (like losing real time communication with the hardware
>> due to GUI animation, bad operating system stability, no unix except
>> cygwin, etc).
>>
>>
>> On Namespace:
>>
>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many
>> s/w modules already called 'robots' (web spider robots, chat bots, www
>> automate, etc) so I chose the longer name "robotics" to differentiate
>> this module as manipulating real hardware. Bio::Robotics is the
>> abstraction for generic robotics and Bio::Robotics::(vendor) is the
>> manufacturer-specific implementation. Robot control is made more
>> complex due to the very configurable nature of the work table (placement
>> of equipment, type of equipment, type of attached arm, etc). The
>> abstraction has to be careful not to generalize or assume too much. In
>> some cases, the Bio::Robotics modules may expand to arbitrary equipment
>> such as thermocyclers, tray holders, imagers, etc - that could be a
>> future roadmap plan.
>>
>> Here is some theoretical example usage below, subject to change. At
>> this time I am deciding how much state to keep within the Perl module.
>> By keeping state, some robot programming might be simplified (avoiding
>> deadlock or tracking tip state). In general I am aiming for a more
>> "protocol friendly" method implementation.
>>
>>
>> To use this software with locally-connected robotics hardware:
>>
>>   use Bio::Robotics;
>>
>>   my $tecan = Bio::Robotics->new("Tecan") || die;
>>   $tecan->attach() || die;
>>   $tecan->home();
>>   $tecan->pipette(tips => "1", from => "rack1");
>>   $tecan->pipette(aspirate => "1", dispense => "1", from => "sampleTray", to => "DNATray");
>>   ...
>>
>> To use this software with remote robotics hardware over the network:
>>
>>   # On the local machine, run:
>>   use Bio::Robotics;
>>
>>   my @connected_hardware = Bio::Robotics->query();
>>   my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in @connected_hardware\n";
>>   $tecan->attach() || die;
>>   $tecan->configure("my work table configuration file") || die;
>>   # Run the server and process commands
>>   while (1) {
>>     my $error = $tecan->server(passwordplaintext => "0xd290");
>>     if ($tecan->lastClientCommand() =~ /^shutdown/) {
>>       last;
>>     }
>>   }
>>   $tecan->detach();
>>   exit(0);
>>
>>   # On the remote machine (the client), run:
>>   use Bio::Robotics;
>>
>>   my $server = "heavybio.dyndns.org:8080";
>>   my $password = "0xd290";
>>   my $tecan = Bio::Robotics->new("Tecan");
>>   $tecan->connect($server, $password) || die;
>>   $tecan->home();
>>   $tecan->pipette(tips => "1", from => "rack200");
>>   $tecan->pipette(aspirate => "1", dispense => "1",
>>     from => "sampleTray A1", to => "DNATray A2",
>>     volume => "45", liquid => "Buffer");
>>   $tecan->pipette(drop => "1");
>>   ...
>>   $tecan->disconnect();
>>   exit(0);
>>
>>
>> --
>>
>> ## Jonathan Cline
>> ## jcline at ieee.org
>> ## Mobile: +1-805-617-0223
>> ########################
>>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited.
> If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>

From dan.bolser at gmail.com Tue Aug 4 08:03:00 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 4 Aug 2009 13:03:00 +0100
Subject: [Bioperl-l] problem with t/LocalDB/SeqFeature.t when host ne localhost
In-Reply-To:
References: <2c8757af0907310513q24bec4b0k7bec06b09e069b07@mail.gmail.com>
Message-ID: <2c8757af0908040503oe2a258dkac4311bb099dc3ac@mail.gmail.com>

2009/7/31 Chris Fields :
> Dan,
>
> Can you file this as a BioPerl bug?  I'm planning on driving towards
> releasing 1.6.1 alpha1 soon (next few weeks) and I would like to get this
> one fixed.

http://bugzilla.open-bio.org/show_bug.cgi?id=2899

Dan.

From dan.bolser at gmail.com Tue Aug 4 08:14:02 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 4 Aug 2009 13:14:02 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To:
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID: <2c8757af0908040514w198085cfgf4a1adc344095f36@mail.gmail.com>

2009/4/27 Heikki Lehvaslaiho :
> Dan,
>
> Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live.
>
> Test and extend,
>
>    -Heikki

Thanks for help with this. I finally got round to looking at the code
(after several others had done the same). I have messed with the code a
bit, and added a 'mask_below_threshold' method [1] and some tests to go
with it (including some extra tests) [2].

Cheers,
Dan.

[1] http://bugzilla.open-bio.org/show_bug.cgi?id=2897
[2] http://bugzilla.open-bio.org/show_bug.cgi?id=2898

> 2009/4/27 Heikki Lehvaslaiho :
>> Dan,
>>
>> I'll take your code and put it into bioperl-live rewritten the way I
>> suggested and add a few tests.
>>
>> That should get you started,
>>
>>   -Heikki
>>
>> 2009/4/27 Dan Bolser :
>>> Hi Heikki,
>>>
>>> Thanks very much for the advice on how to better implement the clear
>>> range method within the Bio::Seq::Quality object. I can understand the
>>> logic of what you have written, and it all sounds reasonable. The only
>>> problem is that I am very inexperienced with working on object
>>> oriented Perl (my 'one man' projects to date have never really
>>> required me to think beyond scripts, and it's been years since I
>>> actually tried to code objects in Perl).
>>>
>>> To be specific, when you say, "Lets add a method that sets the
>>> threshold and stores it internally as $self->_threshold", ignoring any
>>> other functionality, what would that method look like? In particular,
>>> how would $self->_threshold be implemented?
>>>
>>> I think once I see that detail, I can go ahead and try to code what
>>> you suggested.
>>>
>>>
>>> Similarly (Chris), where would I put the tests / how would they be implemented?
>>>
>>>
>>> Thanks again for the feedback.
>>>
>>> All the best,
>>> Dan.
>>>
>>>
>>> 2009/4/27 Heikki Lehvaslaiho :
>>>> Dan,
>>>>
>>>> It looks like your method does two different things:
>>>>
>>>> 1. Returns the longest subsequence above the threshold
>>>> 2. Analyses the sequence for the number of ranges the current
>>>> threshold creates.
>>>>
>>>> Why not separate these functions?
>>>>
>>>> Let's add a method that sets the threshold and stores it internally as
>>>> $self->_threshold. Setting it to a new value should trigger emptying
>>>> all the caches (see below.)
>>>>
>>>> Let's have two more public methods:
>>>>
>>>> 1. get_clean_range() - optional argument 'threshold'
>>>>
>>>> It returns the longest clean subseq.
>>>>
>>>> 2. count_clean_ranges() - again optional argument 'threshold'
>>>>
>>>> This returns the number of ranges detected.
>>>>
>>>> Both methods call first the public method threshold if the argument
>>>> has been given and then an internal method _find_clean_ranges(). That
>>>> method calculates all the ranges and stores them internally (as
>>>> $self->_clean_ranges->[...]). The number of ranges is also stored
>>>> (e.g. $self->_number_of_ranges). These internal values form the cache
>>>> that needs to be emptied whenever any of the critical values of the
>>>> object changes: threshold, quality or seq. Create an internal method
>>>> $self->_clear_cache, that does that.
>>>>
>>>> Now the new quality object does not get created until you call
>>>> get_clean_range() which accesses the cached values (or creates them if
>>>> they are not there).
>>>>
>>>> This design allows you to have no extra penalty for adding more
>>>> methods that act on cached values. For example, it might be a sensible
>>>> thing to do at some point to look at all the ranges that are longer
>>>> than some length. Then you could write in your program:
>>>>
>>>>
>>>> $qual->threshold(10);
>>>> if ($qual->count_clean_ranges == 1) {
>>>>   my $newqual = $qual->get_clean_range();
>>>>   # do your analysis
>>>> } elsif ($qual->count_clean_ranges == 0) {
>>>>   # do some reporting and logging
>>>> } else {  # more than one range
>>>>   my @quals = $qual->get_all_clean_ranges($min_length);
>>>>   # do some more work and possibly select the best one(s)
>>>> }
>>>>
>>>>
>>>> Yours,
>>>>
>>>>   -Heikki
>>>>
>>>> 2009/4/24 Chris Fields :
>>>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla.  If
>>>>> possible, tests don't hurt either!
>>>>>
>>>>> chris
>>>>>
>>>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>>>>
>>>>>> It's a bit rough and ready, but it does what I need...
>>>>>>
>>>>>>
>>>>>> =head2 get_clear_range
>>>>>>
>>>>>> Title    : get_clear_range
>>>>>> Usage    : $subobj = $obj->get_clear_range();
>>>>>>            $subobj = $obj->get_clear_range(20);
>>>>>> Function : Get the clear range using the given quality score as a
>>>>>>            cutoff or a default value of 13.
>>>>>>
>>>>>> Returns  : a new Bio::Seq::Quality object
>>>>>> Args     : a minimum quality value, optional, default = 13
>>>>>>
>>>>>> =cut
>>>>>>
>>>>>> sub get_clear_range
>>>>>> {
>>>>>>   my $self = shift;
>>>>>>   my $qual = $self->qual;
>>>>>>   my $minQual = shift || 13;
>>>>>>
>>>>>>   my (@ranges, $rangeFlag);
>>>>>>
>>>>>>   for(my $i=0; $i<@$qual; $i++){
>>>>>>        ## Are we currently within a clear range or not?
>>>>>>        if(defined($rangeFlag)){
>>>>>>            ## Did we just leave the clear range?
>>>>>>            if($qual->[$i]<$minQual){
>>>>>>                ## Log the range
>>>>>>                push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>                ## and reset the range flag.
>>>>>>                $rangeFlag = undef;
>>>>>>            }
>>>>>>            ## else nothing changes
>>>>>>        }
>>>>>>        else{
>>>>>>            ## Did we just enter a clear range?
>>>>>>            if($qual->[$i]>=$minQual){
>>>>>>                ## Better set the range flag!
>>>>>>                $rangeFlag = $i;
>>>>>>            }
>>>>>>            ## else nothing changes
>>>>>>        }
>>>>>>   }
>>>>>>   ## Did we exit the last clear range?
>>>>>>   if(defined($rangeFlag)){
>>>>>>        my $i = scalar(@$qual);
>>>>>>        ## Log the range
>>>>>>        push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>   }
>>>>>>
>>>>>>   unless(@ranges){
>>>>>>        die "There is no clear range... I don't know what to do here!\n";
>>>>>>   }
>>>>>>
>>>>>>   print "there are ", scalar(@ranges), " clear ranges\n";
>>>>>>
>>>>>>   my $sum; map {$sum += $_->[2]} @ranges;
>>>>>>
>>>>>>   print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>>>>        "quality scores above the given threshold\n";
>>>>>>
>>>>>>   for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>>>>        if($_->[2]/$sum < 0.5){
>>>>>>            warn "not so much a clear range as a clear chunk...\n";
>>>>>>        }
>>>>>>        print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>>>>
>>>>>>        return Bio::Seq::QualityDB->new( -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
>>>>>>                                         -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>>>>>>                                         );
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>>>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>>>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>>>>> in (apart from all the debugging output that I spit out).
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Dan.
>>>>>>
>>>>>>
>>>>>> 2009/4/24 Dan Bolser :
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I couldn't find out how to get the 'clear range' from a
>>>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>>>>> this method be a part of the Bio::Seq::Quality class?
>>>>>>>
>>>>>>> In the latter case I'm on my way to an implementation, but I am not
>>>>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>>>>> I take the time to finish that off.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dan.
>>>>>>>
>>>>
>>>> --
>>>>    -Heikki
>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>> cell: +27 (0)714328090
>>>> Sent from Claremont, WC, South Africa
>>>>
>>
>> --
>>    -Heikki
>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>> cell: +27 (0)714328090
>> Sent from Claremont, WC, South Africa
>>
>
> --
>    -Heikki
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +27 (0)714328090
> Sent from Claremont, WC, South Africa
>

From dan.bolser at gmail.com Tue Aug 4 12:32:31 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 4 Aug 2009 17:32:31 +0100
Subject: [Bioperl-l] Percentage Similarity
In-Reply-To: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com>
References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com>
Message-ID: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com>

2009/7/28 shalabh sharma :
> Hi All,
> I have some protein sequences (around 100) i need to find
> overall percentage similarity between them.
> How i can do that?

Tried using blast?

You can download that.

Try asking in irc://irc.freenode.net/#bioinformatics

Dan.

>
> Thanks
> Shalabh
>

From shameer at ncbs.res.in Tue Aug 4 12:43:40 2009
From: shameer at ncbs.res.in (K. Shameer)
Date: Tue, 4 Aug 2009 22:13:40 +0530 (IST)
Subject: [Bioperl-l] Percentage Similarity
In-Reply-To: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com>
References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com>
Message-ID: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in>

Hello Shalabh,

You may try ALISTAT. It is available as a part of the SQUID library from
Prof. Sean Eddy. Make an alignment of your 100 sequences and use the
alignment as input to ALISTAT.
ftp://selab.janelia.org/pub/software/squid/

Best,
Khader Shameer

> 2009/7/28 shalabh sharma :
>> Hi All,
>> I have some protein sequences (around 100) i need to
>> find
>> overall percentage similarity between them.
>> How i can do that?
>
> Tried using blast?
>
> You can download that.
>
>
> Try asking in irc://irc.freenode.net/#bioinformatics
>
> Dan.
>
>>
>> Thanks
>> Shalabh
>>

From shalabh.sharma7 at gmail.com Tue Aug 4 13:36:34 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 4 Aug 2009 13:36:34 -0400
Subject: [Bioperl-l] Percentage Similarity
In-Reply-To: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in>
References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in>
Message-ID: <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com>

Hi All, thanks a lot.
@Khader Shameer, ALISTAT is what i was looking for. But still it gives
you the average identity; what i need exactly is the average similarity.

Thanks
Shalabh Sharma

On Tue, Aug 4, 2009 at 12:43 PM, K. Shameer wrote:

> Hello Shalabh,
>
> You may try ALISTAT. Available as a part of SQUID library from Prof. Sean
> Eddy. Make an alignment of your 100 sequences and use alignment as input
> of ALISTAT. ftp://selab.janelia.org/pub/software/squid/
>
> Best,
> Khader Shameer
>
> > 2009/7/28 shalabh sharma :
> >> Hi All, I have some protein sequences (around 100) i need to
> >> find
> >> overall percentage similarity between them.
> >> How i can do that?
> >
> > Tried using blast?
> >
> > You can download that.
> >
> >
> > Try asking in irc://irc.freenode.net/#bioinformatics
> >
> > Dan.
> >
> >>
> >> Thanks
> >> Shalabh
> >>
>

From shalabh.sharma7 at gmail.com Wed Aug 5 09:31:21 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 5 Aug 2009 09:31:21 -0400
Subject: [Bioperl-l] Percentage Similarity
In-Reply-To: <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com>
References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com> <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com>
Message-ID: <9fcc48c70908050631q1a080b74x12e81985b455332e@mail.gmail.com>

Hi,
Thanks for the reply. I used clustalW for the MSA.
Also i was just wondering: what if i use Smith-Waterman (EMBOSS' water)
and pass the same library as both query sequences and reference library,
then just parse the output and calculate the average similarity. Is this
the right approach?

Thanks
Shalabh

On Wed, Aug 5, 2009 at 3:10 AM, Dan Bolser wrote:

> 2009/8/4 shalabh sharma :
> > Hi All, thanks a lot.
> > @Khader Shameer, ALISTAT is what i was looking for. But still it gives
> you
> > the average identity, what i need exactly is the average similarity.
>
> The problem is that identity is well defined. Similarity is more
> vague, and at least depends on a particular alignment scoring matrix.
> How did you align your sequences?
>
> Dan.
>
>
> >> > Try asking in irc://irc.freenode.net/#bioinformatics
> >> >
>
> ;-)
>

From michael.watson at bbsrc.ac.uk Wed Aug 5 09:50:35 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 5 Aug 2009 14:50:35 +0100
Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank
Message-ID: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk>

Hi

I want to download GSS sequences using Bio::DB::GenBank.

When I specify db => 'nucleotide', it gets the 3000 or so that Entrez
reports are in nucleotide, but there are another ~30000 in GSS that I
want, but when I try db => 'GSS' or db => 'gss' nothing comes down.

I'm using bioperl 1.5.1.

Any clues?

Mick

From rmb32 at cornell.edu Wed Aug 5 11:28:46 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 05 Aug 2009 08:28:46 -0700
Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank
In-Reply-To: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk>
References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk>
Message-ID: <4A79A52E.7000104@cornell.edu>

I think you're looking for the -db => 'nucgss' option. I'll add a
better listing of these (undocumented) options to the
Bio::DB::Query::GenBank docs.

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

michael watson (IAH-C) wrote:
> Hi
>
> I want to download GSS sequences using Bio::DB::GenBank.
>
> When I specify db => 'nucleotide', it gets the 3000 or so that Entrez reports are in nucleotide, but there are another ~30000 in GSS that I want, but when I try db => 'GSS' or db => 'gss' nothing comes down.
>
> I'm using bioperl 1.5.1.
>
> Any clues?
>
> Mick
>

From hartzell at alerce.com Wed Aug 5 12:16:04 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 5 Aug 2009 09:16:04 -0700
Subject: [Bioperl-l] Job opening at Genentech [SSF, CA].
Message-ID: <19065.45124.4999.922147@already.dhcp.gene.com>

I have an opening in my group in the Bioinformatics department at
Genentech [South San Francisco, CA].

At the moment (for the next year or so) our main focus is rebuilding
and extending a system for collecting, processing, and disseminating
information about mutations and variations (think web interfaces,
relational databases, alignments, workflows/pipelines). In the future
we'll pick up projects related to next-gen sequencing (Me too!!! In
the future, what isn't related to next-gen?), data integration, and/or
lab-specific projects.

First and foremost I'm looking for someone who's sharp and who enjoys
computers, biology, and technology; someone who gets excited about
picking up new tools but who also has a sense of responsibility and
restraint.

I'm looking for someone who's familiar with several languages and
tools; modern Perl complemented with C is my first choice these days,
supplemented with R and (when necessary) anything from the rest of the
programming language bestiary. There's a fair amount of Java flying
around here too so familiarity with it and the JVM world will help.
Relational databases are part of the picture: Oracle for the big
stuff; SQLite, Postgresql, and MySQL play niche roles. I generally
interact with them via ORM's, lately it's been Rose::DB::Object on the
Perl side though I've been convinced to take another look at
DBIx::Class. Most of my web apps use CGI::Application, as fastcgi's,
mod_perl, or simple CGI scripts, but (as with ORM's) I may take
another look at Catalyst.

I'm looking for someone who's interested in building real software.
We'll be putting together a set of tools and data that need to hang
together and evolve for at least 4-5 years. Deploy and run won't cut
it. Requirements will change, so it's important to me that we build
things so they're as modular and flexible as possible. Testing, source
control, and documentation matter.

A strong candidate will have an understanding of basic bioinformatics
concepts and the ability to pick up new biology and computer science
concepts as necessary. At the junior end of the spectrum I'd expect a
bachelor's degree + 3 years of experience, at the upper end a master's
+ 5 years (or a PhD interested in moving towards the production side
of the house).

I can imagine running through one or more detail oriented interview
questions that drilled down (or took off on a tangent) from the
following:

- What's the difference between Smith-Waterman, blast, sim4, gmap,
  and/or bowtie alignment algorithms or tools? Which would you use
  when, and why?

- Why is Moose better than Class::Accessor? (yes, it's Perl centered,
  but it could spin out into any language [e.g. why is Java better
  than Perl?]). What's a MOP? Who cares?

- CVS, subversion, git, mercurial. You've already picked one? Which
  one? Why? Why not?

- XML or JSON or YAML. Pick one for moving data back and forth in an
  Ajax based interface. Why? Would it also work well in other
  contexts?

- How would you store information about positional features on a
  genome so that you could get fast random access? How would your
  solution tie into a larger data context?

Genentech's a great place to work: solid salaries, great benefits, Bay
Area location (who could ask for more?). We're open source friendly
and, with the arrival of Robert Gentleman (our new Director, of
Bioconductor/R fame), likely to become more so. The recent Roche
acquisition hasn't changed life much, it seems to mostly be a source
of opportunities for those of us in Research.

If you know anyone who fits the bill, have them drop me a note.

Thanks!

g.

From hilgert at cshl.edu Wed Aug 5 16:27:28 2009
From: hilgert at cshl.edu (Hilgert, Uwe)
Date: Wed, 5 Aug 2009 16:27:28 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
Message-ID:

Is my impression correct that Bio::SeqIO just assumes that sequences are being submitted in FASTA format? In our experience, implementing Bio::SeqIO led to the first line of files being cut off, regardless of whether the files were indeed fasta files or files that only contained sequence. In the latter case, this led to sequence submissions that had the first line of nucleotides removed. Has anyone tried to write a fix for this?

Thanks,

Uwe

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Uwe Hilgert, Ph.D.
Dolan DNA Learning Center
Cold Spring Harbor Laboratory

V: (516) 367-5185
E: hilgert at cshl.edu
F: (516) 367-5182
W: http://www.dnalc.org

From cjfields at illinois.edu Wed Aug 5 17:04:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 16:04:14 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References:
Message-ID: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>

On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:

> Is my impression correct that Bio::SeqIO just assumes that sequences
> are
> being submitted in FASTA format?

No. See:

http://www.bioperl.org/wiki/HOWTO:SeqIO

SeqIO tries to guess the format from the file extension, and if one isn't present it makes use of Bio::Tools::GuessSeqFormat. It's possible that the extension is causing the problem, or that GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced to guess). In any case, it's always advisable to explicitly indicate the format when possible.

Relevant lines:

   return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
   ...
   return 'raw' if /\.(txt)$/i;

> In our experience, implementing
> Bio::SeqIO led to the first line of files being cut off, regardless of
> whether the files were indeed fasta files or files that only contained
> sequence.

Files that only contain sequence are 'raw'. Ones in FASTA are 'fasta'.

> Which, in the latter, led to sequence submissions that had the
> first line of nucleotides removed. Has anyone tried to write a fix for
> this?

This sounds like a bug, but we have very little to go on beyond your description. What version of bioperl are you using, OS, etc? What does your data look like? File extension?

chris

> Thanks,
>
> Uwe
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> Uwe Hilgert, Ph.D.
>
> Dolan DNA Learning Center
>
> Cold Spring Harbor Laboratory
>
> V: (516) 367-5185
>
> E: hilgert at cshl.edu
>
> F: (516) 367-5182
>
> W: http://www.dnalc.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Kevin.M.Brown at asu.edu Wed Aug 5 17:03:04 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 5 Aug 2009 14:03:04 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B40624DA61@EX02.asurite.ad.asu.edu>

SeqIO is just a base framework for reading and writing files. If you want it to read FASTA format, you tell it so when you create the object:

$seqio = Bio::SeqIO->new(-format=>'fasta');

This tells the program to use Bio::SeqIO::fasta for the object. Look at the docs for the various formats that Bio::SeqIO supports.
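
For files that have no extension and may or may not carry a header, one option is to look at the data before choosing a value for -format. What follows is only a minimal sketch under that assumption: pick_seqio_format is a hypothetical helper, not part of BioPerl, and it only distinguishes 'fasta' from 'raw' by checking whether the first non-blank line starts with '>'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): inspect the first
# non-blank line of some text and return a name usable with -format.
sub pick_seqio_format {
    my ($text) = @_;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;               # skip leading blank lines
        return $line =~ /^>/ ? 'fasta' : 'raw'; # '>' marks a FASTA header
    }
    return 'raw';                               # empty input: no header to strip
}

my $with_header    = ">seq1 example\nACGTACGT\nTTGACA\n";
my $without_header = "ACGTACGT\nTTGACA\n";

print pick_seqio_format($with_header),    "\n";
print pick_seqio_format($without_header), "\n";

# With BioPerl installed, the result could then be handed to SeqIO, e.g.:
#   my $in = Bio::SeqIO->new(-file => $path, -format => pick_seqio_format($text));
```

This is deliberately cruder than Bio::Tools::GuessSeqFormat; it only answers the one question this thread turns on, namely whether there is a header line for the fasta parser to consume.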
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hilgert, Uwe Sent: Wednesday, August 05, 2009 1:27 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::SeqIO issue Is my impression correct that Bio::SeqIO just assumes that sequences are being submitted in FASTA format? In our experience, implementing Bio::SeqIO led to the first line of files being cut off, regardless of whether the files were indeed fasta files or files that only contained sequence. Which, in the latter, led to sequence submissions that had the first line of nucleotides removed. Has anyone tried to write a fix for this? Thanks, Uwe - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe Hilgert, Ph.D. Dolan DNA Learning Center Cold Spring Harbor Laboratory V: (516) 367-5185 E: hilgert at cshl.edu F: (516) 367-5182 W: http://www.dnalc.org _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Aug 5 17:37:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 Aug 2009 16:37:52 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> Message-ID: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Uwe, Please keep replies on the list. It's very possible that's the issue; IIRC the fasta parser pulls out the full sequence in chunks (based on local $/ = "\n>") and splits the header off as the first line in that chunk. You could probably try leaving the format out and letting SeqIO guess it, or passing the file into Bio::Tools::GuessSeqFormat directly, but it's probably better to go through the files and add a file extension that corresponds to the format. chris On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > Thanks, Chris. 
The files have no extension, but we indicate what > format > to use, like in the manual: > > $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > > I wonder now whether this could exactly cause the problem: as we are > telling that input files are in fasta format they are being treated as > such (=remove first line) - regardless of whether they really are > fasta? > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > Uwe Hilgert, Ph.D. > Dolan DNA Learning Center > Cold Spring Harbor Laboratory > > C: (516) 857-1693 > V: (516) 367-5185 > E: hilgert at cshl.edu > F: (516) 367-5182 > W: http://www.dnalc.org > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Wednesday, August 05, 2009 5:04 PM > To: Hilgert, Uwe > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > >> Is my impression correct that Bio::SeqIO just assumes that sequences >> are >> being submitted in FASTA format? > > No. See: > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > SeqIO tries to guess at the format using the file extension, and if > one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > possible that the extension is causing the problem, or that > GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to > guessing). In any case, it's always advisable to explicitly indicate > the format when possible. > > Relevant lines: > > return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > i; > ... > return 'raw' if /\.(txt)$/i; > >> In our experience, implementing >> Bio::SeqIO led to the first line of files being cut off, regardless >> of >> whether the files were indeed fasta files or files that only >> contained >> sequence. > > Files that only contain sequence are 'raw'. Ones in FASTA are > 'fasta'. > >> Which, in the latter, led to sequence submissions that had the >> first line of nucleotides removed. 
Has anyone tried to write a fix >> for >> this? > > This sounds like a bug, but we have very little to go on beyond your > description. What version of bioperl are you using, OS, etc? What > does your data look like? File extension? > > chris > >> Thanks, >> >> Uwe >> >> >> >> >> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> >> Uwe Hilgert, Ph.D. >> >> Dolan DNA Learning Center >> >> Cold Spring Harbor Laboratory >> >> >> >> V: (516) 367-5185 >> >> E: hilgert at cshl.edu >> >> F: (516) 367-5182 >> >> W: http://www.dnalc.org >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Kevin.M.Brown at asu.edu Wed Aug 5 17:45:03 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 5 Aug 2009 14:45:03 -0700 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Message-ID: <1A4207F8295607498283FE9E93B775B40624DA9B@EX02.asurite.ad.asu.edu> I'm not sure, but I think the module is fasta, not Fasta. So it should be -format=>'fasta', unless you're on a case-insensitive system that is forgiving the capital... Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Chris Fields > Sent: Wednesday, August 05, 2009 2:38 PM > To: Hilgert, Uwe > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and > splits the > header off as the first line in that chunk. 
You could probably try > leaving the format out and letting SeqIO guess it, or passing > the file > into Bio::Tools::GuessSeqFormat directly, but it's probably > better to > go through the files and add a file extension that > corresponds to the > format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > > > Thanks, Chris. The files have no extension, but we indicate what > > format > > to use, like in the manual: > > > > $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > > > > I wonder now whether this could exactly cause the problem: as we are > > telling that input files are in fasta format they are being > treated as > > such (=remove first line) - regardless of whether they really are > > fasta? > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Uwe Hilgert, Ph.D. > > Dolan DNA Learning Center > > Cold Spring Harbor Laboratory > > > > C: (516) 857-1693 > > V: (516) 367-5185 > > E: hilgert at cshl.edu > > F: (516) 367-5182 > > W: http://www.dnalc.org > > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Wednesday, August 05, 2009 5:04 PM > > To: Hilgert, Uwe > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > > > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > > > >> Is my impression correct that Bio::SeqIO just assumes that > sequences > >> are > >> being submitted in FASTA format? > > > > No. See: > > > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > > > SeqIO tries to guess at the format using the file extension, and if > > one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > > possible that the extension is causing the problem, or that > > GuessSeqFormat guessing wrong (it's apt to do that, as it's > forced to > > guessing). In any case, it's always advisable to > explicitly indicate > > the format when possible. 
> > > > Relevant lines: > > > > return 'fasta' if > /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > > i; > > ... > > return 'raw' if /\.(txt)$/i; > > > >> In our experience, implementing > >> Bio::SeqIO led to the first line of files being cut off, > regardless > >> of > >> whether the files were indeed fasta files or files that only > >> contained > >> sequence. > > > > Files that only contain sequence are 'raw'. Ones in FASTA are > > 'fasta'. > > > >> Which, in the latter, led to sequence submissions that had the > >> first line of nucleotides removed. Has anyone tried to > write a fix > >> for > >> this? > > > > This sounds like a bug, but we have very little to go on beyond your > > description. What version of bioperl are you using, OS, etc? What > > does your data look like? File extension? > > > > chris > > > >> Thanks, > >> > >> Uwe > >> > >> > >> > >> > >> > >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > >> > >> Uwe Hilgert, Ph.D. > >> > >> Dolan DNA Learning Center > >> > >> Cold Spring Harbor Laboratory > >> > >> > >> > >> V: (516) 367-5185 > >> > >> E: hilgert at cshl.edu > >> > >> F: (516) 367-5182 > >> > >> W: http://www.dnalc.org > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Wed Aug 5 18:53:56 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 5 Aug 2009 18:53:56 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Message-ID: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> I don't think that can be the problem. 
If anything, providing the format ought to be better in terms of result than not providing it? Uwe - I'd like you to go back to Chris' initial questions that you haven't answered yet: "What version of bioperl are you using, OS, etc? What does your data look like?" I'd add to that, can you show us your full script, or a smaller code snippet that reproduces the problem. I suspect that either something in your script is swallowing the line, or that the line endings in your data file are from a different OS than the one you're running the script on. (Or that you are running a very old version of BioPerl, which is entirely possible if you installed through CPAN.) -hilmar On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and splits > the header off as the first line in that chunk. You could probably > try leaving the format out and letting SeqIO guess it, or passing > the file into Bio::Tools::GuessSeqFormat directly, but it's probably > better to go through the files and add a file extension that > corresponds to the format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >> Thanks, Chris. The files have no extension, but we indicate what >> format >> to use, like in the manual: >> >> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >> >> I wonder now whether this could exactly cause the problem: as we are >> telling that input files are in fasta format they are being treated >> as >> such (=remove first line) - regardless of whether they really are >> fasta? >> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> Uwe Hilgert, Ph.D. 
>> Dolan DNA Learning Center >> Cold Spring Harbor Laboratory >> >> C: (516) 857-1693 >> V: (516) 367-5185 >> E: hilgert at cshl.edu >> F: (516) 367-5182 >> W: http://www.dnalc.org >> >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Wednesday, August 05, 2009 5:04 PM >> To: Hilgert, Uwe >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >> >>> Is my impression correct that Bio::SeqIO just assumes that sequences >>> are >>> being submitted in FASTA format? >> >> No. See: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> SeqIO tries to guess at the format using the file extension, and if >> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >> possible that the extension is causing the problem, or that >> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >> guessing). In any case, it's always advisable to explicitly indicate >> the format when possible. >> >> Relevant lines: >> >> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >> i; >> ... >> return 'raw' if /\.(txt)$/i; >> >>> In our experience, implementing >>> Bio::SeqIO led to the first line of files being cut off, >>> regardless of >>> whether the files were indeed fasta files or files that only >>> contained >>> sequence. >> >> Files that only contain sequence are 'raw'. Ones in FASTA are >> 'fasta'. >> >>> Which, in the latter, led to sequence submissions that had the >>> first line of nucleotides removed. Has anyone tried to write a fix >>> for >>> this? >> >> This sounds like a bug, but we have very little to go on beyond your >> description. What version of bioperl are you using, OS, etc? What >> does your data look like? File extension? >> >> chris >> >>> Thanks, >>> >>> Uwe >>> >>> >>> >>> >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> >>> Uwe Hilgert, Ph.D. 
>>>
>>> Dolan DNA Learning Center
>>>
>>> Cold Spring Harbor Laboratory
>>>
>>> V: (516) 367-5185
>>>
>>> E: hilgert at cshl.edu
>>>
>>> F: (516) 367-5182
>>>
>>> W: http://www.dnalc.org
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Wed Aug 5 19:12:52 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 5 Aug 2009 19:12:52 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID: <8FAB8756AD944534B49F2C4356CB6D92@NewLife>

If these items were included in a Bugzilla report, that would be most convenient (= most likely to get looked at carefully). Bugzilla is the best place for us to keep track of these kinds of issues: http://bugzilla.bioperl.org/

cheers
MAJ

----- Original Message -----
From: "Hilmar Lapp"
To: "Chris Fields"
Cc: "BioPerl List"
Sent: Wednesday, August 05, 2009 6:53 PM
Subject: Re: [Bioperl-l] Bio::SeqIO issue

>I don't think that can be the problem. If anything, providing the
> format ought to be better in terms of result than not providing it?
>
> Uwe - I'd like you to go back to Chris' initial questions that you
> haven't answered yet: "What version of bioperl are you using, OS,
> etc? What does your data look like?"
I'd add to that, can you show us > your full script, or a smaller code snippet that reproduces the problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing >> the file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >>> guessing). In any case, it's always advisable to explicitly indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>>
>>> chris
>>>
>>>> Thanks,
>>>>
>>>> Uwe
>>>>
>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>>
>>>> Uwe Hilgert, Ph.D.
>>>>
>>>> Dolan DNA Learning Center
>>>>
>>>> Cold Spring Harbor Laboratory
>>>>
>>>> V: (516) 367-5185
>>>>
>>>> E: hilgert at cshl.edu
>>>>
>>>> F: (516) 367-5182
>>>>
>>>> W: http://www.dnalc.org
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Thu Aug 6 00:43:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 23:43:45 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID:

The SeqIO::fasta parser sets:

   local $/ = "\n>";

then splits the resulting chunks of data (each corresponding to a full FASTA-formatted sequence) into two pieces:

   my ($top,$sequence) = split(/\n/,$entry,2);

If there is no description line (e.g. the file is all raw sequence data), these lines would read in the whole file as a single chunk and then split off its first line as if it were the header.

chris

On Aug 5, 2009, at 5:53 PM, Hilmar Lapp wrote:

> I don't think that can be the problem.
If anything, providing the > format ought to be better in terms of result than not providing it? > > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, > etc? What does your data look like?" I'd add to that, can you show > us your full script, or a smaller code snippet that reproduces the > problem. > > I suspect that either something in your script is swallowing the > line, or that the line endings in your data file are from a > different OS than the one you're running the script on. (Or that you > are running a very old version of BioPerl, which is entirely > possible if you installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls >> out the full sequence in chunks (based on local $/ = "\n>") and >> splits the header off as the first line in that chunk. You could >> probably try leaving the format out and letting SeqIO guess it, or >> passing the file into Bio::Tools::GuessSeqFormat directly, but it's >> probably better to go through the files and add a file extension >> that corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being >>> treated as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>> to >>> guessing). In any case, it's always advisable to explicitly >>> indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a >>>> fix for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at illinois.edu Thu Aug 6 01:12:13 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 00:12:13 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <8FAB8756AD944534B49F2C4356CB6D92@NewLife> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> Message-ID: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> Just to confirm: the following is using bioperl-live on my macbook pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug or a user issue (if it's the former, we can easily add an exception indicating lack of a header). Note that 'raw' also fails for the raw example below (doesn't appear to remove newlines). 
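
To make the mechanism concrete without BioPerl, here is a small pure-Perl sketch of the chunk-and-split step discussed in this thread. It is a simplified stand-in, not the Bio::SeqIO::fasta source: split_first_record and split_first_record_strict are hypothetical names, and the strict variant is just one way the "exception indicating lack of a header" might look.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for the chunk-and-split step; NOT BioPerl code.
sub split_first_record {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "\n>";                  # read up to the start of the next record
    my $entry = <$fh>;
    return unless defined $entry;
    chomp $entry;                      # drop a trailing "\n>" separator, if any
    my ($top, $sequence) = split /\n/, $entry, 2;
    $sequence = '' unless defined $sequence;
    $sequence =~ s/\s+//g;             # join the remaining sequence lines
    return ($top, $sequence);
}

# Hypothetical guard: refuse input whose first line is not a '>' header.
sub split_first_record_strict {
    my ($text) = @_;
    die "no FASTA header line found\n" unless $text =~ /\A>/;
    return split_first_record($text);
}

my $raw = "ACGTAC\nGGTTAA\n";          # sequence only, no header line
my ($top, $seq) = split_first_record($raw);
print "treated as header:  $top\n";
print "remaining sequence: $seq\n";

my $ok = eval { split_first_record_strict($raw); 1 };
print "strict version: ", ($ok ? "accepted\n" : "rejected: $@");
```

With headerless input the first sequence line ends up in $top, which is the truncation reported at the start of the thread; the strict variant turns that silent loss into an error.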
-c

cjfields4:fasta cjfields$ cat raw_v_fasta.pl
#!/usr/bin/perl -w

use strict;
use warnings;
use IO::String;
use Bio::SeqIO;
use Test::More qw(no_plan);

my %seq;

$seq{raw} = <<RAW;
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
RAW

$seq{fasta} = <<FASTA;
>CATH_RAT
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
FASTA

my %newdata;

for my $input (sort keys %seq) {
    my $fh = IO::String->new($seq{$input});
    my $seq = Bio::SeqIO->new(-format => 'fasta', -fh => $fh)->next_seq;
    $newdata{$input} = $seq->seq;
}

is($newdata{raw}, $newdata{fasta}, 'format');

cjfields4:fasta cjfields$ perl raw_v_fasta.pl
not ok 1 - format
# Failed test 'format'
# at raw_v_fasta.pl line 36.
# got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
# expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
1..1
# Looks like you failed 1 test of 1.

On Aug 5, 2009, at 6:12 PM, Mark A.
Jensen wrote: > If these items were included in a Bugzilla report, that would be > most convenient (= most likely to get looked carefully) > and is the best place for us to keep track of these kinds of > issues-- http://bugzilla.bioperl.org/ > cheers MAJ > ----- Original Message ----- From: "Hilmar Lapp" > To: "Chris Fields" > Cc: "BioPerl List" > Sent: Wednesday, August 05, 2009 6:53 PM > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? What does your data look like?" I'd add to that, can you show >> us your full script, or a smaller code snippet that reproduces the >> problem. >> I suspect that either something in your script is swallowing the >> line, or that the line endings in your data file are from a >> different OS than the one you're running the script on. (Or that >> you are running a very old version of BioPerl, which is entirely >> possible if you installed through CPAN.) >> -hilmar >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls >>> out the full sequence in chunks (based on local $/ = "\n>") and >>> splits the header off as the first line in that chunk. You could >>> probably try leaving the format out and letting SeqIO guess it, >>> or passing the file into Bio::Tools::GuessSeqFormat directly, but >>> it's probably better to go through the files and add a file >>> extension that corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format >>>> to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being >>>> treated as >>>> such (=remove first line) - regardless of whether they really >>>> are fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> Uwe Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences >>>>> are >>>>> being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's >>>> forced to >>>> guessing). In any case, it's always advisable to explicitly >>>> indicate >>>> the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa) >>>> $/ i; >>>> ... 
>>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless of >>>>> whether the files were indeed fasta files or files that only >>>>> contained >>>>> sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a >>>>> fix for >>>>> this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From eigenrosen at gmail.com Thu Aug 6 03:12:24 2009 From: eigenrosen at gmail.com (Michael Rosen) Date: Thu, 6 Aug 2009 00:12:24 
-0700
Subject: [Bioperl-l] Trouble with Clustalw
Message-ID: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com>

I'm a complete bioperl novice, trying to do Clustalw on some fasta
files, and am running into trouble:

~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
STACK: TestClust:22
-----------------------------------------------------------

Here's my code:

#!/usr/bin/perl -w

use Bio::Perl;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::SimpleAlign;
use Bio::Seq;
use strict;
use warnings;

my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
my @seq_array = read_all_sequences($ARGV[0],'fasta');

for (my $i = 0; $i < @seq_array; $i++){
    (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
    $seq_array[$i]->seq($seq);
}

write_sequence(">test",'fasta',@seq_array);

my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);
my @align_array = $aln->each_seq();

write_sequence(">testfile",'fasta',@align_array);

The loop is just there to take out some gaps that were placed
in a blast previous to this. The write_sequence call confirms that @seq_array is a valid array of Bio:Seq objects at the time align calls it. Here's some output in "test": >A0220B0939one.1 FV584Q101DEWY9 TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >A0220B0939one.2 FV584Q101A4DG7 TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG ... Thanks, Mike From florian.mittag at uni-tuebingen.de Thu Aug 6 05:38:38 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 6 Aug 2009 11:38:38 +0200 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <200907151500.21947.florian.mittag@uni-tuebingen.de> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200907061808.18651.florian.mittag@uni-tuebingen.de> <200907151500.21947.florian.mittag@uni-tuebingen.de> Message-ID: <200908061138.38809.florian.mittag@uni-tuebingen.de> Hi! I just noticed, that we didn't solve this problem completely. On Wednesday, 15. 
July 2009 15:00, Florian Mittag wrote: > > Well, it is like this with version 9.5 of DB2 Express-C: > > > > SELECT NULL FROM bioentry; > > > > yields: > > SQL0206N "NULL" is not valid in the context where it is used. > > SQLSTATE=42703 SQLCODE=-206 > > > > But if I do: > > > > SELECT cast(NULL AS VARCHAR(255)) FROM bioentry; > > > > [...] > > > > It ran fine without the NULL column, but that isn't necessarily a sign of > > correctness. My problem was that (as stated above) the old version of DB2 > > requires you to cast the NULL value to a data type, which I wasn't able > > to determine from the code. With the new version, it should work, so I'll > > have to rerun my tests again and see if the problem is still there. > > You convinced me that the NULL column is supposed to be there, so I found > another workaround around line 1273 in BaseDriver.pm: > > if((! $attr) || (! $entitymap->{$tbl}) || > $dont_select_attrs->{$tbl .".". $attr}) { > #push(@attrs, "NULL"); > push(@attrs, "cast(NULL as VARCHAR(255))"); > } else { > > Since I don't know how to determine the datatype of the column that is set > to NULL, I simply chose VARCHAR and tested it. And it worked! (BTW: The > column set to NULL is named "rank" in the case below.) Although this solution works, it is not the best, because it breaks compatibility with all other database types, e.g., MySQL. Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" only when the driver is DB2? 
- Florian From hlapp at gmx.net Thu Aug 6 09:36:08 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 09:36:08 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> Message-ID: Why is specifying fasta format when your input is not in fasta format not a user error? I agree with the not removing newlines in raw format being a bug. -hilmar On Aug 6, 2009, at 1:12 AM, Chris Fields wrote: > Just to confirm: the following is using bioperl-live on my macbook > pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug > or a user issue (if it's the former, we can easily add an exception > indicating lack of a header). Note that 'raw' also fails for the > raw example below (doesn't appear to remove newlines). 
>
> -c
>
> cjfields4:fasta cjfields$ cat raw_v_fasta.pl
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use IO::String;
> use Bio::SeqIO;
> use Test::More qw(no_plan);
>
> my %seq;
>
> $seq{raw} = <<RAW;
> MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
> HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
> TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
> QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
> VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
> RAW
>
> $seq{fasta} = <<FASTA;
> >CATH_RAT
> MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
> HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
> TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
> QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
> VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
> FASTA
>
> my %newdata;
> for my $input (sort keys %seq) {
>     my $fh = IO::String->new($seq{$input});
>     my $seq = Bio::SeqIO->new(-format => 'fasta',
>                               -fh     => $fh)->next_seq;
>     $newdata{$input} = $seq->seq;
> }
> is($newdata{raw}, $newdata{fasta}, 'format');
>
> cjfields4:fasta cjfields$ perl raw_v_fasta.pl
> not ok 1 - format
> #   Failed test 'format'
> #   at raw_v_fasta.pl line 36.
> #          got:
> 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
> #     expected:
> 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
> 1..1
> # Looks like you failed 1 test of 1.
> > On Aug 5, 2009, at 6:12 PM, Mark A. Jensen wrote: > >> If these items were included in a Bugzilla report, that would be >> most convenient (= most likely to get looked carefully) >> and is the best place for us to keep track of these kinds of >> issues-- http://bugzilla.bioperl.org/ >> cheers MAJ >> ----- Original Message ----- From: "Hilmar Lapp" >> To: "Chris Fields" >> Cc: "BioPerl List" >> Sent: Wednesday, August 05, 2009 6:53 PM >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> >>> I don't think that can be the problem. If anything, providing the >>> format ought to be better in terms of result than not providing it? >>> Uwe - I'd like you to go back to Chris' initial questions that >>> you haven't answered yet: "What version of bioperl are you using, >>> OS, etc? What does your data look like?" I'd add to that, can >>> you show us your full script, or a smaller code snippet that >>> reproduces the problem. >>> I suspect that either something in your script is swallowing the >>> line, or that the line endings in your data file are from a >>> different OS than the one you're running the script on. (Or that >>> you are running a very old version of BioPerl, which is entirely >>> possible if you installed through CPAN.) >>> -hilmar >>> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>>> Uwe, >>>> >>>> Please keep replies on the list. >>>> >>>> It's very possible that's the issue; IIRC the fasta parser pulls >>>> out the full sequence in chunks (based on local $/ = "\n>") and >>>> splits the header off as the first line in that chunk. You >>>> could probably try leaving the format out and letting SeqIO >>>> guess it, or passing the file into Bio::Tools::GuessSeqFormat >>>> directly, but it's probably better to go through the files and >>>> add a file extension that corresponds to the format. >>>> >>>> chris >>>> >>>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>>> >>>>> Thanks, Chris. 
The files have no extension, but we indicate >>>>> what format >>>>> to use, like in the manual: >>>>> >>>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>>> >>>>> I wonder now whether this could exactly cause the problem: as we >>>>> are >>>>> telling that input files are in fasta format they are being >>>>> treated as >>>>> such (=remove first line) - regardless of whether they really >>>>> are fasta? >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> Uwe Hilgert, Ph.D. >>>>> Dolan DNA Learning Center >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> C: (516) 857-1693 >>>>> V: (516) 367-5185 >>>>> E: hilgert at cshl.edu >>>>> F: (516) 367-5182 >>>>> W: http://www.dnalc.org >>>>> >>>>> -----Original Message----- >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>>> To: Hilgert, Uwe >>>>> Cc: bioperl-l at lists.open-bio.org >>>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>>> >>>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>>> >>>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>>> sequences >>>>>> are >>>>>> being submitted in FASTA format? >>>>> >>>>> No. See: >>>>> >>>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>>> >>>>> SeqIO tries to guess at the format using the file extension, and >>>>> if >>>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>>> possible that the extension is causing the problem, or that >>>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's >>>>> forced to >>>>> guessing). In any case, it's always advisable to explicitly >>>>> indicate >>>>> the format when possible. >>>>> >>>>> Relevant lines: >>>>> >>>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa) >>>>> $/ i; >>>>> ... 
>>>>> return 'raw' if /\.(txt)$/i; >>>>> >>>>>> In our experience, implementing >>>>>> Bio::SeqIO led to the first line of files being cut off, >>>>>> regardless of >>>>>> whether the files were indeed fasta files or files that only >>>>>> contained >>>>>> sequence. >>>>> >>>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>>> 'fasta'. >>>>> >>>>>> Which, in the latter, led to sequence submissions that had the >>>>>> first line of nucleotides removed. Has anyone tried to write a >>>>>> fix for >>>>>> this? >>>>> >>>>> This sounds like a bug, but we have very little to go on beyond >>>>> your >>>>> description. What version of bioperl are you using, OS, etc? >>>>> What >>>>> does your data look like? File extension? >>>>> >>>>> chris >>>>> >>>>>> Thanks, >>>>>> >>>>>> Uwe >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>>> >>>>>> Uwe Hilgert, Ph.D. >>>>>> >>>>>> Dolan DNA Learning Center >>>>>> >>>>>> Cold Spring Harbor Laboratory >>>>>> >>>>>> >>>>>> >>>>>> V: (516) 367-5185 >>>>>> >>>>>> E: hilgert at cshl.edu >>>>>> >>>>>> F: (516) 367-5182 >>>>>> >>>>>> W: http://www.dnalc.org >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- 
=========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Aug 6 09:42:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 09:42:06 -0400 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <200908061138.38809.florian.mittag@uni-tuebingen.de> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200907061808.18651.florian.mittag@uni-tuebingen.de> <200907151500.21947.florian.mittag@uni-tuebingen.de> <200908061138.38809.florian.mittag@uni-tuebingen.de> Message-ID: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote: > Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" > only when the driver is DB2? Not yet, but that's the solution I had in mind, i.e., introducing a method in the Bio::DB::DBI::* (driver-specific) classes that returns whatever NULL as a SELECT field should be represented as. What will be very hard or nearly impossible to do is to cast to the actual type of the column, so if simply using VARCHAR(255) does the trick for DB2 that'd be great. BTW you did check that simply aliasing the column does not fix the problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will throw an error, right? 
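The driver-specific method proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the package names My::Driver::Base and My::Driver::DB2, the method name null_select_expr, and the helper select_list are all hypothetical and are not the actual Bio::DB::DBI classes or any existing BioPerl-DB API. The base class returns a plain NULL, and the DB2 subclass overrides it with the cast that was reported to work in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base driver: a bare NULL is fine for most databases.
package My::Driver::Base;
sub new { my ($class) = @_; return bless {}, $class; }
sub null_select_expr { return 'NULL'; }

# Hypothetical DB2 driver: DB2 rejects an untyped NULL in a SELECT list,
# so return the typed cast reported in this thread instead.
package My::Driver::DB2;
our @ISA = ('My::Driver::Base');
sub null_select_expr { return 'cast(NULL as VARCHAR(255))'; }

package main;

# Build a SELECT column list; an undef entry stands for an attribute
# that must not be selected and is replaced by the driver's NULL expression.
sub select_list {
    my ($driver, @attrs) = @_;
    return join(', ',
        map { defined $_ ? $_ : $driver->null_select_expr() } @attrs);
}

my @attrs = ('term.term_id', 'term.name', undef, 'term.ontology_id');
print select_list(My::Driver::Base->new(), @attrs), "\n";
print select_list(My::Driver::DB2->new(),  @attrs), "\n";
```

With this shape, the code that builds the query never needs to know which database it is talking to; only the DB2 class carries the workaround, so other drivers such as MySQL keep emitting a plain NULL.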
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From florian.mittag at uni-tuebingen.de Thu Aug 6 10:12:21 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 6 Aug 2009 16:12:21 +0200 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200908061138.38809.florian.mittag@uni-tuebingen.de> <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> Message-ID: <200908061612.21852.florian.mittag@uni-tuebingen.de> On Thursday, 6. August 2009 15:42, Hilmar Lapp wrote: > On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote: > > Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" > > only when the driver is DB2? > > Not yet, but that's the solution I had in mind, i.e., introducing a > method in the Bio::DB::DBI::* (driver-specific) classes that returns > whatever NULL as a SELECT field should be represented as. Sounds like a good idea! > What will be > very hard or nearly impossible to do is to cast to the actual type of > the column, so if simply using VARCHAR(255) does the trick for DB2 > that'd be great. Surprisingly, it does. At least, I haven't noticed any problems if the target data type is for example an integer. With all the trouble I have with DB2, I didn't expect this. > BTW you did check that simply aliasing the column does not fix the > problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will > throw an error, right? Yepp: SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, NULL AS col1, term.ontology_id FROM term WHERE identifier = ? [IBM][CLI Driver][DB2/LINUX] SQL0418N A statement contains a use of an untyped parameter marker or a null value that is not valid. 
- Florian

From hilgert at cshl.edu Thu Aug 6 11:01:05 2009
From: hilgert at cshl.edu (Hilgert, Uwe)
Date: Thu, 6 Aug 2009 11:01:05 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID:

I'm not sure what version we have. Cornel may have installed it a while
ago from CVS:

Module id = Bio::Root::Build
    CPAN_USERID  CJFIELDS (Christopher Fields )
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm
    INST_VERSION 1.006900

cpan> m Bio::Root::Version
Module id = Bio::Root::Version
    CPAN_USERID  CJFIELDS (Christopher Fields )
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm
    INST_VERSION 1.006900

cpan> m Bio::SeqIO
Module id = Bio::SeqIO
    CPAN_USERID  CJFIELDS (Christopher Fields )
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm
    INST_VERSION undef

Cornel still has the checked-out "bioperl-live" directory, and the last
changes are from March this year.

As for why he used 'Fasta' instead of 'fasta' as the format parameter in
Bio::SeqIO: that is what it says in the module's manual. He has now tried
'fasta' instead and sees no change in behavior.

With the format parameter omitted altogether, fasta-formatted sequence
continues to be treated correctly, with the first line being removed.
However, raw sequence is treated differently, in that the first line is
no longer removed. Instead, the program returns the first line only.
In the example I am going to forward in my next message, that returns 60
amino acids out of a raw sequence of 300 aa. Can't win with raw sequence...

The files may be created on different platforms; we didn't notice any
difference between using files created on Windows or Linux.
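One way to sidestep the mismatch described above is to look at the data before choosing a format. The following is a minimal sketch, assuming that a FASTA record always begins with a '>' header line and that anything else is raw sequence. The helper names sniff_format and raw_sequence are hypothetical and not part of BioPerl, and the short sequences are made-up test data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 'fasta' if the first non-blank line starts with '>', else 'raw'.
sub sniff_format {
    my ($text) = @_;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;              # skip leading blank lines
        return $line =~ /^>/ ? 'fasta' : 'raw';
    }
    return 'raw';                              # empty input: no header to strip
}

# Join a multi-line raw sequence into one string, dropping all whitespace
# so that both Windows and Unix line endings are tolerated.
sub raw_sequence {
    my ($text) = @_;
    (my $seq = $text) =~ s/\s+//g;
    return $seq;
}

my $raw   = "MWTALPLL\nHTFKMGLN\n";
my $fasta = ">CATH_RAT\nMWTALPLL\nHTFKMGLN\n";

print sniff_format($raw),   "\n";
print sniff_format($fasta), "\n";
print raw_sequence($raw),   "\n";
```

The value returned by sniff_format could then be passed as the -format argument to Bio::SeqIO->new, so that headerless files are never handed to the fasta parser.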
Thanks Uwe -----Original Message----- From: Hilmar Lapp [mailto:hlapp at gmx.net] Sent: Wednesday, August 05, 2009 6:54 PM To: Chris Fields Cc: Hilgert, Uwe; BioPerl List Subject: Re: [Bioperl-l] Bio::SeqIO issue I don't think that can be the problem. If anything, providing the format ought to be better in terms of result than not providing it? Uwe - I'd like you to go back to Chris' initial questions that you haven't answered yet: "What version of bioperl are you using, OS, etc? What does your data look like?" I'd add to that, can you show us your full script, or a smaller code snippet that reproduces the problem. I suspect that either something in your script is swallowing the line, or that the line endings in your data file are from a different OS than the one you're running the script on. (Or that you are running a very old version of BioPerl, which is entirely possible if you installed through CPAN.) -hilmar On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and splits > the header off as the first line in that chunk. You could probably > try leaving the format out and letting SeqIO guess it, or passing > the file into Bio::Tools::GuessSeqFormat directly, but it's probably > better to go through the files and add a file extension that > corresponds to the format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >> Thanks, Chris. The files have no extension, but we indicate what >> format >> to use, like in the manual: >> >> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >> >> I wonder now whether this could exactly cause the problem: as we are >> telling that input files are in fasta format they are being treated >> as >> such (=remove first line) - regardless of whether they really are >> fasta? 
>> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> Uwe Hilgert, Ph.D. >> Dolan DNA Learning Center >> Cold Spring Harbor Laboratory >> >> C: (516) 857-1693 >> V: (516) 367-5185 >> E: hilgert at cshl.edu >> F: (516) 367-5182 >> W: http://www.dnalc.org >> >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Wednesday, August 05, 2009 5:04 PM >> To: Hilgert, Uwe >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >> >>> Is my impression correct that Bio::SeqIO just assumes that sequences >>> are >>> being submitted in FASTA format? >> >> No. See: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> SeqIO tries to guess at the format using the file extension, and if >> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >> possible that the extension is causing the problem, or that >> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >> guessing). In any case, it's always advisable to explicitly indicate >> the format when possible. >> >> Relevant lines: >> >> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >> i; >> ... >> return 'raw' if /\.(txt)$/i; >> >>> In our experience, implementing >>> Bio::SeqIO led to the first line of files being cut off, >>> regardless of >>> whether the files were indeed fasta files or files that only >>> contained >>> sequence. >> >> Files that only contain sequence are 'raw'. Ones in FASTA are >> 'fasta'. >> >>> Which, in the latter, led to sequence submissions that had the >>> first line of nucleotides removed. Has anyone tried to write a fix >>> for >>> this? >> >> This sounds like a bug, but we have very little to go on beyond your >> description. What version of bioperl are you using, OS, etc? What >> does your data look like? File extension? 
>> >> chris >> >>> Thanks, >>> >>> Uwe >>> >>> >>> >>> >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> >>> Uwe Hilgert, Ph.D. >>> >>> Dolan DNA Learning Center >>> >>> Cold Spring Harbor Laboratory >>> >>> >>> >>> V: (516) 367-5185 >>> >>> E: hilgert at cshl.edu >>> >>> F: (516) 367-5182 >>> >>> W: http://www.dnalc.org >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hilgert at cshl.edu Thu Aug 6 11:03:53 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 11:03:53 -0400 Subject: [Bioperl-l] FW: Bio::SeqIO issue Message-ID: If you don't specify any format only the first line gets returned: not ok 1 - format # Failed test 'format' # at test/test_fasta.pl line 35. # got: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN' # expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNH TFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFS TTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFN PEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYG EQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV' 1..1 # Looks like you failed 1 test of 1. -----Original Message----- From: Hilgert, Uwe Sent: Thursday, August 06, 2009 9:12 AM To: Ghiban, Cornel Subject: FW: [Bioperl-l] Bio::SeqIO issue -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, August 06, 2009 1:12 AM To: Mark A. 
Jensen
Cc: Hilgert, Uwe; BioPerl List; Hilmar Lapp
Subject: Re: [Bioperl-l] Bio::SeqIO issue

Just to confirm: the following is using bioperl-live on my macbook pro
(perl 5.10.0, 64bit). We need to decide if this is a legit bug or a
user issue (if it's the former, we can easily add an exception
indicating lack of a header). Note that 'raw' also fails for the raw
example below (doesn't appear to remove newlines).

-c

cjfields4:fasta cjfields$ cat raw_v_fasta.pl
#!/usr/bin/perl -w

use strict;
use warnings;
use IO::String;
use Bio::SeqIO;
use Test::More qw(no_plan);

my %seq;

$seq{raw} = <<RAW;
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
RAW

$seq{fasta} = <<FASTA;
>CATH_RAT
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
FASTA

my %newdata;
for my $input (sort keys %seq) {
    my $fh = IO::String->new($seq{$input});
    my $seq = Bio::SeqIO->new(-format => 'fasta',
                              -fh     => $fh)->next_seq;
    $newdata{$input} = $seq->seq;
}
is($newdata{raw}, $newdata{fasta}, 'format');

cjfields4:fasta cjfields$ perl raw_v_fasta.pl
not ok 1 - format
#   Failed test 'format'
#   at raw_v_fasta.pl line 36.
#          got:
'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWT
FSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCK
FNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVG
YGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
#     expected:
'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNH
TFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFS
TTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFN
PEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYG
EQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
1..1
# Looks like you failed 1 test of 1.

On Aug 5, 2009, at 6:12 PM, Mark A.
Jensen wrote: > If these items were included in a Bugzilla report, that would be most > convenient (= most likely to get looked carefully) and is the best > place for us to keep track of these kinds of > issues-- http://bugzilla.bioperl.org/ > cheers MAJ > ----- Original Message ----- From: "Hilmar Lapp" > To: "Chris Fields" > Cc: "BioPerl List" > Sent: Wednesday, August 05, 2009 6:53 PM > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? What does your data look like?" I'd add to that, can you show >> us your full script, or a smaller code snippet that reproduces the >> problem. >> I suspect that either something in your script is swallowing the >> line, or that the line endings in your data file are from a >> different OS than the one you're running the script on. (Or that >> you are running a very old version of BioPerl, which is entirely >> possible if you installed through CPAN.) >> -hilmar >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls >>> out the full sequence in chunks (based on local $/ = "\n>") and >>> splits the header off as the first line in that chunk. You could >>> probably try leaving the format out and letting SeqIO guess it, >>> or passing the file into Bio::Tools::GuessSeqFormat directly, but >>> it's probably better to go through the files and add a file >>> extension that corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format >>>> to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being >>>> treated as >>>> such (=remove first line) - regardless of whether they really >>>> are fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> Uwe Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences >>>>> are >>>>> being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's >>>> forced to >>>> guessing). In any case, it's always advisable to explicitly >>>> indicate >>>> the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa) >>>> $/ i; >>>> ... 
>>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless of >>>>> whether the files were indeed fasta files or files that only >>>>> contained >>>>> sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a >>>>> fix for >>>>> this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From hlapp at gmx.net Thu Aug 6 11:18:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 11:18:06 -0400 Subject: 
[Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible? -hilmar On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > I'm not sure what version we have. Cornel may have installed it a > while > ago from CVS: > > Module id = Bio::Root::Build > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > INST_VERSION 1.006900 > cpan> m Bio::Root::Version > Module id = Bio::Root::Version > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > INST_VERSION 1.006900 > cpan> m Bio::SeqIO > Module id = Bio::SeqIO > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > INST_VERSION undef > > Cornel still has the checked-out "bioperl-live" directory and the last > changes are from March this year. > > As per why he used "Fasta" instead of 'fasta" as the format > parameter in > Bio::SeqIO, it's because that what it says in the modules manual. He > now > tried 'fasta' instead and see no changes in behavior. Omitting the > format parameter altogether, fasta-formatted sequence continues to be > treated correctly, the first line being removed. However, raw sequence > is being treated differently in that the first line is not being > removed > any more. Instead, the program returns the first line only. Which, in > the example I am going to forward in my next message, will return 60 > amino acids out of raw sequence of 300 aa. Can't win with raw > sequence... 
> > > The files may be created on different platforms, we didn't notice any > difference between using files created on Windows or Linux. > > Thanks > Uwe > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bosborne11 at verizon.net Thu Aug 6 11:20:49 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 11:20:49 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <2F73C3DC-D943-4EC3-834A-EA2984FDDB5D@verizon.net> Uwe et al, Yes, this argument works irrespective of case: The format name
is case-insensitive: 'FASTA', 'Fasta' and 'fasta' are all valid. From Bio::SeqIO. Brian O. On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > As per why he used 'Fasta' instead of 'fasta' as the format > parameter in > Bio::SeqIO, it's because that's what it says in the module's manual. He > now > tried 'fasta' instead and sees no changes in behavior. Omitting the From cjfields at illinois.edu Thu Aug 6 12:30:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:30:01 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> Message-ID: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> On Aug 6, 2009, at 8:36 AM, Hilmar Lapp wrote: > Why is specifying fasta format when your input is not in FASTA format > not a user error? Agreed. My point is: should we worry about adding an exception (which may be a little more user-friendly)? Right now the bad stuff happens silently. > I agree with the not removing newlines in raw format being a bug. > > -hilmar Acc. to the SeqIO::raw docs, this is a little trickier. The documented behavior explicitly indicates that each line (sans non-whitespace) is assumed to be a separate sequence, so changing that behavior breaks API.
I suppose we can have $/ set locally to a cached $/ default value or undef: # assumes entire file is read in my $io = Bio::SeqIO->new(-format => 'raw', -gulp => 1); chris From hlapp at gmx.net Thu Aug 6 12:42:00 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 12:42:00 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> Message-ID: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> On Aug 6, 2009, at 12:30 PM, Chris Fields wrote: > Agreed. My point is should we worry about adding an exception > (which may be a little more user-friendly). Right now the bad stuff > happens silently. Great point. We don't want silent failures, do we. > >> I agree with the not removing newlines in raw format being a bug. >> >> -hilmar > > Acc. to the SeqIO::raw docs, this is a little trickier. The > documented behavior explicitly indicates that each line (sans non- > whitespace) is assumed to be a separate sequence, so changing that > behavior breaks API. Ah - true indeed. I like the optional argument feature - that way it's easy for the user to choose. 
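
As an illustration of the "exception instead of silent failure" idea discussed in this exchange, here is a minimal sketch. It is not BioPerl code: looks_like and assert_fasta are hypothetical helper names, and the only test applied is whether the first non-blank character of the input is '>'.

```perl
use strict;
use warnings;

# Hypothetical pre-check (not a BioPerl API): report 'fasta' when the first
# non-blank character is '>', otherwise 'raw'.
sub looks_like {
    my ($text) = @_;
    return $text =~ /\A\s*>/ ? 'fasta' : 'raw';
}

# Strict variant: die loudly rather than let a parser drop the first line.
sub assert_fasta {
    my ($text) = @_;
    die "input has no '>' header line; not FASTA\n"
        unless looks_like($text) eq 'fasta';
    return 1;
}

print looks_like(">CATH_RAT\nMWTALP\n"), "\n";   # input with a header line
print looks_like("MWTALP\nHTFKMG\n"), "\n";      # bare sequence lines

# The strict check turns the silent truncation into an explicit error.
my $ok = eval { assert_fasta("MWTALP\n"); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

A caller could pass the result of looks_like as the -format value to Bio::SeqIO->new, or call assert_fasta first when only FASTA is acceptable; either way the decision is made before any sequence data is discarded.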
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Thu Aug 6 12:49:53 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:49:53 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Cornel, I'm failing to see how adding '>' would solve the problem. This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)? I think, in this circumstance, the former applies. It is easy to add, and the use of an exception in this case is violently user-friendly, e.g. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing. chris On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > Hi, > > It doesn't matter what sequence we use. As Chris Fields's showed in > his test, not having > ">" as the 1st character on the first line is the problem. > We always assumed the sequence is in FASTA format and this seems to > be wrong. > > I think, the solution to our problem is to check whether the ">" > symbol is present or not. > If not present then it will be added. 
> > Thank you, > Cornel Ghiban > From biopython at maubp.freeserve.co.uk Thu Aug 6 12:51:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 6 Aug 2009 17:51:34 +0100 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
<25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> Message-ID: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote: > >>> I agree with the not removing newlines in raw format being a bug. >>> >>> -hilmar >> >> Acc. to the SeqIO::raw docs, this is a little trickier. The documented >> behavior explicitly indicates that each line (sans non-whitespace) is >> assumed to be a separate sequence, so changing that behavior breaks API. > > Ah - true indeed. I like the optional argument feature - that way it's easy > for the user to choose. > For reference, "raw" as a format in EMBOSS seems to give just one sequence regardless of any line breaks. Adding an optional argument might be clearest, but have you considered using the new BioPerl SeqIO variant argument to have two forms of raw (the original variant giving one sequence per line, and a new variant where you just get one sequence regardless of any line breaks)?
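
To make the two readings of "raw" concrete, here is a small stand-alone sketch. It does not use or modify Bio::SeqIO::raw; read_raw and its gulp option are hypothetical names chosen only to mirror the per-line behavior and the single-sequence behavior described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read raw sequence text either
# as one sequence per line, or as a single sequence with all whitespace
# removed when the (assumed) gulp option is set.
sub read_raw {
    my ($text, %opt) = @_;
    if ($opt{gulp}) {
        (my $seq = $text) =~ s/\s+//g;      # drop newlines and spaces
        return length $seq ? ($seq) : ();
    }
    # One sequence per non-empty line, whitespace stripped from each.
    return grep { length }
           map  { my $l = $_; $l =~ s/\s+//g; $l }
           split /\n/, $text;
}

my $raw      = "MWTALPLL\nCAGAWLLS\nAGATAELT\n";
my @per_line = read_raw($raw);
my @whole    = read_raw($raw, gulp => 1);

printf "%d sequences per line, %d when gulped: %s\n",
    scalar @per_line, scalar @whole, $whole[0];
```

Keeping the per-line mode as the default matches the point made in the thread that the documented behavior should not change under existing callers; the single-sequence mode is opt-in.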
Peter From cjfields at illinois.edu Thu Aug 6 12:58:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:58:07 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> Message-ID: On Aug 6, 2009, at 11:51 AM, Peter wrote: > On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote: >> >>>> I agree with the not removing newlines in raw format being a bug. >>>> >>>> -hilmar >>> >>> Acc. to the SeqIO::raw docs, this is a little trickier. The >>> documented >>> behavior explicitly indicates that each line (sans non-whitespace) >>> is >>> assumed to be a separate sequence, so changing that behavior >>> breaks API. >> >> Ah - true indeed. I like the optional argument feature - that way >> it's easy >> for the user to choose. >> > > For reference, "raw" as a format in EMBOSS seems to give just one > sequence regardless of any line breaks. Yes, and that's the behavior I would expect, actually. > Adding an optional argument might be clearest, but have you considered > using the new BioPerl SeqIO variant argument to have two forms of raw > (the original variant giving one sequence per line, and a new variant > where you just get one sequence regardless of any line breaks)? > > Peter That's a good point. 
We'd have to keep 'raw' as the prior behavior, but 'raw-complete' could be used for such a circumstance ('raw-gulp' sounds just wrong ;) chris From rmb32 at cornell.edu Thu Aug 6 13:14:12 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 06 Aug 2009 10:14:12 -0700 Subject: [Bioperl-l] tigrxml parsing Message-ID: <4A7B0F64.9070205@cornell.edu> Hi all, Recently in #bioperl somebody came by trying to use Bio::SeqIO::tigrxml.pm to parse the medicago genome annotations at http://www.medicago.org/genome/downloads/Mt2/MT2.0_medicago_chrX_20080103_NoOverlap.xml.tar.gz svn HEAD tigrxml.pm was not at all happy with these files, eventually dying with

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: start is undefined
STACK: Error::throw
STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368
STACK: Bio::RangeI::contains Bio/RangeI.pm:255
STACK: Bio::SeqFeature::Generic::add_SeqFeature Bio/SeqFeature/Generic.pm:783
STACK: Bio::SeqIO::tigrxml::start_element Bio/SeqIO/tigrxml.pm:206
STACK: try{} block /usr/share/perl5/XML/SAX/Base.pm:292
STACK: XML::SAX::Base::start_element /usr/share/perl5/XML/SAX/Base.pm:266
STACK: XML::SAX::Expat::_handle_start /usr/share/perl5/XML/SAX/Expat.pm:225
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::SAX::Expat::_parse_bytestream /usr/share/perl5/XML/SAX/Expat.pm:45
STACK: XML::SAX::Base::parse /usr/share/perl5/XML/SAX/Base.pm:2602
STACK: XML::SAX::Base::parse_file /usr/share/perl5/XML/SAX/Base.pm:2631
STACK: Bio::SeqIO::tigrxml::next_seq Bio/SeqIO/tigrxml.pm:116
STACK: /crypt/rob/test2.pl:10
-----------------------------------------------------------

Looking at the medicago XML and comparing it to the bioperl-live/t/data/test.tigrxml, the two look VERY different in structure. Lots of things that are attrs in test.tigrxml seem to be elements in the medicago XML, for example.
So I guess the question is: is the medicago TIGR XML malformed? Can tigrxml.pm be expected to parse it? What, if anything, should be done about this? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From hilgert at cshl.edu Thu Aug 6 15:36:36 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 15:36:36 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Hmmm, I fail to see how supplying raw sequence could be a called "bad" input or a "problem". In our case, for example, not every user is a bioinformatics expert and Cornel was suggesting to account for that instead of trying to "train" the user to adhere to requirements that have not much to do with what s/he tries to accomplish. I don't really see data being modified, rather that the data format is being adopted to the needs of the software; which I would argue should be something the software is being able to take care of. Uwe -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, August 06, 2009 12:50 PM To: Ghiban, Cornel Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List Subject: Re: [Bioperl-l] Bio::SeqIO issue Cornel, I'm failing to see how adding '>' would solve the problem. This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)? I think, in this circumstance, the former applies. 
It is easy to add, and the use of an exception in this case is violently user-friendly, e.g. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing. chris On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > Hi, > > It doesn't matter what sequence we use. As Chris Fields's showed in > his test, not having > ">" as the 1st character on the first line is the problem. > We always assumed the sequence is in FASTA format and this seems to > be wrong. > > I think, the solution to our problem is to check whether the ">" > symbol is present or not. > If not present then it will be added. > > Thank you, > Cornel Ghiban > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Thursday, August 06, 2009 11:18 AM > To: Hilgert, Uwe > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe - could you send an actual data file (as an attachment) that > reproduces the problem, or is that not possible? > > -hilmar > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > >> I'm not sure what version we have. Cornel may have installed it a >> while ago from CVS: >> >> Module id = Bio::Root::Build >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >> INST_VERSION 1.006900 >> cpan> m Bio::Root::Version >> Module id = Bio::Root::Version >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >> INST_VERSION 1.006900 >> cpan> m Bio::SeqIO >> Module id = Bio::SeqIO >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >> INST_VERSION undef >> >> Cornel still has the checked-out "bioperl-live" directory and the >> last >> changes are from March this year. 
>> >> As per why he used "Fasta" instead of 'fasta" as the format parameter >> in Bio::SeqIO, it's because that what it says in the modules manual. >> He now tried 'fasta' instead and see no changes in behavior. Omitting >> the format parameter altogether, fasta-formatted sequence continues >> to >> be treated correctly, the first line being removed. However, raw >> sequence is being treated differently in that the first line is not >> being removed any more. Instead, the program returns the first line >> only. Which, in the example I am going to forward in my next message, >> will return 60 amino acids out of raw sequence of 300 aa. Can't win >> with raw sequence... >> >> >> The files may be created on different platforms, we didn't notice any >> difference between using files created on Windows or Linux. >> >> Thanks >> Uwe >> >> >> >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, August 05, 2009 6:54 PM >> To: Chris Fields >> Cc: Hilgert, Uwe; BioPerl List >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? >> What does your data look like?" I'd add to that, can you show us your >> full script, or a smaller code snippet that reproduces the problem. >> >> I suspect that either something in your script is swallowing the >> line, >> or that the line endings in your data file are from a different OS >> than the one you're running the script on. (Or that you are running a >> very old version of BioPerl, which is entirely possible if you >> installed through CPAN.) >> >> -hilmar >> >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >> >>> Uwe, >>> >>> Please keep replies on the list. 
>>> >>> It's very possible that's the issue; IIRC the fasta parser pulls out >>> the full sequence in chunks (based on local $/ = "\n>") and splits >>> the header off as the first line in that chunk. You could probably >>> try leaving the format out and letting SeqIO guess it, or passing >>> the >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>> better to go through the files and add a file extension that >>> corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. The files have no extension, but we indicate what >>>> format to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being treated >>>> as such (=remove first line) - regardless of whether they really >>>> are >>>> fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>>> Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences are being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. 
It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>> to guessing). In any case, it's always advisable to explicitly >>>> indicate the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>> i; >>>> ... >>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless >>>>> of whether the files were indeed fasta files or files that only >>>>> contained sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a fix >>>>> for this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. 
>>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at illinois.edu Thu Aug 6 16:09:22 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:09:22 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: <6729F9CC-ACF9-4BC4-9905-7EA24C1DCA61@illinois.edu> If one supplies raw sequence (no descriptor) to a FASTA parser (requires a descriptor), then it is bad input. One can't reasonably expect the parser to work correctly under those circumstance. Garbage in, garbage out. The simplest and (IMHO) best solution under such circumstances is for the parser to die meaningfully ("Sequence is not FASTA format; '>' descriptor line is missing" or similar). 
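As an illustration of the "die meaningfully" idea, here is a minimal sketch in core Perl. It is not BioPerl's actual implementation: `looks_like_fasta` and `assert_fasta` are hypothetical helper names, and the check only looks at the first non-blank line, which is an assumption about what "FASTA-like" should mean here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 when the first
# non-blank line of the text starts with the FASTA '>' descriptor marker,
# and 0 otherwise (including for empty input).
sub looks_like_fasta {
    my ($text) = @_;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;        # skip leading blank lines
        return $line =~ /^>/ ? 1 : 0;    # decide on the first real line
    }
    return 0;                            # nothing but blank lines
}

# Hypothetical helper: refuse raw sequence up front instead of letting a
# FASTA parser silently treat its first line as a descriptor.
sub assert_fasta {
    my ($text) = @_;
    die "Sequence is not FASTA format; '>' descriptor line is missing\n"
        unless looks_like_fasta($text);
    return 1;
}
```

A caller could run `assert_fasta($contents)` on the file contents before handing the file to a parser set to 'fasta'. The input is left untouched either way, so nothing is modified silently.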
Tacking a '>' onto bad data doesn't make it magically work; it's just
bad data with a '>' prepended. To take this one step further, what if
this were genbank data? Or XML? A well-formed exception, though
initially inconvenient to the user, will indicate the problem right
away. Silently trying to fix the problem by prepending '>' to bad input
data wouldn't work, and the resulting failure downstream (likely from
validate_seq) would obscure the real problem, namely that the user is
using the wrong format parser.

chris

On Aug 6, 2009, at 2:36 PM, Hilgert, Uwe wrote:

> Hmmm, I fail to see how supplying raw sequence could be a called "bad"
> input or a "problem". In our case, for example, not every user is a
> bioinformatics expert and Cornel was suggesting to account for that
> instead of trying to "train" the user to adhere to requirements that
> have not much to do with what s/he tries to accomplish. I don't really
> see data being modified, rather that the data format is being
> adopted to
> the needs of the software; which I would argue should be something the
> software is being able to take care of.
>
> Uwe
>
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thursday, August 06, 2009 12:50 PM
> To: Ghiban, Cornel
> Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List
> Subject: Re: [Bioperl-l] Bio::SeqIO issue
>
> Cornel,
>
> I'm failing to see how adding '>' would solve the problem.
>
> This is a simple validation issue: should we throw an exception on bad
> input (no '>'), or just argue GIGO based on user error (the assumption
> that the SeqIO parser will read raw sequence correctly when set to
> 'fasta' is wrong)?
>
> I think, in this circumstance, the former applies. It is easy to add,
> and the use of an exception in this case is violently user-friendly,
> e.g. it will stop cold and immediately point out the problem.
> Otherwise data is (silently) being modified, which is always a bad
> thing.
> > chris > > On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > >> Hi, >> >> It doesn't matter what sequence we use. As Chris Fields's showed in >> his test, not having >> ">" as the 1st character on the first line is the problem. >> We always assumed the sequence is in FASTA format and this seems to >> be wrong. >> >> I think, the solution to our problem is to check whether the ">" >> symbol is present or not. >> If not present then it will be added. >> >> Thank you, >> Cornel Ghiban >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Thursday, August 06, 2009 11:18 AM >> To: Hilgert, Uwe >> Cc: Chris Fields; BioPerl List; Ghiban, Cornel >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> Uwe - could you send an actual data file (as an attachment) that >> reproduces the problem, or is that not possible? >> >> -hilmar >> >> On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: >> >>> I'm not sure what version we have. Cornel may have installed it a >>> while ago from CVS: >>> >>> Module id = Bio::Root::Build >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >>> INST_VERSION 1.006900 >>> cpan> m Bio::Root::Version >>> Module id = Bio::Root::Version >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >>> INST_VERSION 1.006900 >>> cpan> m Bio::SeqIO >>> Module id = Bio::SeqIO >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >>> INST_VERSION undef >>> >>> Cornel still has the checked-out "bioperl-live" directory and the >>> last >>> changes are from March this year. >>> >>> As per why he used "Fasta" instead of 'fasta" as the format >>> parameter >>> in Bio::SeqIO, it's because that what it says in the modules manual. >>> He now tried 'fasta' instead and see no changes in behavior. 
>>> Omitting >>> the format parameter altogether, fasta-formatted sequence continues >>> to >>> be treated correctly, the first line being removed. However, raw >>> sequence is being treated differently in that the first line is not >>> being removed any more. Instead, the program returns the first line >>> only. Which, in the example I am going to forward in my next >>> message, >>> will return 60 amino acids out of raw sequence of 300 aa. Can't win >>> with raw sequence... >>> >>> >>> The files may be created on different platforms, we didn't notice >>> any >>> difference between using files created on Windows or Linux. >>> >>> Thanks >>> Uwe >>> >>> >>> >>> >>> -----Original Message----- >>> From: Hilmar Lapp [mailto:hlapp at gmx.net] >>> Sent: Wednesday, August 05, 2009 6:54 PM >>> To: Chris Fields >>> Cc: Hilgert, Uwe; BioPerl List >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> I don't think that can be the problem. If anything, providing the >>> format ought to be better in terms of result than not providing it? >>> >>> Uwe - I'd like you to go back to Chris' initial questions that you >>> haven't answered yet: "What version of bioperl are you using, OS, >>> etc? >>> What does your data look like?" I'd add to that, can you show us >>> your >>> full script, or a smaller code snippet that reproduces the problem. >>> >>> I suspect that either something in your script is swallowing the >>> line, >>> or that the line endings in your data file are from a different OS >>> than the one you're running the script on. (Or that you are >>> running a >>> very old version of BioPerl, which is entirely possible if you >>> installed through CPAN.) >>> >>> -hilmar >>> >>> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> >>>> Uwe, >>>> >>>> Please keep replies on the list. 
>>>> >>>> It's very possible that's the issue; IIRC the fasta parser pulls >>>> out >>>> the full sequence in chunks (based on local $/ = "\n>") and splits >>>> the header off as the first line in that chunk. You could probably >>>> try leaving the format out and letting SeqIO guess it, or passing >>>> the >>>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>>> better to go through the files and add a file extension that >>>> corresponds to the format. >>>> >>>> chris >>>> >>>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>>> >>>>> Thanks, Chris. The files have no extension, but we indicate what >>>>> format to use, like in the manual: >>>>> >>>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>>> >>>>> I wonder now whether this could exactly cause the problem: as we >>>>> are >>>>> telling that input files are in fasta format they are being >>>>> treated >>>>> as such (=remove first line) - regardless of whether they really >>>>> are >>>>> fasta? >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> Uwe >>>>> Hilgert, Ph.D. >>>>> Dolan DNA Learning Center >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> C: (516) 857-1693 >>>>> V: (516) 367-5185 >>>>> E: hilgert at cshl.edu >>>>> F: (516) 367-5182 >>>>> W: http://www.dnalc.org >>>>> >>>>> -----Original Message----- >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>>> To: Hilgert, Uwe >>>>> Cc: bioperl-l at lists.open-bio.org >>>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>>> >>>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>>> >>>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>>> sequences are being submitted in FASTA format? >>>>> >>>>> No. See: >>>>> >>>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>>> >>>>> SeqIO tries to guess at the format using the file extension, and >>>>> if >>>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. 
It's >>>>> possible that the extension is causing the problem, or that >>>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>>> to guessing). In any case, it's always advisable to explicitly >>>>> indicate the format when possible. >>>>> >>>>> Relevant lines: >>>>> >>>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>>> i; >>>>> ... >>>>> return 'raw' if /\.(txt)$/i; >>>>> >>>>>> In our experience, implementing >>>>>> Bio::SeqIO led to the first line of files being cut off, >>>>>> regardless >>>>>> of whether the files were indeed fasta files or files that only >>>>>> contained sequence. >>>>> >>>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>>> 'fasta'. >>>>> >>>>>> Which, in the latter, led to sequence submissions that had the >>>>>> first line of nucleotides removed. Has anyone tried to write a >>>>>> fix >>>>>> for this? >>>>> >>>>> This sounds like a bug, but we have very little to go on beyond >>>>> your >>>>> description. What version of bioperl are you using, OS, etc? >>>>> What >>>>> does your data look like? File extension? >>>>> >>>>> chris >>>>> >>>>>> Thanks, >>>>>> >>>>>> Uwe >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>>> >>>>>> Uwe Hilgert, Ph.D. 
>>>>>> >>>>>> Dolan DNA Learning Center >>>>>> >>>>>> Cold Spring Harbor Laboratory >>>>>> >>>>>> >>>>>> >>>>>> V: (516) 367-5185 >>>>>> >>>>>> E: hilgert at cshl.edu >>>>>> >>>>>> F: (516) 367-5182 >>>>>> >>>>>> W: http://www.dnalc.org >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 16:25:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:25:45 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> Message-ID: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Michael, Are you using ClustalW 2? I'm not sure but I don't think the wrapper has been updated for the latest version (I think parsing still works, though). 
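One more thing that may be worth ruling out alongside the version question, offered as a guess rather than a diagnosis: in the reported error the command reads "( align -infile=..." with nothing in front of "align", which could mean that no clustalw executable was located at all. A minimal sketch in core Perl, where `find_executable` is a hypothetical helper and not part of BioPerl, that looks for the program in the PATH directories:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): return the full path of the
# first executable file called $name in the given directories, or undef
# if there is none. With no directories given, the directories listed in
# the PATH environment variable are searched.
sub find_executable {
    my ($name, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x $candidate;
    }
    return undef;
}

# Report whether a program named 'clustalw' can be found on PATH.
my $clustalw = find_executable('clustalw');
print defined $clustalw
    ? "clustalw found at $clustalw\n"
    : "clustalw not found on PATH\n";
```

If nothing is found, the wrapper's documentation (as far as I recall) describes setting the CLUSTALDIR environment variable to the directory that holds the executable; treat that as something to verify against the installed module's POD.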
chris

On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote:

> I'm a complete bioperl novice, trying to do Clustalw on some fasta
> files, and am running into trouble:
>
> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
> Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
> STACK: TestClust:22
> -----------------------------------------------------------
>
> Here's my code:
>
> #!/usr/bin/perl -w
>
> use Bio::Perl;
> use Bio::AlignIO;
> use Bio::Tools::Run::Alignment::Clustalw;
> use Bio::SimpleAlign;
> use Bio::Seq;
> use strict;
> use warnings;
>
> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>
> for (my $i = 0; $i < @seq_array; $i++){
>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>     $seq_array[$i]->seq($seq);
> }
>
> write_sequence(">test",'fasta',@seq_array);
>
> my $seq_array_ref = \@seq_array;
> my $aln = $factory->align($seq_array_ref);
>
> my @align_array = $aln->each_seq();
> write_sequence(">testfile",'fasta',@align_array);
>
>
> The
loop is just there to take out some gaps that were placed in a > blast previous to this. The write_sequence call confirms that > @seq_array is a valid array of Bio:Seq objects at the time align > calls it. Here's some output in "test": > > >A0220B0939one.1 FV584Q101DEWY9 > TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC > CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT > TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT > TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG > CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG > CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA > CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA > CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT > AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG > >A0220B0939one.2 FV584Q101A4DG7 > TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG > ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC > AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG > TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG > GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA > GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT > CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT > CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT > ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG > ... 
> > Thanks, > Mike > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 16:30:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:30:30 -0500 Subject: [Bioperl-l] tigrxml parsing In-Reply-To: <4A7B0F64.9070205@cornell.edu> References: <4A7B0F64.9070205@cornell.edu> Message-ID: Robert, This popped up recently (may be related): http://thread.gmane.org/gmane.comp.lang.perl.bio.general/19782 http://bugzilla.open-bio.org/show_bug.cgi?id=2868 It might be possible to map this into bioperl, but someone needs to take it up. chris On Aug 6, 2009, at 12:14 PM, Robert Buels wrote: > Hi all, > > Recently in #bioperl somebody came by trying to use > Bio::SeqIO::tigrxml.pm to parse the medicago genome annotations at http://www.medicago.org/genome/downloads/Mt2/MT2.0_medicago_chrX_20080103_NoOverlap.xml.tar.gz > > svn HEAD tigrxml.pm was not at all happy with these files, > eventually dieing with > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: start is undefined > STACK: Error::throw > STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368 > STACK: Bio::RangeI::contains Bio/RangeI.pm:255 > STACK: Bio::SeqFeature::Generic::add_SeqFeature Bio/SeqFeature/ > Generic.pm:783 > STACK: Bio::SeqIO::tigrxml::start_element Bio/SeqIO/tigrxml.pm:206 > STACK: try{} block /usr/share/perl5/XML/SAX/Base.pm:292 > STACK: XML::SAX::Base::start_element /usr/share/perl5/XML/SAX/ > Base.pm:266 > STACK: XML::SAX::Expat::_handle_start /usr/share/perl5/XML/SAX/ > Expat.pm:225 > STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm: > 469 > STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187 > STACK: XML::SAX::Expat::_parse_bytestream /usr/share/perl5/XML/SAX/ > Expat.pm:45 > STACK: XML::SAX::Base::parse /usr/share/perl5/XML/SAX/Base.pm:2602 > STACK: XML::SAX::Base::parse_file 
/usr/share/perl5/XML/SAX/Base.pm: > 2631 > STACK: Bio::SeqIO::tigrxml::next_seq Bio/SeqIO/tigrxml.pm:116 > STACK: /crypt/rob/test2.pl:10 > ----------------------------------------------------------- > > Looking at the medicago XML and comparing it to the bioperl-live/t/ > data/test.tigrxml, the two look VERY different in structure. Lots > of things that are attrs in test.tigrxml seem to be elements in the > medicago XML, for example. > > So I guess the question is: is the medicago TIGR XML malformed? > Can tigrxml.pm be expected to parse it? What, if anything, should > be done about this? > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From eigenrosen at gmail.com Thu Aug 6 16:39:09 2009 From: eigenrosen at gmail.com (Michael Rosen) Date: Thu, 6 Aug 2009 13:39:09 -0700 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Hi Chris, I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the top of the module being called. Mike On Aug 6, 2009, at 1:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the > wrapper has been updated for the latest version (I think parsing > still works, though). 
>
> chris
>
> On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote:
>
>> I'm a complete bioperl novice, trying to do Clustalw on some fasta
>> files, and am running into trouble:
>>
>> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
>> Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
>> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
>> STACK: TestClust:22
>> -----------------------------------------------------------
>>
>> Here's my code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::Perl;
>> use Bio::AlignIO;
>> use Bio::Tools::Run::Alignment::Clustalw;
>> use Bio::SimpleAlign;
>> use Bio::Seq;
>> use strict;
>> use warnings;
>>
>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
>> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>>
>> for (my $i = 0; $i < @seq_array; $i++){
>>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>>     $seq_array[$i]->seq($seq);
>> }
>>
>> write_sequence(">test",'fasta',@seq_array);
>>
>> my $seq_array_ref = \@seq_array;
>> my $aln = $factory->align($seq_array_ref);
>>
>> my @align_array = $aln->each_seq();
>>
write_sequence(">testfile",'fasta', at align_array); >> >> >> The loop is just there to take out some gaps that were placed in a >> blast previous to this. The write_sequence call confirms that >> @seq_array is a valid array of Bio:Seq objects at the time align >> calls it. Here's some output in "test": >> >> >A0220B0939one.1 FV584Q101DEWY9 >> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC >> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT >> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT >> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG >> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG >> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA >> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA >> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT >> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >> >A0220B0939one.2 FV584Q101A4DG7 >> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG >> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC >> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG >> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG >> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA >> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT >> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT >> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT >> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG >> ... 
>> >> Thanks, >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lsbrath at gmail.com Thu Aug 6 16:49:56 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 6 Aug 2009 16:49:56 -0400 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <69367b8f0908061349i48f4d2b1tcbccb00d5a3de5ca@mail.gmail.com> Hi Micheal, Have you considered calling clustalw from perl's "system" command and passing in the files for alignment? Mgavi On Thu, Aug 6, 2009 at 4:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the wrapper has > been updated for the latest version (I think parsing still works, though). > > chris > > On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote: > > I'm a complete bioperl novice, trying to do Clustalw on some fasta files, >> and am running into trouble: >> >> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta >> Use of uninitialized value in concatenation (.) or string at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 550. >> Use of uninitialized value in concatenation (.) or string at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 551. >> Can't exec "align": No such file or directory at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 555. 
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
>> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
>> STACK: TestClust:22
>> -----------------------------------------------------------
>>
>> Here's my code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::Perl;
>> use Bio::AlignIO;
>> use Bio::Tools::Run::Alignment::Clustalw;
>> use Bio::SimpleAlign;
>> use Bio::Seq;
>> use strict;
>> use warnings;
>>
>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
>> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>>
>> for (my $i = 0; $i < @seq_array; $i++){
>>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>>     $seq_array[$i]->seq($seq);
>> }
>>
>> write_sequence(">test",'fasta',@seq_array);
>>
>> my $seq_array_ref = \@seq_array;
>> my $aln = $factory->align($seq_array_ref);
>>
>> my @align_array = $aln->each_seq();
>> write_sequence(">testfile",'fasta',@align_array);
>>
>>
>> The loop is just there to take out some gaps that were placed in a blast
>> previous to this. The write_sequence call confirms that @seq_array is a
>> valid array of Bio:Seq objects at the time align calls it.
Here's some >> output in "test": >> >> >A0220B0939one.1 FV584Q101DEWY9 >> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC >> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT >> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT >> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG >> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG >> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA >> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA >> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT >> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >> >A0220B0939one.2 FV584Q101A4DG7 >> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG >> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC >> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG >> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG >> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA >> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT >> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT >> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT >> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG >> ... 
>>
>> Thanks,
>> Mike
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Thu Aug 6 17:00:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Aug 2009 16:00:37 -0500
Subject: [Bioperl-l] Trouble with Clustalw
In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com>
References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com>
	<42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu>
	<2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com>
Message-ID: <2C8DF4CB-40B0-41DB-882A-AAF346A008B2@illinois.edu>

Michael,

No, what I meant was which version of clustalw (the actual executable)
you are using. What you quoted is the svn version of the bioperl
wrapper. What happens if you enter 'clustalw' on the command line?
Do you get:

 **************************************************************
 ******** CLUSTAL 2.0.11 Multiple Sequence Alignments ********
 **************************************************************

I think the above version has problems with bioperl, though I can't
recall exactly what the problems were.

chris

On Aug 6, 2009, at 3:39 PM, Michael Rosen wrote:

> Hi Chris,
> I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at
> the top of the module being called.
>
> Mike
> On Aug 6, 2009, at 1:25 PM, Chris Fields wrote:
>
>> Michael,
>>
>> Are you using ClustalW 2? I'm not sure but I don't think the
>> wrapper has been updated for the latest version (I think parsing
>> still works, though).
>>
>> chris
>>
>> On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote:
>>
>>> I'm a complete bioperl novice, trying to do Clustalw on some fasta
>>> files, and am running into trouble:
>>>
>>> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
>>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
>>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
>>> Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>>>
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
>>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
>>> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
>>> STACK: TestClust:22
>>> -----------------------------------------------------------
>>>
>>> Here's my code:
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use Bio::Perl;
>>> use Bio::AlignIO;
>>> use Bio::Tools::Run::Alignment::Clustalw;
>>> use Bio::SimpleAlign;
>>> use Bio::Seq;
>>> use strict;
>>> use warnings;
>>>
>>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
>>> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>>>
>>> for (my $i = 0; $i < @seq_array; $i++){
>>>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>>>     $seq_array[$i]->seq($seq);
>>> }
>>>
>>> write_sequence(">test",'fasta',@seq_array);
>>>
>>> my $seq_array_ref = \@seq_array;
>>> my $aln =
$factory->align($seq_array_ref);
>>>
>>> my @align_array = $aln->each_seq();
>>> write_sequence(">testfile",'fasta',@align_array);
>>>
>>>
>>> The loop is just there to take out some gaps that were placed in a
>>> blast previous to this. The write_sequence call confirms that
>>> @seq_array is a valid array of Bio::Seq objects at the time align
>>> calls it. Here's some output in "test":
>>>
>>> >A0220B0939one.1 FV584Q101DEWY9
>>> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC
>>> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT
>>> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT
>>> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG
>>> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG
>>> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA
>>> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA
>>> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT
>>> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG
>>> >A0220B0939one.2 FV584Q101A4DG7
>>> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG
>>> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC
>>> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG
>>> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG
>>> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA
>>> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT
>>> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT
>>> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT
>>> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG
>>> ...
>>> >>> Thanks, >>> Mike >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From bosborne11 at verizon.net Thu Aug 6 16:01:00 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 16:01:00 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Chris, Yes, I think so. By the way, this is related to an old bug: http://bugzilla.bioperl.org/show_bug.cgi?id=1508 Brian O. > This is a simple validation issue: should we throw an exception on > bad input (no '>') From bix at sendu.me.uk Thu Aug 6 17:18:02 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 06 Aug 2009 22:18:02 +0100 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Message-ID: <4A7B488A.2060600@sendu.me.uk> Michael Rosen wrote: > Hi Chris, > I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the > top of the module being called. I'm guessing your error is caused simply by not having clustalw installed. BioPerl run modules provide perl wrappers to external executables. They don't replace the need for those executables. 
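
The diagnosis above (no clustalw executable available to the wrapper) can be checked before align() is ever called. The following is only a minimal sketch under stated assumptions: it assumes a Unix-like system and that the program is installed under the name "clustalw"; find_on_path is a hypothetical helper written for this illustration, not part of BioPerl. It uses only the core File::Spec module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl function): return the full path of
# the first executable file called $name found in a PATH directory, or
# undef in scalar context if there is none.
sub find_on_path {
    my ($name) = @_;
    for my $dir (File::Spec->path()) {
        my $candidate = File::Spec->catfile($dir, $name);
        # -x _ reuses the stat result of the preceding -f test
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Assumption: the executable is named "clustalw" on this system.
my $exe = find_on_path('clustalw');
if (defined $exe) {
    print "clustalw found at $exe\n";
}
else {
    print "clustalw is not on PATH; install it before calling align()\n";
}
```

If the program is installed somewhere that is not on PATH, the wrapper's POD describes a CLUSTALDIR environment variable for pointing at the install directory; check the documentation of the installed version, as this sketch does not consult that variable.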
From cjfields at illinois.edu Thu Aug 6 20:47:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 19:47:47 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: I added the exception and tests to svn (r15895), so I closed that bug out. Almost forgot about that one, thanks for pointing it out! chris On Aug 6, 2009, at 3:01 PM, Brian Osborne wrote: > Chris, > > Yes, I think so. > > By the way, this is related to an old bug: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1508 > > > Brian O. > > >> This is a simple validation issue: should we throw an exception on >> bad input (no '>') > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 22:30:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 21:30:09 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A765A44.7030902@gmail.com> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> Message-ID: Jonathan, Just to make sure you aren't accidentally 'warnocked' by the core devs: Your code sounds quite nice! However, we will begin the process of massively restructuring bioperl pretty soon, so I don't think it's a good idea to gear your code towards fitting directly into core. The best alternative should be fairly obvious, which is to release it to CPAN listing BioPerl 1.6.0 as a dependency if it is required. 
Your modules may or may not need the Bio* namespace (that's up to you, actually); there are several non-bioperl modules that also share the Bio* namespace, and I believe there are modules that aren't Bio* that use BioPerl (Gbrowse comes to mind). If you're focusing on interaction with robotics, Robotics::Bio::X might be a better namespace for instance (b/c you could expand later into other possibly non-bio robotics interfaces). The cpan-discuss list is probably a good place to ask, or (after you register on PAUSE) you can register the module namespace and see if there are any objections to the request. chris On Aug 2, 2009, at 10:32 PM, Jonathan Cline wrote: > Smithies, Russell wrote: >> I "acquired" an old Biomek 1000 that I'm thinking of modernising. >> It was originally controlled by a monstrously large but slow pc >> (IBM Value Point Model 466DX2 computer with Microsoft Windows* >> Version 3.1) >> My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) >> and use software like mach3 www.machsupport.com along with G-code >> to control it. >> I come from an engineering background so it seemed like the easy >> way to me :-) >> >> Now I just need a bit of free time to get it working... >> >> --Russell >> >> >> > I agree, that's probably the best way to go. It's hard to know what > amount of s/w processing was done on the host PC vs. the embedded > controller. If you were able to connect directly to the robot > hardware > with serial port(s) or whatever it's using, it would be tough to find > out the comm protocol unless someone has already reverse engineered it > (which is doubtful). Also from what I have seen online, attempting > to > run the old software under virtual machine is unpredictable due to > timing differences in the serial port communication. So removal of > the > old electronics is probably the best bet. If it has one arm, then > it's > much easier. 
> > As for robots with working workstation software, it seems the > annoyance > factor is that while the scripting languages are powerful (for GUI > scripting that is), they are still relatively low level. Bio types > with > a bit of CS seem to immediately turn to visual basic, labview, or even > excel spreadsheets and macros, in order to provide a higher level > abstraction for the workstation software. To me, it seems natural > that > there should be a "protocol compiler" which takes biology protocols as > input, and gives robot instructions as output (google "protolexer"). > The huge bottleneck of course is that everyone's robotics work tables > and equipment are somewhat unique to their needs. > > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > > >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >>> Sent: Thursday, 30 July 2009 2:07 p.m. >>> To: bioperl-l at lists.open-bio.org >>> Cc: Jonathan Cline >>> Subject: [Bioperl-l] Bio::Robotics namespace discussion >>> >>> I am writing a module for communication with biology robotics, as >>> discussed recently on #bioperl, and I invite your comments. >>> >>> Currently this mode talks to a Tecan genesis workstation robot ( >>> http://images.google.com/images?q=tecan genesis ). Other vendors >>> are >>> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the >>> 'net with the exception of some visual basic and labview scripts >>> which I >>> have found. There are some computational biologists who program for >>> robots via high level s/w, but these scripts are not distributed >>> as OSS. >>> >>> With Tecan, there is a datapipe interface for hardware >>> communication, as >>> an added $$ option from the vendor. I haven't checked other >>> vendors to >>> see if they likewise have an open communication path for third party >>> software. 
By allowing third-party communication, then naturally the >>> next step is to create a socket client-server; especially as the >>> robot >>> vendor only support MS Win and using the local machine has typical >>> Microsoft issues (like losing real time communication with the >>> hardware >>> due to GUI animation, bad operating system stability, no unix except >>> cygwin, etc). >>> >>> >>> On Namespace: >>> >>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are >>> many >>> s/w modules already called 'robots' (web spider robots, chat bots, >>> www >>> automate, etc) so I chose the longer name "robotics" to >>> differentiate >>> this module as manipulating real hardware. Bio::Robotics is the >>> abstraction for generic robotics and Bio::Robotics::(vendor) is the >>> manufacturer-specific implementation. Robot control is made more >>> complex due to the very configurable nature of the work table >>> (placement >>> of equipment, type of equipment, type of attached arm, etc). The >>> abstraction has to be careful not to generalize or assume too >>> much. In >>> some cases, the Bio::Robotics modules may expand to arbitrary >>> equipment >>> such as thermocyclers, tray holders, imagers, etc - that could be a >>> future roadmap plan. >>> >>> Here is some theoretical example usage below, subject to change. At >>> this time I am deciding how much state to keep within the Perl >>> module. >>> By keeping state, some robot programming might be simplified >>> (avoiding >>> deadlock or tracking tip state). In general I am aiming for a more >>> "protocol friendly" method implementation. >>> >>> >>> To use this software with locally-connected robotics hardware: >>> >>> use Bio::Robotics; >>> >>> my $tecan = Bio::Robotics->new("Tecan") || die; >>> $tecan->attach() || die; >>> $tecan->home(); >>> $tecan->pipette(tips => "1", from => "rack1"); >>> $tecan->pipette(aspirate => "1", dispense => "1", from => >>> "sampleTray", to >>> => "DNATray"); >>> ... 
>>>
>>> To use this software with remote robotics hardware over the network:
>>>
>>> # On the local machine, run:
>>> use Bio::Robotics;
>>>
>>> my @connected_hardware = Bio::Robotics->query();
>>> my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in @connected_hardware\n";
>>> $tecan->attach() || die;
>>> $tecan->configure("my work table configuration file") || die;
>>> # Run the server and process commands
>>> while (1) {
>>>     $error = $tecan->server(passwordplaintext => "0xd290");
>>>     if ($tecan->lastClientCommand() =~ /^shutdown/) {
>>>         last;
>>>     }
>>> }
>>> $tecan->detach();
>>> exit(0);
>>>
>>> # On the remote machine (the client), run:
>>> use Bio::Robotics;
>>>
>>> my $server = "heavybio.dyndns.org:8080";
>>> my $password = "0xd290";
>>> my $tecan = Bio::Robotics->new("Tecan");
>>> $tecan->connect($server, $password) || die;
>>> $tecan->home();
>>> $tecan->pipette(tips => "1", from => "rack200");
>>> $tecan->pipette(aspirate => "1", dispense => "1",
>>>     from => "sampleTray A1", to => "DNATray A2",
>>>     volume => "45", liquid => "Buffer");
>>> $tecan->pipette(drop => "1");
>>> ...
>>> $tecan->disconnect();
>>> exit(0);
>>>
>>>
>>>
>>> --
>>>
>>> ## Jonathan Cline
>>> ## jcline at ieee.org
>>> ## Mobile: +1-805-617-0223
>>> ########################
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> =======================================================================
>> Attention: The information contained in this message and/or attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or privileged
>> material.
Any review, retransmission, dissemination or other use of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by AgResearch
>> Limited. If you have received this message in error, please notify the
>> sender immediately.
>> =======================================================================
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From biopython at maubp.freeserve.co.uk Fri Aug 7 05:19:14 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Aug 2009 10:19:14 +0100
Subject: [Bioperl-l] Trouble with Clustalw
In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu>
References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com>
	<42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu>
Message-ID: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com>

On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields wrote:
> Michael,
>
> Are you using ClustalW 2? I'm not sure but I don't think the wrapper has
> been updated for the latest version (I think parsing still works, though).
>
> chris

That shouldn't matter: according to Des Higgins, ClustalW 2 is intended
to be completely compatible with ClustalW 1.83, including the command
line options. They will be adding new stuff in ClustalW 3.

The only thing to worry about with ClustalW 2 is parsing the output, as
the header line of the alignments has changed very slightly.

I can tell you from personal experience that the Biopython command line
wrappers for ClustalW work fine on both 1.83 and 2.0.10, for example, and
I would expect the same to be true for BioPerl.
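
To illustrate the header-line point, here is a minimal sketch of a check that tolerates both header styles. The two sample header strings are assumptions written from memory of typical 1.83 and 2.0.x output, not taken from this thread, and clustal_header_version is a hypothetical helper, not how the BioPerl or Biopython parsers actually do it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the version string from the first line of
# a CLUSTAL-format alignment, or undef (scalar context) if the line does
# not look like a CLUSTAL header.
sub clustal_header_version {
    my ($line) = @_;
    # Accepts "CLUSTAL W (1.83) ..." as well as "CLUSTAL 2.0.10 ..."
    return $1 if $line =~ /^CLUSTAL\s*(?:W\s*)?\(?(\d+(?:\.\d+)*)\)?/;
    return;
}

# Assumed sample headers, for illustration only.
for my $header (
    'CLUSTAL W (1.83) multiple sequence alignment',
    'CLUSTAL 2.0.10 multiple sequence alignment',
) {
    my $version = clustal_header_version($header);
    print defined $version
        ? "version $version\n"
        : "not a CLUSTAL header\n";
}
```

Matching on the leading "CLUSTAL" token and treating the "W" and the parentheses as optional keeps the check independent of the exact version, which is the only part of the format said above to have changed.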
Peter From paola.bisignano at gmail.com Fri Aug 7 08:11:58 2009 From: paola.bisignano at gmail.com (Paola Bisignano via Scour) Date: Fri, 7 Aug 2009 05:11:58 -0700 Subject: [Bioperl-l] Scour Friend Invite Message-ID: <4a7c1a0e5b82d@gmail.com> Hey, Check out: http://scour.com/invite/paola82/ I'm using a new search engine called Scour.com. It shows Google/Yahoo/MSN results and user comments all on one page. Best of all we get rewarded for using it by collecting points with every search, comment and vote. The points are redeemable for Visa gift cards. Join through my invite link so we can be friends and search socially! I know you'll like it, - Paola Bisignano This message was sent to you as a friend referral to join scour.com, please feel free to review our http://scour.com/privacy page and our http://scour.com/communityguidelines/antispam page. If you prefer not to receive invitations from ANY scour members, please click here - http://www.scour.com/unsub/e/YmlvcGVybC1sQGxpc3RzLm9wZW4tYmlvLm9yZw== Write to us at: Scour, Inc., 15303 Ventura Blvd. Suite 220, Sherman Oaks, CA 91403, USA. campaignid: scour200908070001 Scour.com From hlapp at gmx.net Fri Aug 7 09:21:51 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 7 Aug 2009 09:21:51 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <4a7c1a0e5b82d@gmail.com> References: <4a7c1a0e5b82d@gmail.com> Message-ID: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> Just FYI, I am addressing this offline. Note to everyone: we don't tolerate this and it will get you removed from the list immediately (and banned for the second offense). This is a large list. You better spend the time and be very careful who you send this kind of stuff to before you waste everyone else's. 
-hilmar

From stefan.kirov at bms.com Fri Aug 7 10:25:52 2009
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 07 Aug 2009 10:25:52 -0400
Subject: [Bioperl-l] Scour Friend Invite
In-Reply-To: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net>
References: <4a7c1a0e5b82d@gmail.com>
	<8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net>
Message-ID: <4A7C3970.10501@bms.com>

Hilmar Lapp wrote:
> Just FYI, I am addressing this offline. Note to everyone: we don't
> tolerate this and it will get you removed from the list immediately
> (and banned for the second offense). This is a large list. You better
> spend the time and be very careful who you send this kind of stuff to
> before you waste everyone else's.
>
> -hilmar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
It is quite possible this guy has no idea scour is spamming people on
his behalf. It seems to me there should be a spam filter trained to take
care of these guys. As a reference:
http://forums.digitalpoint.com/showthread.php?t=955786
http://markmail.org/message/fzlutwd3mkforbsu

From jdalzell03 at qub.ac.uk Mon Aug 3 19:18:24 2009
From: jdalzell03 at qub.ac.uk (Johnathan Dalzell)
Date: Tue, 4 Aug 2009 00:18:24 +0100
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
Message-ID: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>

Hi,

I've been trying to install Bioperl 1.6.0 onto strawberry perl 5.10 and
the activePerl equivalent. I'm working through Vista, and over multiple
attempts this is the furthest I can get through installation....

Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively?
[a] a - will install all scripts Do you want to run tests that require connection to servers across the internet (likely to cause some failures)? y/n [n] y - will run internet-requiring tests Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/lib/Data/Dumper.pm lin e 190, line 9. Creating new 'Build' script for 'BioPerl' version '1.006000' ---- Unsatisfied dependencies detected during ---- ---- CJFIELDS/BioPerl-1.6.0.tar.gz ---- SOAP::Lite [requires] GraphViz [requires] Convert::Binary::C [requires] Algorithm::Munkres [requires] XML::Twig [requires] DB_File [requires] Set::Scalar [requires] XML::Parser::PerlSAX [requires] XML::Writer [requires] XML::SAX::Writer [requires] Clone [requires] XML::DOM::XPath [requires] PostScript::TextBlock [requires] Running Build test Delayed until after prerequisites Running Build install Delayed until after prerequisites Running install for module 'SOAP::Lite' Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP-Lite-0.710.08.tar.gz ok CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz We are about to install SOAP::Lite and for your convenience will provide you with list of modules and prerequisites, so you'll be able to choose only modules you need for your configuration. XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by default. Installed transports can be used for both SOAP::Lite and XMLRPC::Lite. Press to see the detailed list. Feature Prerequisites Install? 
----------------------------- ---------------------------- -------- Core Package [*] Scalar::Util always [*] Test::More [*] URI [*] MIME::Base64 [*] version [*] XML::Parser (v2.23) Client HTTP support [*] LWP::UserAgent always Client HTTPS support [ ] Crypt::SSLeay [ no ] Client SMTP/sendmail support [ ] MIME::Lite [ no ] Client FTP support [*] IO::File [ yes ] [*] Net::FTP Standalone HTTP server [*] HTTP::Daemon [ yes ] Apache/mod_perl server [ ] Apache [ no ] FastCGI server [ ] FCGI [ no ] POP3 server [ ] MIME::Parser [ no ] [*] Net::POP3 IO server [*] IO::File [ yes ] MQ transport support [ ] MQSeries [ no ] JABBER transport support [ ] Net::Jabber [ no ] MIME messages [ ] MIME::Parser [ no ] DIME messages [*] IO::Scalar (v2.105) [ no ] [ ] DIME::Tools (v0.03) [ ] Data::UUID (v0.11) SSL Support for TCP Transport [ ] IO::Socket::SSL [ no ] Compression support for HTTP [*] Compress::Zlib [ yes ] MIME interoperability w/ Axis [ ] MIME::Parser (v6.106) [ no ] --- An asterix '[*]' indicates if the module is currently installed. Do you want to proceed with this configuration? [yes] yes Checking if your kit is complete... 
Looks good Writing Makefile for SOAP::Lite cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport\TCP.pm cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport\POP3.pm cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema19 99.pm cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema20 
01.pm cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport\MQ.pm cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport\FTP.pm cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP\Transport\JABBER.pm cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite\Deserializer\XMLSchem aSOAP1_2.pm cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport\IO.pm cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite\Deserializer\XMLSchem aSOAP1_1.pm cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP\Transport\LOCAL.pm cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP\Transport\MAILTO.pm cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/SOAPsh.pl blib\script\S OAPsh.pl pl2bat.bat blib\script\SOAPsh.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/stubmaker.pl blib\scrip t\stubmaker.pl pl2bat.bat blib\script\stubmaker.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/XMLRPCsh.pl blib\script \XMLRPCsh.pl pl2bat.bat blib\script\XMLRPCsh.pl MKUTTER/SOAP-Lite-0.710.08.tar.gz C:\strawberry\c\bin\dmake.EXE -- OK Running make test C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib\lib' , 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/013-array-deserializati on.t 
t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03-server.t t/04-attach. t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08-schema.t t/096_characters.t t /097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t t/IO/SessionSet.t t/SO AP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/Deserializer/XMLSchema199 9.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t t /SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/SOAP/Transport/FTP.t t/S OAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t t/SOAP/Transport/MAILT O.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/HTTP/CGI.t t/XML/Parser /Lite.t t/XMLRPC/Lite.t t/01-core.t .................................. ok t/010-serializer.t ........................... ok t/012-cloneable.t ............................ ok t/013-array-deserialization.t ................ ok t/014_UNIVERSAL_use.t ........................ ok t/015_UNIVERSAL_can.t ........................ ok t/02-payload.t ............................... ok t/03-server.t ................................ ok t/04-attach.t ................................ skipped: Could not find MIME::Parser - is M IME::Tools installed? Aborting. t/05-customxml.t ............................. ok t/06-modules.t ............................... ok t/07-xmlrpc_payload.t ........................ ok t/08-schema.t ................................ ok t/096_characters.t ........................... skipped: (no reason given) t/097_kwalitee.t ............................. skipped: (no reason given) t/098_pod.t .................................. skipped: (no reason given) t/099_pod_coverage.t ......................... skipped: (no reason given) t/IO/SessionData.t ........................... ok t/IO/SessionSet.t ............................ ok t/SOAP/Data.t ................................ ok t/SOAP/Lite/Deserializer/XMLSchema1999.t ..... 
ok t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok t/SOAP/Lite/Packager.t ....................... ok t/SOAP/Schema/WSDL.t ......................... ok t/SOAP/Serializer.t .......................... 1/12 Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite .pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-L ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-L ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-L ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. t/SOAP/Serializer.t .......................... ok t/SOAP/Transport/FTP.t ....................... 1/7 Use of uninitialized value in split at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 55. substr outside of string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SO AP/Transport/FTP.pm line 56. Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/perl/lib/IO/Socket/INET. pm line 117. Use of uninitialized value $server in concatenation (.) or string at C:\strawberry\cpan\bu ild\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60. t/SOAP/Transport/FTP.t ....................... ok t/SOAP/Transport/HTTP.t ...................... ok t/SOAP/Transport/HTTP/CGI.t .................. everytime I get to the CGI.t at the end here the installation won't move! Any suggestions would be greatly appreciated, I've been trying to force it through, literally for 5 hours now.... 
cheers,
jonny

From ghiban at cshl.edu Thu Aug 6 12:04:38 2009
From: ghiban at cshl.edu (Ghiban, Cornel)
Date: Thu, 6 Aug 2009 12:04:38 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net>
Message-ID: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>

Hi,

It doesn't matter what sequence we use. As Chris Fields showed in his test, the problem is that ">" is not the first character on the first line. We always assumed the sequence was in FASTA format, and this seems to be wrong.

I think the solution to our problem is to check whether the ">" symbol is present. If it is not present, it will be added.

Thank you,
Cornel Ghiban

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net]
Sent: Thursday, August 06, 2009 11:18 AM
To: Hilgert, Uwe
Cc: Chris Fields; BioPerl List; Ghiban, Cornel
Subject: Re: [Bioperl-l] Bio::SeqIO issue

Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible?

-hilmar

On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote:

> I'm not sure what version we have.
Cornel may have installed it a > while ago from CVS: > > Module id = Bio::Root::Build > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > INST_VERSION 1.006900 > cpan> m Bio::Root::Version > Module id = Bio::Root::Version > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > INST_VERSION 1.006900 > cpan> m Bio::SeqIO > Module id = Bio::SeqIO > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > INST_VERSION undef > > Cornel still has the checked-out "bioperl-live" directory and the last > changes are from March this year. > > As per why he used "Fasta" instead of 'fasta" as the format parameter > in Bio::SeqIO, it's because that what it says in the modules manual. > He now tried 'fasta' instead and see no changes in behavior. Omitting > the format parameter altogether, fasta-formatted sequence continues to > be treated correctly, the first line being removed. However, raw > sequence is being treated differently in that the first line is not > being removed any more. Instead, the program returns the first line > only. Which, in the example I am going to forward in my next message, > will return 60 amino acids out of raw sequence of 300 aa. Can't win > with raw sequence... > > > The files may be created on different platforms, we didn't notice any > difference between using files created on Windows or Linux. > > Thanks > Uwe > > > > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, August 05, 2009 6:54 PM > To: Chris Fields > Cc: Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > I don't think that can be the problem. If anything, providing the > format ought to be better in terms of result than not providing it? 
> > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, etc? > What does your data look like?" I'd add to that, can you show us your > full script, or a smaller code snippet that reproduces the problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing the >> file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>> Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences are being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>> to guessing). In any case, it's always advisable to explicitly >>> indicate the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, regardless >>>> of whether the files were indeed fasta files or files that only >>>> contained sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 8 08:38:46 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 8 Aug 2009 08:38:46 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <4A7C3970.10501@bms.com> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> Message-ID: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> Thanks Stefan--this makes a lot more sense to me than supposing a priori that a previous legitimate user of this list is spamming bioperl-l intentionally. I would prefer to initially give the benefit of the doubt to the intelligence of the users, rather than scare people off who are likely to be already mortified that their emails have been commandeered like this. I would definitely support an spam filter that works. 
MAJ ----- Original Message ----- From: "Stefan Kirov" To: "Hilmar Lapp" Cc: "BioPerl List" Sent: Friday, August 07, 2009 10:25 AM Subject: Re: [Bioperl-l] Scour Friend Invite > Hilmar Lapp wrote: >> Just FYI, I am addressing this offline. Note to everyone: we don't >> tolerate this and it will get you removed from the list immediately >> (and banned for the second offense). This is a large list. You better >> spend the time and be very careful who you send this kind of stuff to >> before you waste everyone else's. >> >> -hilmar >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > It is quite possible this guy has no idea scour is spamming people on > his behalf. It seems to me there should be spam-filter trained to take > care of these guys. > As a reference: > http://forums.digitalpoint.com/showthread.php?t=955786 > http://markmail.org/message/fzlutwd3mkforbsu > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Sat Aug 8 10:18:59 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 08 Aug 2009 10:18:59 -0400 Subject: [Bioperl-l] SeqIO documentation Message-ID: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> Chris, Since we've been discussing formats I just wanted to mention that I've changed this documentation from SeqIO.pm: If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then Fasta format is assumed. 
To:

If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then SeqIO will throw a fatal error.

The code is clear: if SeqIO can't figure out what the format is, then it dies. "fasta" is not the default format.

Brian O.

From cjfields at illinois.edu Sat Aug 8 12:23:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 Aug 2009 11:23:44 -0500
Subject: [Bioperl-l] SeqIO documentation
In-Reply-To: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net>
References: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net>
Message-ID:

Brian,

That fits current behavior, so yes, that makes sense.

chris

On Aug 8, 2009, at 9:18 AM, Brian Osborne wrote:

> Chris,
>
> Since we've been discussing formats I just wanted to mention that
> I've changed this documentation from SeqIO.pm:
>
> If no format is specified and a filename is given then the module
> will attempt to deduce the format from the filename suffix. If there
> is no suffix that Bioperl understands then it will attempt to guess
> the format based on file content. If this is unsuccessful then Fasta
> format is assumed.
>
> To:
>
> If no format is specified and a filename is given then the module
> will attempt to deduce the format from the filename suffix. If there
> is no suffix that Bioperl understands then it will attempt to guess
> the format based on file content. If this is unsuccessful then SeqIO
> will throw a fatal error.
>
> The code is clear, if SeqIO can't figure out what the format is then
> it dies, "fasta" is not the default format.
>
>
> Brian O.
> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 8 12:24:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:24:48 -0500 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> Message-ID: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu> I believe there are spam filters in place (Jason and Chris D. could probably indicate more on this). chris On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote: > Thanks Stefan--this makes a lot more sense to me than supposing > a priori that a previous legitimate user of this list is spamming > bioperl-l > intentionally. I would prefer to initially give the benefit of the > doubt > to the intelligence of the users, rather than scare people off who are > likely to be already mortified that their emails have been > commandeered > like this. I would definitely support an spam filter that works. > MAJ > ----- Original Message ----- From: "Stefan Kirov" > > To: "Hilmar Lapp" > Cc: "BioPerl List" > Sent: Friday, August 07, 2009 10:25 AM > Subject: Re: [Bioperl-l] Scour Friend Invite > > >> Hilmar Lapp wrote: >>> Just FYI, I am addressing this offline. Note to everyone: we don't >>> tolerate this and it will get you removed from the list immediately >>> (and banned for the second offense). This is a large list. You >>> better >>> spend the time and be very careful who you send this kind of stuff >>> to >>> before you waste everyone else's. 
>>> >>> -hilmar >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> It is quite possible this guy has no idea scour is spamming people on >> his behalf. It seems to me there should be spam-filter trained to >> take >> care of these guys. >> As a reference: >> http://forums.digitalpoint.com/showthread.php?t=955786 >> http://markmail.org/message/fzlutwd3mkforbsu >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 8 12:26:55 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:26:55 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> Message-ID: <0A43205F-828F-4CC9-ADC3-EBCE92690765@illinois.edu> On Aug 7, 2009, at 4:19 AM, Peter wrote: > On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields > wrote: >> Michael, >> >> Are you using ClustalW 2? I'm not sure but I don't think the >> wrapper has >> been updated for the latest version (I think parsing still works, >> though). >> >> chris > > That shouldn't matter, according to Des Higgins ClustalW 2 is intended > to be completely compatible with ClustalW 1.83, including the command > line options. They will be adding new stuff in ClustalW 3. 
The only > think to worry about with ClustalW 2 is parsing the output, as the > header line of the alignments has changed very slightly. > > I can tell you from personal experience that the Biopython command > line wrappers for ClustalW work fine on both 1.83 and 2.0.10 for > example, and would expect the same to be true for BioPerl. > > Peter I would think so as well, but I encountered some issues on my OS using ClustalW 2 with the last release: http://bugzilla.open-bio.org/show_bug.cgi?id=2728 I think it's something small, like something hard-coded in (version maybe) that's causing the problem, just didn't have time to check. chris From cjfields at illinois.edu Sat Aug 8 12:26:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:26:38 -0500 Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10 In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> Message-ID: <0963ED84-359B-465B-9BA2-956A0AB23587@illinois.edu> Have you tried installing SOAP::Lite directly? That seems to be the hanging point. The funny thing is this is somehow assigning everything as a requirement (SOAP::Lite is a 'recommends'). Worth investigating, but I don't have access to a Windows box (either for XP, Vista, or Win7). Hopefully we'll get a PPM up soon; it's in the roadmap for 1.6.1. In the meantime, (as a strictly temporary measure) have you tried setting PERL5LIB to point to a local copy of bioperl-1.6? chris On Aug 3, 2009, at 6:18 PM, Johnathan Dalzell wrote: > Hi, I've been trying to install Bioperl 1.6.0 onto strawberry perl > 5.10 and the activePerl equivalent. I'm wrking through vista, and > ovver multiple times, this is the furthest I can get through > installation.... > > > Install [a]ll Bioperl scripts, [n]one, or choose groups > [i]nteractively? 
[a] a > - will install all scripts > Do you want to run tests that require connection to servers across > the internet > (likely to cause some failures)? y/n [n] y > - will run internet-requiring tests > Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/ > lib/Data/Dumper.pm lin > e 190, line 9. > Creating new 'Build' script for 'BioPerl' version '1.006000' > ---- Unsatisfied dependencies detected during ---- > ---- CJFIELDS/BioPerl-1.6.0.tar.gz ---- > SOAP::Lite [requires] > GraphViz [requires] > Convert::Binary::C [requires] > Algorithm::Munkres [requires] > XML::Twig [requires] > DB_File [requires] > Set::Scalar [requires] > XML::Parser::PerlSAX [requires] > XML::Writer [requires] > XML::SAX::Writer [requires] > Clone [requires] > XML::DOM::XPath [requires] > PostScript::TextBlock [requires] > Running Build test > Delayed until after prerequisites > Running Build install > Delayed until after prerequisites > Running install for module 'SOAP::Lite' > Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP- > Lite-0.710.08.tar.gz > ok > CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > We are about to install SOAP::Lite and for your convenience will > provide > you with list of modules and prerequisites, so you'll be able to > choose > only modules you need for your configuration. > XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by > default. > Installed transports can be used for both SOAP::Lite and XMLRPC::Lite. > Press to see the detailed list. > Feature Prerequisites Install? 
> ----------------------------- ---------------------------- -------- > Core Package [*] Scalar::Util always > [*] Test::More > [*] URI > [*] MIME::Base64 > [*] version > [*] XML::Parser (v2.23) > Client HTTP support [*] LWP::UserAgent always > Client HTTPS support [ ] Crypt::SSLeay [ no ] > Client SMTP/sendmail support [ ] MIME::Lite [ no ] > Client FTP support [*] IO::File [ yes ] > [*] Net::FTP > Standalone HTTP server [*] HTTP::Daemon [ yes ] > Apache/mod_perl server [ ] Apache [ no ] > FastCGI server [ ] FCGI [ no ] > POP3 server [ ] MIME::Parser [ no ] > [*] Net::POP3 > IO server [*] IO::File [ yes ] > MQ transport support [ ] MQSeries [ no ] > JABBER transport support [ ] Net::Jabber [ no ] > MIME messages [ ] MIME::Parser [ no ] > DIME messages [*] IO::Scalar (v2.105) [ no ] > [ ] DIME::Tools (v0.03) > [ ] Data::UUID (v0.11) > SSL Support for TCP Transport [ ] IO::Socket::SSL [ no ] > Compression support for HTTP [*] Compress::Zlib [ yes ] > MIME interoperability w/ Axis [ ] MIME::Parser (v6.106) [ no ] > --- An asterix '[*]' indicates if the module is currently installed. > Do you want to proceed with this configuration? [yes] yes > Checking if your kit is complete... 
> Looks good > Writing Makefile for SOAP::Lite > cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod > cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm > cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm > cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm > cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm > cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm > cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm > cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport > \TCP.pm > cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm > cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport > \POP3.pm > cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm > cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod > cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm > cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm > cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm > cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm > cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm > cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod > cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod > cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod > cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm > cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm > cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod > cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm > cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm > cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod > cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm > cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema19 > 99.pm > cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm > cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm > cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod > cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport > \HTTP.pm > cp 
lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema20 > 01.pm > cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod > cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm > cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm > cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport > \MQ.pm > cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport > \FTP.pm > cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP > \Transport\JABBER.pm > cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm > cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod > cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm > cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_2.pm > cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport > \IO.pm > cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_1.pm > cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP > \Transport\LOCAL.pm > cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm > cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod > cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm > cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP > \Transport\MAILTO.pm > cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > SOAPsh.pl blib\script\S > OAPsh.pl > pl2bat.bat blib\script\SOAPsh.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > stubmaker.pl blib\scrip > t\stubmaker.pl > pl2bat.bat blib\script\stubmaker.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > XMLRPCsh.pl blib\script > \XMLRPCsh.pl > pl2bat.bat blib\script\XMLRPCsh.pl > MKUTTER/SOAP-Lite-0.710.08.tar.gz > C:\strawberry\c\bin\dmake.EXE -- OK > Running 
make test > C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib\lib' > , 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/ > 013-array-deserializati > on.t t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03- > server.t t/04-attach. > t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08- > schema.t t/096_characters.t t > /097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t > t/IO/SessionSet.t t/SO > AP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/ > Deserializer/XMLSchema199 > 9.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/ > Deserializer/XMLSchemaSOAP1_1.t t > /SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/ > SOAP/Transport/FTP.t t/S > OAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t > t/SOAP/Transport/MAILT > O.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/ > HTTP/CGI.t t/XML/Parser > /Lite.t t/XMLRPC/Lite.t > t/01-core.t .................................. ok > t/010-serializer.t ........................... ok > t/012-cloneable.t ............................ ok > t/013-array-deserialization.t ................ ok > t/014_UNIVERSAL_use.t ........................ ok > t/015_UNIVERSAL_can.t ........................ ok > t/02-payload.t ............................... ok > t/03-server.t ................................ ok > t/04-attach.t ................................ skipped: Could not > find MIME::Parser - is M > IME::Tools installed? Aborting. > t/05-customxml.t ............................. ok > t/06-modules.t ............................... ok > t/07-xmlrpc_payload.t ........................ ok > t/08-schema.t ................................ ok > t/096_characters.t ........................... skipped: (no reason > given) > t/097_kwalitee.t ............................. skipped: (no reason > given) > t/098_pod.t .................................. 
skipped: (no reason > given) > t/099_pod_coverage.t ......................... skipped: (no reason > given) > t/IO/SessionData.t ........................... ok > t/IO/SessionSet.t ............................ ok > t/SOAP/Data.t ................................ ok > t/SOAP/Lite/Deserializer/XMLSchema1999.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok > t/SOAP/Lite/Packager.t ....................... ok > t/SOAP/Schema/WSDL.t ......................... ok > t/SOAP/Serializer.t .......................... 1/12 Use of > uninitialized value $values[0] > in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08- > wfOzhM\blib\lib/SOAP/Lite > .pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > t/SOAP/Serializer.t .......................... ok > t/SOAP/Transport/FTP.t ....................... 1/7 Use of > uninitialized value in split at > C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/ > Transport/FTP.pm line 55. > substr outside of string at C:\strawberry\cpan\build\SOAP- > Lite-0.710.08-wfOzhM\blib\lib/SO > AP/Transport/FTP.pm line 56. > Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/ > perl/lib/IO/Socket/INET. > pm line 117. > Use of uninitialized value $server in concatenation (.) or string at > C:\strawberry\cpan\bu > ild\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60. > t/SOAP/Transport/FTP.t ....................... ok > t/SOAP/Transport/HTTP.t ...................... 
ok
> t/SOAP/Transport/HTTP/CGI.t ..................
>
> everytime I get to the CGI.t at the end here the installation won't
> move! Any suggestions would be greatly appreciated, I've been
> trying to force it through, literally for 5 hours now....
>
> cheers,
> jonny
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bosborne11 at verizon.net Sat Aug 8 12:42:12 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 08 Aug 2009 12:42:12 -0400
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
Message-ID: <979637B9-F2EC-47A0-9283-440AA2558481@verizon.net>

Jonathan,

It looks like you're not the only one having problems with SOAP::Lite on Windows. For a possible workaround:

http://objectmix.com/perl/638075-how-install-soap-lite-windows.html

Brian O.

On Aug 3, 2009, at 7:18 PM, Johnathan Dalzell wrote:

> SOAP/Transport/HTTP/CGI

From stefan.kirov at bms.com Sat Aug 8 16:45:32 2009
From: stefan.kirov at bms.com (Kirov, Stefan)
Date: Sat, 8 Aug 2009 16:45:32 -0400
Subject: [Bioperl-l] Scour Friend Invite
In-Reply-To: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu>
References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife>, <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu>
Message-ID:

There is indeed; actually, my mail with the same header was held for a while. In any case, I think these pay-to-search/invite-colleagues/etc. spam-whole-address-book sites should be banned even if they are formally not spam, since the user is at least partially aware of the effect. I am not sure if this is a good solution. I am just frustrated, because these companies are quite unethical.
Maybe not as unethical as others (a few come to mind, but I will not name them :-)), but still... On the other hand, they have not been a real problem before. As long as this is not a frequent thing, I guess the filter is doing a great job.

Stefan
________________________________________
From: Chris Fields [cjfields at illinois.edu]
Sent: Saturday, August 08, 2009 12:24 PM
To: Mark A. Jensen
Cc: Kirov, Stefan; Hilmar Lapp; BioPerl List
Subject: Re: [Bioperl-l] Scour Friend Invite

I believe there are spam filters in place (Jason and Chris D. could probably indicate more on this).

chris

On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote:

> Thanks Stefan--this makes a lot more sense to me than supposing
> a priori that a previous legitimate user of this list is spamming
> bioperl-l
> intentionally. I would prefer to initially give the benefit of the
> doubt
> to the intelligence of the users, rather than scare people off who are
> likely to be already mortified that their emails have been
> commandeered
> like this. I would definitely support an spam filter that works.
> MAJ
> ----- Original Message ----- From: "Stefan Kirov"
> >
> To: "Hilmar Lapp"
> Cc: "BioPerl List"
> Sent: Friday, August 07, 2009 10:25 AM
> Subject: Re: [Bioperl-l] Scour Friend Invite

This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited.

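For the Bio::SeqIO thread in this digest, the decision order being discussed (filename suffix first, then file content, then a fatal error rather than a silent fasta default) can be illustrated with a rough sketch in plain Perl. This is not BioPerl's code. The two suffix patterns are the ones quoted in the thread; the sub names pick_format, assert_fasta and first_nonblank_line, and the letters-only rule for raw sequence, are assumptions made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of $content that contains a
# non-space character, or undef if there is none. Accepts LF and CRLF.
sub first_nonblank_line {
    my ($content) = @_;
    return unless defined $content;
    my ($first) = grep { /\S/ } split /\r?\n/, $content;
    return $first;
}

# Hypothetical helper: choose a format name, or die if none can be found.
sub pick_format {
    my ($filename, $content) = @_;

    # Suffix rules as quoted in the thread.
    return 'fasta' if $filename =~ /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    return 'raw'   if $filename =~ /\.(txt)$/i;

    # No usable suffix: inspect the content (assumed rules, for illustration).
    my $first = first_nonblank_line($content);
    return 'fasta' if defined $first && $first =~ /^>/;
    return 'raw'   if defined $first && $first =~ /^[A-Za-z*-]+\s*$/;

    # Nothing matched: fail loudly instead of assuming fasta.
    die "cannot determine the format of '$filename'\n";
}

# Hypothetical guard: reject input declared as FASTA that has no '>' header,
# instead of letting a parser treat the first sequence line as a header.
sub assert_fasta {
    my ($content) = @_;
    my $first = first_nonblank_line($content);
    die "input was declared FASTA but its first line does not start with '>'\n"
        unless defined $first && $first =~ /^>/;
    return 1;
}

print pick_format('seq1.fa', ''), "\n";                    # fasta, by suffix
print pick_format('upload', ">id1\nACGT\n"), "\n";         # fasta, by content
print pick_format('upload', "MKVLAAGIVG\nLLLAQ\n"), "\n";  # raw, by content
print "rejected: $@" unless eval { assert_fasta("MKVLAAGIVG\nLLLAQ\n") };
```

A caller that would rather repair the input than reject it could catch the error from assert_fasta and prepend a header line of its own, which is a deliberate modification of the data and probably best done only when the user asks for it.
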
From j_martin at lbl.gov Sat Aug 8 22:41:53 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Sat, 8 Aug 2009 19:41:53 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>
Message-ID: <20090809024152.GA26943@eniac.jgi-psf.org>

Hello,

It sounds like you want a layer to figure out what they're giving your program before you open it; you could use Bio::Tools::GuessSeqFormat and spare your user the pain of knowledge. It seems reasonable that coddling happens only when requested.

use IO::String;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

my @files = ( 'NC_000913.fasta', '.gb' );

for my $file ( @files ) {
    my ( $string, $strio, $out );

    # write raw sequence into an in-memory string
    $strio = IO::String->new( $string );
    $out   = Bio::SeqIO->new( -fh => $strio, -format => 'raw' );

    # guess the input format instead of assuming fasta
    my $guesser = Bio::Tools::GuessSeqFormat->new( -file => $file );
    my $in      = Bio::SeqIO->new( -format => $guesser->guess, -file => $file );

    while ( my $seq = $in->next_seq() ) {
        $out->write_seq( $seq );
        print substr($string, 0, 30), "\n";
    }
}

Joel

On Thu, Aug 06, 2009 at 03:36:36PM -0400, Hilgert, Uwe wrote:
> Hmmm, I fail to see how supplying raw sequence could be a called "bad"
> input or a "problem". In our case, for example, not every user is a
> bioinformatics expert and Cornel was suggesting to account for that
> instead of trying to "train" the user to adhere to requirements that
> have not much to do with what s/he tries to accomplish. I don't really
> see data being modified, rather that the data format is being adopted to
> the needs of the software; which I would argue should be something the
> software is being able to take care of.
> > Uwe > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, August 06, 2009 12:50 PM > To: Ghiban, Cornel > Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Cornel, > > I'm failing to see how adding '>' would solve the problem. > > This is a simple validation issue: should we throw an exception on bad > input (no '>'), or just argue GIGO based on user error (the assumption > that the SeqIO parser will read raw sequence correctly when set to > 'fasta' is wrong)? > > I think, in this circumstance, the former applies. It is easy to add, > and the use of an exception in this case is violently user-friendly, > e.g. it will stop cold and immediately point out the problem. > Otherwise data is (silently) being modified, which is always a bad > thing. > > chris > > On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > > > Hi, > > > > It doesn't matter what sequence we use. As Chris Fields's showed in > > his test, not having > > ">" as the 1st character on the first line is the problem. > > We always assumed the sequence is in FASTA format and this seems to > > be wrong. > > > > I think, the solution to our problem is to check whether the ">" > > symbol is present or not. > > If not present then it will be added. > > > > Thank you, > > Cornel Ghiban > > > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > Sent: Thursday, August 06, 2009 11:18 AM > > To: Hilgert, Uwe > > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > > > Uwe - could you send an actual data file (as an attachment) that > > reproduces the problem, or is that not possible? > > > > -hilmar > > > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > > > >> I'm not sure what version we have. 
Cornel may have installed it a > >> while ago from CVS: > >> > >> Module id = Bio::Root::Build > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::Root::Version > >> Module id = Bio::Root::Version > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::SeqIO > >> Module id = Bio::SeqIO > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > >> INST_VERSION undef > >> > >> Cornel still has the checked-out "bioperl-live" directory and the > >> last > >> changes are from March this year. > >> > >> As per why he used "Fasta" instead of 'fasta" as the format parameter > >> in Bio::SeqIO, it's because that what it says in the modules manual. > >> He now tried 'fasta' instead and see no changes in behavior. Omitting > >> the format parameter altogether, fasta-formatted sequence continues > >> to > >> be treated correctly, the first line being removed. However, raw > >> sequence is being treated differently in that the first line is not > >> being removed any more. Instead, the program returns the first line > >> only. Which, in the example I am going to forward in my next message, > >> will return 60 amino acids out of raw sequence of 300 aa. Can't win > >> with raw sequence... > >> > >> > >> The files may be created on different platforms, we didn't notice any > >> difference between using files created on Windows or Linux. 
> >> > >> Thanks > >> Uwe > >> > >> > >> > >> > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Wednesday, August 05, 2009 6:54 PM > >> To: Chris Fields > >> Cc: Hilgert, Uwe; BioPerl List > >> Subject: Re: [Bioperl-l] Bio::SeqIO issue > >> > >> I don't think that can be the problem. If anything, providing the > >> format ought to be better in terms of result than not providing it? > >> > >> Uwe - I'd like you to go back to Chris' initial questions that you > >> haven't answered yet: "What version of bioperl are you using, OS, > >> etc? > >> What does your data look like?" I'd add to that, can you show us your > >> full script, or a smaller code snippet that reproduces the problem. > >> > >> I suspect that either something in your script is swallowing the > >> line, > >> or that the line endings in your data file are from a different OS > >> than the one you're running the script on. (Or that you are running a > >> very old version of BioPerl, which is entirely possible if you > >> installed through CPAN.) > >> > >> -hilmar > >> > >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> > >>> Uwe, > >>> > >>> Please keep replies on the list. > >>> > >>> It's very possible that's the issue; IIRC the fasta parser pulls out > >>> the full sequence in chunks (based on local $/ = "\n>") and splits > >>> the header off as the first line in that chunk. You could probably > >>> try leaving the format out and letting SeqIO guess it, or passing > >>> the > >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably > >>> better to go through the files and add a file extension that > >>> corresponds to the format. > >>> > >>> chris > >>> > >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >>> > >>>> Thanks, Chris. 
The files have no extension, but we indicate what > >>>> format to use, like in the manual: > >>>> > >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > >>>> > >>>> I wonder now whether this could exactly cause the problem: as we > >>>> are > >>>> telling that input files are in fasta format they are being treated > >>>> as such (=remove first line) - regardless of whether they really > >>>> are > >>>> fasta? > >>>> > >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe > >>>> Hilgert, Ph.D. > >>>> Dolan DNA Learning Center > >>>> Cold Spring Harbor Laboratory > >>>> > >>>> C: (516) 857-1693 > >>>> V: (516) 367-5185 > >>>> E: hilgert at cshl.edu > >>>> F: (516) 367-5182 > >>>> W: http://www.dnalc.org > >>>> > >>>> -----Original Message----- > >>>> From: Chris Fields [mailto:cjfields at illinois.edu] > >>>> Sent: Wednesday, August 05, 2009 5:04 PM > >>>> To: Hilgert, Uwe > >>>> Cc: bioperl-l at lists.open-bio.org > >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue > >>>> > >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > >>>> > >>>>> Is my impression correct that Bio::SeqIO just assumes that > >>>>> sequences are being submitted in FASTA format? > >>>> > >>>> No. See: > >>>> > >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO > >>>> > >>>> SeqIO tries to guess at the format using the file extension, and if > >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > >>>> possible that the extension is causing the problem, or that > >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced > >>>> to guessing). In any case, it's always advisable to explicitly > >>>> indicate the format when possible. > >>>> > >>>> Relevant lines: > >>>> > >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > >>>> i; > >>>> ... 
> >>>> return 'raw' if /\.(txt)$/i; > >>>> > >>>>> In our experience, implementing > >>>>> Bio::SeqIO led to the first line of files being cut off, > >>>>> regardless > >>>>> of whether the files were indeed fasta files or files that only > >>>>> contained sequence. > >>>> > >>>> Files that only contain sequence are 'raw'. Ones in FASTA are > >>>> 'fasta'. > >>>> > >>>>> Which, in the latter, led to sequence submissions that had the > >>>>> first line of nucleotides removed. Has anyone tried to write a fix > >>>>> for this? > >>>> > >>>> This sounds like a bug, but we have very little to go on beyond > >>>> your > >>>> description. What version of bioperl are you using, OS, etc? What > >>>> does your data look like? File extension? > >>>> > >>>> chris > >>>> > >>>>> Thanks, > >>>>> > >>>>> Uwe > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > >>>>> > >>>>> Uwe Hilgert, Ph.D. > >>>>> > >>>>> Dolan DNA Learning Center > >>>>> > >>>>> Cold Spring Harbor Laboratory > >>>>> > >>>>> > >>>>> > >>>>> V: (516) 367-5185 > >>>>> > >>>>> E: hilgert at cshl.edu > >>>>> > >>>>> F: (516) 367-5182 > >>>>> > >>>>> W: http://www.dnalc.org > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > 
=========================================================== > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Sun Aug 9 06:38:30 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 11:38:30 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EA726.60303@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > OK, I propose to look into these. Almost certainly I'll be doing "convert > run/db/network to Module::Build". I'll try to resolve the bugs you've > mentioned. 
> > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. Chris already started on "convert run/db/network to Module::Build" for some reason, but his attempt doesn't actually result in any modules getting installed (setting pm_files() like that isn't enough). The easiest, cleanest and most standard solution is to create a lib directory and svn move Bio into it. Does anyone have an objection to me doing this for the network, db and run packages? It will only affect developers currently working on code in those packages, and they just need to be aware that an svn update will be rather dramatic after my change. From cjfields at illinois.edu Sun Aug 9 09:05:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:05:17 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7EA726.60303@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> Message-ID: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> ... > > Chris already started on "convert run/db/network to Module::Build" > for some reason, but his attempt doesn't actually result in any > modules getting installed (setting pm_files() like that isn't enough). > > The easiest, cleanest and most standard solution is to create a lib > directory and svn move Bio into it. Does anyone have an objection to > me doing this for the network, db and run packages? 
It will only > affect developers currently working on code in those packages, and > they just need to be aware that an svn update will be rather > dramatic after my change. If it stimulates you into doing this then I'm all for it, but I've waited on getting this fixed long enough I decided to take it on myself to work on it, using the simplest ones. You had mentioned several times you would do this and I hadn't seen any progress. The point: I would really like to get another point release out before we work on splitting things up. Simple as that. From what I have seen (with my few tests) everything (modules, scripts) gets copied into blib just fine and the temp folder for script generation gets cleaned up; I haven't progressed beyond to the installation step, but there isn't anything to me that indicates it wouldn't work. I won't be available until Wed. at the earliest for additional comment (out of town, no internet connection). chris From bix at sendu.me.uk Sun Aug 9 09:15:07 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 14:15:07 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> Message-ID: <4A7ECBDB.9030505@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> The easiest, cleanest and most standard solution is to create a lib >> directory and svn move Bio into it. Does anyone have an objection to >> me doing this for the network, db and run packages? 
It will only >> affect developers currently working on code in those packages, and >> they just need to be aware that an svn update will be rather dramatic >> after my change. > > From what I have seen (with my few tests) everything (modules, scripts) > gets copied into blib just fine and the temp folder for script > generation gets cleaned up; I haven't progressed beyond to the > installation step, but there isn't anything to me that indicates it > wouldn't work. ./Build testinstall will show you it doesn't work as-is. If you're in a rush I'll just do the svn moves and we can revert later if anyone complains. From cjfields at illinois.edu Sun Aug 9 09:19:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:19:30 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <2790F9A5-43E8-47E5-B5AA-98239B95EF04@illinois.edu> On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. > > If you're in a rush I'll just do the svn moves and we can revert > later if anyone complains. Works for me. The sooner it gets done the better (next week, would be nice, but two is fine so we don't rush it too much). I'll be working on several other bits, including FASTQ, when I get back Wed, then I'll merge over and work on the next point release. chris From cjfields at illinois.edu Sun Aug 9 09:34:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:34:07 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. Sorry, I'll be leaving in the next hour, but for the above, did you mean './Build fakeinstall'? As long as you're moving everything into /lib (which I fully support), we should consider hard_coding scripts into bp_foo.PLS syntax seeing as we're going through additional trouble of converting them over. That is, unless there is a specific purpose to keeping them without the 'bp_'. chris From bix at sendu.me.uk Sun Aug 9 10:00:18 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 15:00:18 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <4A7ED672.20701@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>>> The easiest, cleanest and most standard solution is to create a lib >>>> directory and svn move Bio into it. Does anyone have an objection to >>>> me doing this for the network, db and run packages? 
It will only >>>> affect developers currently working on code in those packages, and >>>> they just need to be aware that an svn update will be rather >>>> dramatic after my change. >>> >>> From what I have seen (with my few tests) everything (modules, >>> scripts) gets copied into blib just fine and the temp folder for >>> script generation gets cleaned up; I haven't progressed beyond to the >>> installation step, but there isn't anything to me that indicates it >>> wouldn't work. >> >> ./Build testinstall will show you it doesn't work as-is. > > Sorry, I'll be leaving in the next hour, but for the above, did you mean > './Build fakeinstall'? Yes, sorry. > As long as you're moving everything into /lib (which I fully support), > we should consider hard_coding scripts into bp_foo.PLS syntax seeing as > we're going through additional trouble of converting them over. That > is, unless there is a specific purpose to keeping them without the 'bp_'. (The final suffix is supposed to be .pl - we convert from PLS to pl in core, no conversion needed in db) Yes, for only a handful of scripts, it actually makes sense to flatten them all into a new bin directory, which is the default script location for Module::Build. So for example I'd do: svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl etc. 
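
The renaming convention discussed above (flatten scripts into bin/, add the bp_ prefix, end in .pl rather than .PLS) can be sketched in a few lines of Perl. This is only an illustration: flattened_script_path is a hypothetical helper, not part of BioPerl or Module::Build, and scripts/foo.PLS is a made-up path. The sketch prints the svn commands instead of running them.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Map an existing script path to a flattened bin/bp_<name>.pl path.
# Hypothetical helper for illustration only; not part of BioPerl.
sub flattened_script_path {
    my ($old_path) = @_;
    my $name = basename($old_path);
    $name =~ s/\.PLS\z/.pl/;                     # core-style .PLS becomes .pl
    $name = "bp_$name" unless $name =~ /\Abp_/;  # add the bp_ prefix only once
    return "bin/$name";
}

# Print the svn commands rather than executing them.
# scripts/foo.PLS is a made-up example path.
for my $old (qw(scripts/biosql/bioentry2flat.pl scripts/foo.PLS)) {
    printf "svn mv %s %s\n", $old, flattened_script_path($old);
}
```

Printing the commands first makes it easy to review the proposed moves before touching the repository.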
From bix at sendu.me.uk Sun Aug 9 12:13:03 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 17:13:03 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EF58F.9000909@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. These issues should now be resolved. I'll note that for future cases similar to 3), if a user chooses to install an optional dependency using CPAN/CPANPLUS and the installation of that external module causes an infinite loop, it's an issue of that module or CPAN/CPANPLUS, not BioPerl. 
The solution from our end is to tell the user to choose not to install that dependency or ask on the CPAN mailing list if they really need it. (I've often got stuck in infinite loops just trying to install Bundle::CPAN! CPAN itself will detect infinite loops after a while and kill itself.)

From jdalzell03 at qub.ac.uk Sun Aug 9 05:06:26 2009
From: jdalzell03 at qub.ac.uk (Jonny Dalzell)
Date: Sun, 9 Aug 2009 02:06:26 -0700 (PDT)
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
Message-ID: <24885345.post@talk.nabble.com>

Thanks for the replies. I emailed Chris and Brian individually, but I guess it would be helpful if I threw my solution to "the dogs".

In the end I found that by downloading subversion (you need to sign up to collabnet for a user account first), and following the installation instructions of the relevant subversion pages on the bioperl site (http://www.bioperl.org/wiki/Using_Subversion), it downloaded fine first time. No need for CPAN, or a PPM, just copy paste 'svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live' into your command line, and it auto installs in under 30 seconds...definitely the way to go for anyone else out there trying to bust-a-move on a Win machine.

At time of writing, I have also installed BioPerl-db (same as above, copy and paste 'svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db' into command line), and BioPerl-run (I typed in 'svn co svn://code.open-bio.org/bioperl/bioperl-run/trunk bio', I THINK), and it worked fine. The relevant installation instructions don't give an explicit command for BP-run installation, but I think that matches the branches and trunk in the subversion repository (if not, sorry, but you can cross ref its position in there easily by following the links).
Both have worked without problem on Strawberry Perl 5.10 through WinVista, so far. Jonny -- View this message in context: http://www.nabble.com/bioperl-1.6-installation-on-vista-with-perl-5.10-tp24875623p24885345.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From mwhagen85 at gmail.com Mon Aug 10 14:54:53 2009 From: mwhagen85 at gmail.com (OjoLoco) Date: Mon, 10 Aug 2009 11:54:53 -0700 (PDT) Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits Message-ID: <24905417.post@talk.nabble.com> Hello all, I have found matching sequences between two genomes and I would now like to create a graphic that contains a heat map-like track that will show areas of the genome that were found more often than others. For every nt I have the number of times it was found, so if it was found very often it would be a darker color than say a nt that wasn't found at all. Is there any way to achieve this using built in BioPerl graphics? Thank you for your time. -- View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Mon Aug 10 15:22:36 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Mon, 10 Aug 2009 15:22:36 -0400 Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits In-Reply-To: <24905417.post@talk.nabble.com> References: <24905417.post@talk.nabble.com> Message-ID: Hi, You should be able to do that with wiggle_density and wiggle_xyplot glyphs. See http://gmod.org/wiki/GBrowse/Uploading_Wiggle_Tracks for instructions on constructing wiggle plots. 
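
For the per-nucleotide hit counts described in the question, a wiggle file in the UCSC fixedStep layout can be written with plain Perl before any GBrowse tooling is involved. The following is a minimal sketch under stated assumptions: counts_to_wiggle is a hypothetical helper, and the chromosome name, track name and counts are made up for the example.

```perl
use strict;
use warnings;

# Build a fixedStep wiggle track from per-nucleotide hit counts.
# Hypothetical helper; $chrom and $track_name are placeholders.
sub counts_to_wiggle {
    my ($chrom, $track_name, $counts) = @_;
    my @lines = (
        qq{track type=wiggle_0 name="$track_name"},
        "fixedStep chrom=$chrom start=1 step=1",   # positions are 1-based
    );
    push @lines, @$counts;    # one value per position, in order
    return join("\n", @lines) . "\n";
}

my @hits = (0, 0, 3, 7, 7, 2, 0);   # made-up counts for seven positions
print counts_to_wiggle('chr1', 'hit density', \@hits);
```

The fixedStep form suits this case because there is exactly one value for every consecutive position; sparse data would fit variableStep better.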
After you have a wiggle plot, you'll need the wiggle2gff3.pl script (which is part of GBrowse, but it should run fine on its own), which you can get from GMOD's cvs:

http://gmod.cvs.sourceforge.net/viewvc/*checkout*/gmod/Generic-Genome-Browser/bin/wiggle2gff3.pl

which will convert the wig file to a binary file. Then you can create Bio::SeqFeatureI objects that will work with Bio::Graphics to draw the density or xyplot. Note as well that Bio::Graphics is no longer part of the main BioPerl distribution, so you'll need to get the most recent version from CPAN.

Also, fair warning: I've never actually done this; I've only used wiggle plots in the context of GBrowse, but it should work pretty much as described.

Scott

On Aug 10, 2009, at 2:54 PM, OjoLoco wrote:

>
> Hello all,
> I have found matching sequences between two genomes and I would
> now like
> to create a graphic that contains a heat map-like track that will
> show areas
> of the genome that were found more often than others. For every nt
> I have
> the number of times it was found, so if it was found very often it
> would be
> a darker color than say a nt that wasn't found at all. Is there any
> way to
> achieve this using built in BioPerl graphics? Thank you for your time.
> --
> View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-----------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From jdalzell03 at qub.ac.uk Tue Aug 11 11:07:52 2009
From: jdalzell03 at qub.ac.uk (Jonny Dalzell)
Date: Tue, 11 Aug 2009 08:07:52 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
Message-ID: <24919498.post@talk.nabble.com>

Hi,

trying to run the example given for Bio::Tools::HMM on the Bioperl site, and when I try to run it, I get this in the command line...

"The C-compiled engine for Hidden Markov Model (HMM) has not been installed.
Please read the install the bioperl-ext package

BEGIN failed--compilation aborted at C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140.
Compilation failed in require at HMM.txt line 4.
BEGIN failed--compilation aborted at HMM.txt line 4."

I have installed the entire bioperl-ext package through subversion, and it looks like all the relevant folders are in perl/site/lib/Bio/Tools, but it won't work. Am I missing something? I'm under the impression that the C-compiler comes with bioperl-ext (which installed with no reported problems)? I concede that I am extremely new to both Perl in general and Bioperl more specifically, but I have followed the instructions which I can find. I have the bioperl core installed in addition to bioperl-db and bioperl-run. I'm using Strawberry Perl on WinVista. I appreciate that most work through Linux systems...I am at times sorely tempted myself.

Any suggestions would be welcomed gratefully,
cheers,
Jonny

ps. this is the partial script I was trying to run...

#!/usr/bin/perl -w

use strict;
use Bio::Tools::HMM;
use Bio::SeqIO;
use Bio::Matrix::Scoring;

# Create an HMM object.
# ACGT are the bases; N and C mean non-coding and coding.
my $hmm = Bio::Tools::HMM->new('-symbols' => "ACGT", '-states' => "NC");

# Initialise some training observation sequences
my $seq1 = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
my $seq2 = Bio::SeqIO->new(-file => $ARGV[1], -format => 'fasta');
my @seqs = ($seq1, $seq2);

# Train the HMM with the observation sequences
$hmm->baum_welch_training(\@seqs);

# Get parameters
my $init    = $hmm->init_prob;        # Returns an array reference
my $matrix1 = $hmm->transition_prob;  # Returns Bio::Matrix::Scoring
my $matrix2 = $hmm->emission_prob;    # Returns Bio::Matrix::Scoring

I realise that this is incomplete. -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919498.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From shameer at ncbs.res.in Tue Aug 11 13:07:20 2009 From: shameer at ncbs.res.in (K. Shameer) Date: Tue, 11 Aug 2009 22:37:20 +0530 (IST) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> Hello Jonny, Are you sure that you have a compiled version of HMMER installed on your machine? -- K. Shameer > Hi, > > trying to run the example given for Bio::Tools::HMM on the Bioperl site, > and > when I try to run it, I get this in the command line... > > "The C-compiled engine for Hidden Markov Model (HMM) has not been > installed. > Please read the install the bioperl-ext package > > BEGIN failed--compilation aborted at > C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140. > Compilation failed in require at HMM.txt line 4. > BEGIN failed--compilation aborted at HMM.txt line 4."
> > I have installed the entire bioperl-ext package through subversion, and it > looks like all the relevant folders are in perl/site/lib/Bio/Tools, but it > won't work. Am I missing something? I'm under the impression that the > C-compiler comes with bioperl-ext (which installed with no reported > problems)? I concede that I am extrememly new to both Perl in general and > Bioperl more specifically, but I have followed the instructions which I > can > find. I have the bioperl core installed in addition to bioperl-db and > bioperl-run. I'm using Strawberry Perl on WinVista. I appreciate that > most > work through Linux systems...I am at times sorely tempted myself. > > Any suggestions would be welcomed gratefully, > cheers, > Jonny > > ps. this is the partial script I was trying to run... > > #!/usr/bin/perl -w > > usr strict; > use Bio::Tools::HMM; > use Bio::SeqIO; > use Bio::Matrix::Scoring; > > #Create a HMM object > #ACGT are the bases NC mean non-coding and coding > $hmm = new Bio::Tools::HMM ('-symbols' => "ACGT", '-states' => "NC"); > > #Initialise some training observation sequences > $Seq1 = new Bio::SeqIO(-file => $ARGV[0], -format => 'fasta'); > $seq2 = new Bio::SeqIO(-file => $ARGV[1], -format => 'fasta'); > @seqs = ($seq1, $seq2); > > #Train the HMM with the observation sequences > $hmm ->baum_welch_training(\@seqs); > > #Get parameters > $init = $hmm->init_prob; #Returns an array reference > $matrix1 = $hmm->transition_prob; #Returns Bio::Matrix::Scoring > $matrix2 = $hmm->emission_prob; #Returns Bio::Matrix::Scoring > > I realise that this is incomplete. > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919498.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jdalzell03 at qub.ac.uk Tue Aug 11 11:14:59 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:14:59 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24919603.post@talk.nabble.com> I should point out perhaps that CPAN is not an option on a Win setup...it has never worked for anything I have tried to install. Although I'm using Strawberry Perl now, I had no success getting bioperl or any of its components through the activestate PPM either (One of the reasons I ended up going to Strawberry). The only option I have for installation is the subversion server. Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919603.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 11:42:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:42:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24920117.post@talk.nabble.com> I realise that this looks like there is a problem with Bio::Tools::HMM when looking at the source code, but I've even tried replacing the HMM.pm file I had with the HMM.pm script at http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, and now I'm getting... "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: C:/strawberry/perl/lib C:/strawberry/perl/site/ lib .) at HMM.txt line 5. BEGIN failed--compilation aborted at HMM.txt line 5." ?? 
jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24920117.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 14:52:21 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 11:52:21 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> Message-ID: <24923606.post@talk.nabble.com> Hi, I'm as sure as I can be. I looked in the HMMER folder and it contains "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something to do with @INC, but I put "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at the top of my script, which definitely encompasses the directory it should be in, and I still get... "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib C:/strawberry/perl/site/lib/ Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt line 5. BEGIN failed--compilation aborted at HMM.txt line 5." I'm out of ideas. Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From rmb32 at cornell.edu Tue Aug 11 15:23:56 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 11 Aug 2009 12:23:56 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24920117.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> Message-ID: <4A81C54C.5020905@cornell.edu> Jonny, For quicker help you might want to try #bioperl on freenode.
That said, the problem here is that when you get code from Subversion, you are not really 'installing' it, you are just copying it to your machine. Part of the installation process is compiling these things, and for that you need a working C compiler. I don't know anything about using BioPerl on Windows, but as a general recommendation I would say go back to the CPAN and/or ppm directions and get those working. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu Jonny Dalzell wrote: > I realise that this looks like there is a problem with Bio::Tools::HMM when > looking at the source code, but I've even tried replacing the HMM.pm file I > had with the HMM.pm script at > http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, > and now I'm getting... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: > C:/strawberry/perl/lib C:/strawberry/perl/site/ > lib .) at HMM.txt line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > ?? > > jonny From maj at fortinbras.us Tue Aug 11 15:22:42 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 15:22:42 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: <7C7654A8A64E49158F6761EE09C9F297@NewLife> Jonny, You need the HMMER application, which is not part of BioPerl. See http://hmmer.janelia.org/ for download options. MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 2:52 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > I'm as sure as I can be.
I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > the top of my script, which definately encompasses the directory it should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Tue Aug 11 15:48:11 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 11 Aug 2009 12:48:11 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81C54C.5020905@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> Message-ID: <4A81CAFB.5050903@cornell.edu> Elaborating more, the 'C-compiled engine' error comes because Bio::Ext::HMM is not installed, because bioperl-ext is not installed (correctly), because Bio::Ext::HMM is an XS extension written in C. Which needs to be compiled. With a C compiler. As part of some kind of installation process, not just copying the files to a machine with subversion. Rob Robert Buels wrote: > Jonny, > > For quicker help you might want to try #bioperl on freenode. > > That said, the problem here is that when you get code from subversion, > you are not really 'installing' it, you are just copying it to your > machine. 
Part of the installation process is compiling these things, > and for that you need a working C compiler. > > I don't know anything about using BioPerl on Windows, but as a general > recommendation I would say go back to the CPAN and/or ppm directions and > getting those working. > > Rob > > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From bix at sendu.me.uk Tue Aug 11 16:11:43 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 11 Aug 2009 21:11:43 +0100 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: <4A81D07F.6000703@sendu.me.uk> Jonny Dalzell wrote: > Hi, > > I'm as sure as I can be. I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > the top of my script, which definately encompasses the directory it should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. lib (or at least one entry in your PERL5LIB) needs to point to the directory that contains the Bio directory. So: use lib "C:/strawberry/perl/site/lib/"; Now it will be able to locate Bio::Tools::HMM. You'll still get your original error because you don't have HMMER installed. See Mark's reply.
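
To illustrate the @INC point above, here is a minimal sketch (not code from the thread) of how Perl turns a module name into a relative file path and which directories could satisfy it. The helper names module_relpath and dirs_providing are made up for this example; the Strawberry path is the one shown in Jonny's error message. Note also that "use lib" takes a list of directories, so a single string with a space in it is treated as one directory name rather than two.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Add the directory that CONTAINS the Bio/ directory, not Bio/Tools/ itself.
# Each directory is its own list element. (Path taken from the thread; adjust
# for your install. A directory that does not exist is harmless here.)
use lib 'C:/strawberry/perl/site/lib';

# Turn a package name into the relative path that use/require searches for,
# e.g. Bio::Tools::HMM becomes Bio/Tools/HMM.pm.
sub module_relpath {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the entries of @dirs under which the module file actually exists.
sub dirs_providing {
    my ($module, @dirs) = @_;
    my $rel = module_relpath($module);
    return grep { -f File::Spec->catfile($_, split m{/}, $rel) } @dirs;
}

my $module = 'Bio::Tools::HMM';
print "Perl looks for: ", module_relpath($module), " under each \@INC entry\n";

my @hits = dirs_providing($module, @INC);
print @hits ? "Found under: @hits\n" : "Not found in any \@INC entry\n";
```

Because the relative path already starts with Bio/Tools/, an @INC entry ending in .../Bio/Tools/ makes Perl look for .../Bio/Tools/Bio/Tools/HMM.pm, which is why that entry did not help.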
From jdalzell03 at qub.ac.uk Tue Aug 11 16:29:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:29:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81D07F.6000703@sendu.me.uk> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> Message-ID: <24925178.post@talk.nabble.com> Hi, thanks. I did install HMMER from the site Mark suggested, and it is within the directories that perl recognizes when reading the script... still I get "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package" Is it possible that this module simply won't run on Windows? jonny Sendu Bala-2 wrote: > > Jonny Dalzell wrote: >> Hi, >> >> I'm as sure as I can be. I look in the HHMER folder and it contains >> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >> something >> to do with @INC, but I put >> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >> the top of my script, which definately encompasses the directory it >> should >> be in, and I still get... >> >> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >> C:/strawberry/perl/site/lib/ >> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >> HMM.txt >> line 5. >> BEGIN failed--compilation aborted at HMM.txt line 5." >> >> I'm out of ideas. > > lib (or at least one entry in your PERL5LIB) needs to point to the > directory that contains the Bio directory. So: > > use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > > Now it will be able to locate Bio::Tools::Hmm. You'll still get your > original error because you don't have Hmmer installed. See Mark's reply.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 16:31:36 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:31:36 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81CAFB.5050903@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> Message-ID: <24925211.post@talk.nabble.com> OK, so is there any particular C-compiler which I should use? Thanks, jonny Robert Buels wrote: > > Elaborating more, the 'C-compiled engine' error comes because > Bio::Ext::HMM is not installed, because bioperl-ext is not installed > (correctly), because Bio::Ext::HMM is an XS extension written in C. > Which needs to be compiled. With a C compiler. As part of some kind of > installation process, not just copying the files to a machine with > subversion. > > Rob > > Robert Buels wrote: >> Jonny, >> >> For quicker help you might want to try #bioperl on freenode. >> >> That said, the problem here is that when you get code from subversion, >> you are not really 'installing' it, you are just copying it to your >> machine. Part of the installation process is compiling these things, >> and for that you need a working C compiler. >> >> I don't know anything about using BioPerl on Windows, but as a general >> recommendation I would say go back to the CPAN and/or ppm directions and >> getting those working. 
>> >> Rob >> >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Tue Aug 11 17:05:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 17:05:10 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925178.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: Jonny, It will run in Win/Vis but there are some caveats. The BioPerl package has some plain C components, as Rob pointed out. These need to be compiled, and the objects/libraries put in the right place. CPAN will cause this to happen when you have a compiler available; ActiveState .ppm will download the binaries directly from the repository (my understanding, anyway). CPAN is always available by doing > perl -MCPAN -e shell but you may not have a C compiler around. This is a little tricky. You can either explore Visual C/C++ options from MS here http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, and install Cygwin (www.cygwin.com), which creates a linux-like environment with GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful as the real thing, I grant. 
Which brings me to a third possibility, which I haven't tried: an Ubuntu box running in a VM under Windows, or as a dual-boot system (https://help.ubuntu.com/community/WindowsDualBoot). MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 4:29 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > thanks. I did install HHMER from the site Mark suggested, and it is within > the directories that perl recognizes when reading the script...still I get > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > Please read the install the bioperl-ext package" > > Is it possible that this module simply won't run through windows? > > jonny > > > > Sendu Bala-2 wrote: >> >> Jonny Dalzell wrote: >>> Hi, >>> >>> I'm as sure as I can be. I look in the HHMER folder and it contains >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >>> something >>> to do with @INC, but I put >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >>> the top of my script, which definately encompasses the directory it >>> should >>> be in, and I still get... >>> >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >>> C:/strawberry/perl/site/lib/ >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >>> HMM.txt >>> line 5. >>> BEGIN failed--compilation aborted at HMM.txt line 5." >>> >>> I'm out of ideas. >> >> lib (or at least one entry in your PERL5LIB) needs to point to the >> directory that contains the Bio directory. So: >> >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; >> >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your >> original error because you don't have Hmmer installed. See Mark's reply.
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Tue Aug 11 17:39:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 12 Aug 2009 09:39:30 +1200 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB6F93AA@exchsth.agresearch.co.nz> Dev-C++ http://www.bloodshed.net/devcpp.html is a good (i.e. free under GPL) Windows compiler I've used before. Might save having to install Cygwin. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Wednesday, 12 August 2009 9:05 a.m. > To: Jonny Dalzell; Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Jonny, > It will run in Win/Vis but there are some caveats. The BioPerl package has > some > plain C components, as Rob pointed out. These need to be compiled, and the > objects/libraries put in the right place. CPAN will cause this to happen when > you have a compiler available; ActiveState .ppm will download the binaries > directly from the repository (my understanding, anyway). 
CPAN is always > available by doing > > > perl -MCPAN -e shell > > but you may not have a C compiler around. This is a little tricky. You can > either explore Visual C/C++ options from MS here > http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, > and install Cygwin (www.cygwin.com), which creates a linux-like environment > with > GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful > as > the real thing, I grant. Which bring me to a third possibility, that I haven't > tried, which is an Ubuntu box running in a VM under Windows, or as a dual-boot > system (https://help.ubuntu.com/community/WindowsDualBoot). > MAJ > ----- Original Message ----- > From: "Jonny Dalzell" > To: > Sent: Tuesday, August 11, 2009 4:29 PM > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > > > > > Hi, > > > > thanks. I did install HHMER from the site Mark suggested, and it is within > > the directories that perl recognizes when reading the script...still I get > > > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > > Please read the install the bioperl-ext package" > > > > Is it possible that this module simply won't run through windows? > > > > jonny > > > > > > > > Sendu Bala-2 wrote: > >> > >> Jonny Dalzell wrote: > >>> Hi, > >>> > >>> I'm as sure as I can be. I look in the HHMER folder and it contains > >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > >>> something > >>> to do with @INC, but I put > >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > >>> the top of my script, which definately encompasses the directory it > >>> should > >>> be in, and I still get... > >>> > >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > >>> C:/strawberry/perl/site/lib/ > >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > >>> HMM.txt > >>> line 5. 
> >>> BEGIN failed--compilation aborted at HMM.txt line 5." > >>> > >>> I'm out of ideas. > >> > >> lib (or at least one entry in your PERL5LIB) needs to point to the > >> directory that contains the Bio directory. So: > >> > >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > >> > >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your > >> original error because you don't have Hmmer installed. See Mark's reply. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > -- > > View this message in context: > > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista-- > tp24919498p24925178.html > > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From cjfields at illinois.edu Tue Aug 11 19:44:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:44:23 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: Bio::Tools::HMM doesn't use HMMER; it uses a C-based extension in bioperl-ext that generates HMMs (XS-based bindings, I think). I have managed to compile it successfully on Ubuntu and Mac OS X, but WinVista is a whole different bag-o-worms altogether (untested AFAIK). For the record, I do not recommend using it; I'm unsure about its maintenance status, so it may be released separately. It would be best to use something better supported, such as the HMMER wrapper in bioperl-run and the HMMER parsers in bioperl-core. We may also have wrappers for similar code available in biolib at some future point. chris On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote: > > Hi, > > I'm as sure as I can be. I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/ > Tools/";" at > the top of my script, which definately encompasses the directory it > should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/ > per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 11 19:48:08 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:48:08 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925211.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> <24925211.post@talk.nabble.com> Message-ID: <3A5CA958-3B03-4252-B78F-07BBFF1FA355@illinois.edu> Any C-based code should be built with the same compiler that was used to build the perl you are running. ActiveState supports either VC/C++ (as Mark indicates) or MinGW/gcc. I think Strawberry supports mainly the latter. Though you can use Cygwin, I think a native Win module is the best way to go if possible. It will likely be a tricky road, so keep us updated and we'll attempt to help out the best we can. chris On Aug 11, 2009, at 3:31 PM, Jonny Dalzell wrote: > > OK, > > so is there any particular C-compiler which I should use? > > Thanks, > jonny > > > > Robert Buels wrote: >> >> Elaborating more, the 'C-compiled engine' error comes because >> Bio::Ext::HMM is not installed, because bioperl-ext is not installed >> (correctly), because Bio::Ext::HMM is an XS extension written in C. >> Which needs to be compiled. With a C compiler. As part of some >> kind of >> installation process, not just copying the files to a machine with >> subversion. >> >> Rob >> >> Robert Buels wrote: >>> Jonny, >>> >>> For quicker help you might want to try #bioperl on freenode. >>> >>> That said, the problem here is that when you get code from >>> subversion, >>> you are not really 'installing' it, you are just copying it to your >>> machine. Part of the installation process is compiling these >>> things, >>> and for that you need a working C compiler.
>>> >>> I don't know anything about using BioPerl on Windows, but as a >>> general >>> recommendation I would say go back to the CPAN and/or ppm >>> directions and >>> getting those working. >>> >>> Rob >>> >>> >> >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue Aug 11 20:09:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 20:09:01 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> Message-ID: <69BDE54FD5C943669BCD41A9A607634A@NewLife> [OOps. Sorry about that. The compiler ideas still apply however.] ----- Original Message ----- From: "Chris Fields" To: "Jonny Dalzell" Cc: Sent: Tuesday, August 11, 2009 7:44 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > Bio::Tools::Hmm doesn't use HMMER, it uses a C-based extension in bioperl-ext > that generates HMM's (XS-based bindings I think). I have managed to compile > it successfully on Ubuntu and Mac OS X, but WinVista is a whole different > bag-o-worms altogether (untested AFAIK). 
>
> For the record, I do not recommend using it; I'm unsure about its
> maintenance status, so it may be released separately. It would be best
> to use something better supported, such as the HMMER wrapper in
> bioperl-run and the hmmer parsers in bioperl-core. We may also have
> wrappers for similar code available in biolib at some future point.
>
> chris
>
> On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote:
>
>>
>> Hi,
>>
>> I'm as sure as I can be. I look in the HMMER folder and it contains
>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was
>> something to do with @INC, but I put
>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";"
>> at the top of my script, which definitely encompasses the directory it
>> should be in, and I still get...
>>
>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains:
>> strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/
>> C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt line 5.
>> BEGIN failed--compilation aborted at HMM.txt line 5."
>>
>> I'm out of ideas.
>>
>> Jonny
>> --
>> View this message in context:
>> http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at illinois.edu Wed Aug 12 12:44:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 12 Aug 2009 11:44:37 -0500
Subject: [Bioperl-l] Regarding Bio::Root::Build
In-Reply-To: <4A7ED672.20701@sendu.me.uk>
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com>
 <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu>
 <4A60ACC6.6020003@sendu.me.uk>
 <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk>
 <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu>
 <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk>
 <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk>
 <4A7EA726.60303@sendu.me.uk>
 <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu>
 <4A7ECBDB.9030505@sendu.me.uk> <4A7ED672.20701@sendu.me.uk>
Message-ID: <1F099DCC-073E-470E-873A-608E674375C1@illinois.edu>

On Aug 9, 2009, at 9:00 AM, Sendu Bala wrote:

> Chris Fields wrote:
> ...
>> As long as you're moving everything into /lib (which I fully support),
>> we should consider hard-coding scripts into bp_foo.PLS syntax seeing as
>> we're going through additional trouble of converting them over. That
>> is, unless there is a specific purpose to keeping them without the
>> 'bp_'.
>
> (The final suffix is supposed to be .pl - we convert from PLS to pl in
> core, no conversion needed in db)

Yes, had that reversed in my commit. Thanks.

> Yes, for only a handful of scripts, it actually makes sense to flatten
> them all into a new bin directory, which is the default script location
> for Module::Build.
>
> So for example I'd do:
> svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl
> etc.

Yes, exactly.
It seems we're going out of our way to keep things as they were
previously when using ExtUtils::MakeMaker/Makefile.PL. I'm not quite sure
why we've bent over backwards to work around these issues when it is much
easier to stick to simple standards that 99% of CPAN uses: scripts in bin
(or whatever dir is passed to script_files), modules in lib.

I'm not complaining, just haven't heard an explanation about that one way
or the other.

chris

From rmb32 at cornell.edu Thu Aug 13 14:59:00 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 13 Aug 2009 11:59:00 -0700
Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank
In-Reply-To: <4A79A52E.7000104@cornell.edu>
References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk>
 <4A79A52E.7000104@cornell.edu>
Message-ID: <4A846274.4000600@cornell.edu>

OK, commit 15927 adds some more info about -db options for
Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide,
nuccore, nucgss, nucest, and unigene, and including a link to an (XML)
page from NCBI that lists inputs that NCBI accepts.

Could somebody who knows more about eUtils than me also review this patch
and make corrections if necessary?

Rob

Robert Buels wrote:
> I think you're looking for the -db => 'nucgss' option.
>
> I'll add a better listing of these (undocumented) options to the
> Bio::DB::Query::GenBank docs.
>
> Rob
>

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From jdalzell03 at qub.ac.uk Thu Aug 13 15:27:14 2009
From: jdalzell03 at qub.ac.uk (Jonny Dalzell)
Date: Thu, 13 Aug 2009 12:27:14 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24919498.post@talk.nabble.com>
References: <24919498.post@talk.nabble.com>
Message-ID: <24957222.post@talk.nabble.com>

Fellows, thanks very much for the input.
However, today I saw fit to dual-boot with ubuntu. I've installed
everything, but I still get the same

"The C-compiled engine for Hidden Markov Model (HMM) has not been installed.
Please read the install the bioperl-ext package "

message!

Is it ridiculous of me to expect ubuntu to take care of this for me? How
do I go about compiling the HMM?

Thanks in advance,

Jonny
--
View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24957222.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jonathanmflowers at gmail.com Thu Aug 13 15:41:21 2009
From: jonathanmflowers at gmail.com (Jonathan Flowers)
Date: Thu, 13 Aug 2009 12:41:21 -0700
Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO
Message-ID:

Hi,

I am trying to parse BLAST reports written in XML using Bio::SearchIO.
When running the following code on a set of reports (multiple query
results in a single file), I only get one ResultI object. I tried running
the same code on a file in 'blast' format and obtained the expected
results (i.e. one ResultI object for each query), suggesting that the
issue is with blastxml. I found an old thread on this listserv where
someone had had a similar problem, but could not find how it was
resolved.

I am using Bioperl 1.5.2 and the XML reports were generated using
blastall with the -m7 option.

    my $in = Bio::SearchIO->new(-format => 'blastxml',
                                -file   => 'blastreport.xml');
    # one ResultI object is expected per query in the report
    while ( my $result = $in->next_result ) {
        print $result->query_name, "\n";
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                # do something with hsp
            }
        }
    }

Thanks

Jonathan

From rmb32 at cornell.edu Thu Aug 13 17:37:21 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 13 Aug 2009 14:37:21 -0700
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24957222.post@talk.nabble.com>
References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com>
Message-ID: <4A848791.4010402@cornell.edu>

Jonny Dalzell wrote:
> Is it ridiculous of me to expect ubuntu to take care of this for me? How
> do I go about compiling the HMM?

Yes. This is a very specialized thing that you're doing, and Ubuntu does
not have the resources to package every single thing.

Unfortunately, it looks like the bioperl-ext package is not installable
under Ubuntu 9.04 anyway, which is what I'm running. For others on this
list, if somebody is interested in maintaining it, I'd be happy to help
out by testing on Debian-based Linux platforms. We need to clarify this
package's maintenance status: if there is nobody interested in
maintaining it, I would recommend that bioperl-ext be removed from
distribution. It's not in anybody's interest to have unmaintained
software out there causing confusion.

So Jonny, in short, I would say "do not use bioperl-ext".

Step back. What are you trying to accomplish? Chris already recommended
some alternative methods in his email of 8/11 on this subject. Perhaps we
can guide you to some software that is actively maintained and will meet
your needs.

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From cjfields at illinois.edu Thu Aug 13 18:06:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 13 Aug 2009 17:06:29 -0500
Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank
In-Reply-To: <4A846274.4000600@cornell.edu>
References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk>
 <4A79A52E.7000104@cornell.edu> <4A846274.4000600@cornell.edu>
Message-ID: <916D0E26-EBB5-4E28-99AD-F689639BB93A@illinois.edu>

It looks fine.
As for the databases, you can always get the latest list of databases
using a script from bioperl-live, which uses Bio::DB::EUtilities to
access them directly (scripts/DB_EUtilities/einfo.PLS, which should
install as bp_einfo.pl).

(looking at the below, what is blastdbinfo?)

cjfields4:DB_EUtilities cjfields$ perl einfo.PLS
pubmed
protein
nucleotide
nuccore
nucgss
nucest
structure
genome
biosystems
blastdbinfo
books
cancerchromosomes
cdd
gap
domains
gene
genomeprj
gensat
geo
gds
homologene
journals
mesh
ncbisearch
nlmcatalog
omia
omim
pepdome
pmc
popset
probe
proteinclusters
pcassay
pccompound
pcsubstance
snp
sra
taxonomy
toolkit
unigene

chris

On Aug 13, 2009, at 1:59 PM, Robert Buels wrote:

> OK, commit 15927 adds some more info about -db options for
> Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide,
> nuccore, nucgss, nucest, and unigene, and including a link to an (XML)
> page from NCBI that lists inputs that NCBI accepts.
>
> Could somebody who knows more about eUtils than me also review this
> patch and make corrections if necessary?
>
> Rob
>
> Robert Buels wrote:
>> I think you're looking for the -db => 'nucgss' option.
>> I'll add a better listing of these (undocumented) options to the
>> Bio::DB::Query::GenBank docs.
>> Rob
>
> --
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY 14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu

From cjfields at illinois.edu Thu Aug 13 18:08:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 13 Aug 2009 17:08:37 -0500
Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO
In-Reply-To:
References:
Message-ID: <65CC2787-7F0A-43C1-A840-554A2E4FD76A@illinois.edu>

You should update to bioperl 1.6; I believe I fixed this issue after the
1.5.2 release.

chris

On Aug 13, 2009, at 2:41 PM, Jonathan Flowers wrote:

> Hi,
>
> I am trying to parse BLAST reports written in XML using Bio::SearchIO.
> When running the following code on a set of reports (multiple query
> results in a single file), I only get one ResultI object. I tried
> running the same code on a file in 'blast' format and obtained the
> expected results (ie one ResultI object for each query), suggesting
> that the issue is with blastxml. I found an old thread on this listserv
> where someone had had a similar problem, but could not find how it was
> resolved.
>
> I am using Bioperl 1.5.2 and the XML reports were generated using
> blastall with the -m7 option.
>
>     my $in = Bio::SearchIO->new(-format => 'blastxml',
>                                 -file   => 'blastreport.xml');
>     while ( my $result = $in->next_result ) {
>         print $result->query_name, "\n";
>         while ( my $hit = $result->next_hit ) {
>             while ( my $hsp = $hit->next_hsp ) {
>                 # do something with hsp
>             }
>         }
>     }
>
> Thanks
>
> Jonathan

From cjfields at illinois.edu Thu Aug 13 18:18:57 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 13 Aug 2009 17:18:57 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A848791.4010402@cornell.edu>
References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com>
 <4A848791.4010402@cornell.edu>
Message-ID:

On Aug 13, 2009, at 4:37 PM, Robert Buels wrote:

> Jonny Dalzell wrote:
>> Is it ridiculous of me to expect ubuntu to take care of this for me?
>> How do I go about compiling the HMM?
> Yes. This is a very specialized thing that you're doing, and Ubuntu
> does not have the resources to package every single thing.
>
> Unfortunately, it looks like the bioperl-ext package is not installable
> under Ubuntu 9.04 anyway, which is what I'm running. For others on this
> list, if somebody is interested in maintaining it, I'd be happy to help
> out by testing on Debian-based Linux platforms. We need to clarify this
> package's maintenance status: if there is nobody interested in
> maintaining it, I would recommend that bioperl-ext be removed from
> distribution. It's not in anybody's interest to have unmaintained
> software out there causing confusion.

I have cc'd Yee Man Chan for this. If there isn't a response or the
message bounces, we do one of two things:

1) consider it deprecated (probably safest).
2) spin it out into a separate module.

Just tried to compile it myself and am getting errors (using 64bit perl
5.10), so I think, unless someone wants to take this on, option #1 is
best.

> So Jonny, in short, I would say "do not use bioperl-ext".

In general, that's a safe bet. We're moving most of our C/C++ bindings to
BioLib.

> Step back. What are you trying to accomplish? Chris already
> recommended some alternative methods in his email of 8/11 on this
> subject. Perhaps we can guide you to some software that is actively
> maintained and will meet your needs.
>
> Rob

Exactly. Lots of other (better supported!) options out there. HMMER,
SeqAn, and others.

chris

From cjfields at illinois.edu Thu Aug 13 20:31:49 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 13 Aug 2009 19:31:49 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <650586.94518.qm@web30407.mail.mud.yahoo.com>
References: <650586.94518.qm@web30407.mail.mud.yahoo.com>
Message-ID: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu>

(just to point out to everyone, Yee Man's contact information was in the
POD)

Yee Man,

I have the output in the below link:

http://gist.github.com/167542

There are similar problems popping up on 32- and 64-bit perl 5.10.0, Mac
OS X 10.5. Haven't had time to debug it unfortunately.

I think we should seriously consider spinning this code off into its own
distribution for CPAN. It's unfortunately bit-rotting away in
bioperl-ext. If you want to continue supporting it I can help set that
up.

chris

From ymc at yahoo.com Thu Aug 13 19:58:28 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Thu, 13 Aug 2009 16:58:28 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To:
Message-ID: <650586.94518.qm@web30407.mail.mud.yahoo.com>

Hi

So is this an HMM only problem? Or does it apply to other bioperl-ext
modules?

What exactly are the compilation errors for HMM? I believe my
implementation is just a simple one based on Rabiner's paper.

http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg

I don't think I did anything fancy that makes it machine dependent or
non-ANSI C.

Yee Man

From agulyaskov at mail.rockefeller.edu Thu Aug 13 20:40:22 2009
From: agulyaskov at mail.rockefeller.edu (Attila Gulyas-Kovacs)
Date: Thu, 13 Aug 2009 20:40:22 -0400
Subject: [Bioperl-l] bus error when indexing large file
Message-ID: <4A84B276.2040706@mail.rockefeller.edu>

Dear all,

I can index the SwissProt database without problem but I get a bus error
when I try to index the much larger TrEMBL database. Indexing failed with
both the swissprot and fasta format (using Bio::Index::Swissprot or
Bio::Index::Fasta, respectively). I broke up TrEMBL into multiple files
('chunks'), about the size of the SwissProt database. Then I could create
separate indices for each chunk.
But I got a bus error when I passed all chunks simultaneously to my
script (below) to create a single index.
Perl v5.10.0; Bioperl 1.6.0; Mac OS X 10.5.8; MacPro 10 GB RAM.

What do you suggest?

Attila

    #! /usr/bin/perl
    use warnings;
    use strict;
    use Bio::Index::Swissprot;

    # first argument: index file to write; remaining arguments: data files
    my $index_file_name = shift;
    my $inx = Bio::Index::Swissprot->new(
        -filename   => $index_file_name,
        -write_flag => 1);
    $inx->make_index(@ARGV);

--
Attila Gulyas-Kovacs
Postdoctoral Associate

Rockefeller University
Gadsby Lab (Cardiac/Membrane Physiology)
D.W. Bronk Building, Room 307
1230 York Avenue
New York, NY, 10065
Tel: (212)327-8617
Fax: (212)327-7589

From ymc at yahoo.com Fri Aug 14 00:15:41 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Thu, 13 Aug 2009 21:15:41 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu>
Message-ID: <528790.13637.qm@web30404.mail.mud.yahoo.com>

Hi all

Based on my understanding of the warning messages, the problem seems to
come from the "typemap" file when I cast the return from SvIV from an
integer to a pointer. I suppose this might cause problems on 64-bit
machines.

But when I look at perlguts and perlxs, it does seem to me that the way I
did it in typemap is the suggested way to do it because the IV type is
"guaranteed to be big enough to hold a pointer". Nevertheless, I modified
my typemap file to look exactly like what's in perlxs. (See PS)

Does anyone know how to deal with this problem? Or can any of you give me
access to a 64-bit machine to sort this out?

Thank you!
Yee Man

PS This is a typemap file using exactly the same lines suggested by
perlxs. It works on my 32-bit machine. Can someone try it on a 64-bit
machine?
Thanks
================================================
TYPEMAP
HMM *    T_HMM

INPUT
T_HMM
    if (sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG))
        $var = ($type)SvIV((SV*)SvRV( $arg ));
    else{
        warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
        XSRETURN_UNDEF;
    }

OUTPUT
T_HMM
    sv_setref_pv($arg, "Bio::Ext::HMM::HMM", (void*) $var);
========================================================

From ymc at yahoo.com Fri Aug 14 04:27:11 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Fri, 14 Aug 2009 01:27:11 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
Message-ID: <168012.97676.qm@web30405.mail.mud.yahoo.com>

Ah.. I find that the typemap can become as simple as this

=====================
TYPEMAP
HMM *    T_PTROBJ
=====================

Then the generated HMM.c will use the INT2PTR macro to do the pointer
conversion. I believe this should solve the warnings.

Attached are the updated HMM.xs and typemap. Can someone with a 64-bit
machine give it a try?

Thank you
Yee Man

-------------- next part --------------
A non-text attachment was scrubbed...
Name: HMM.xs
Type: application/octet-stream
Size: 5588 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: typemap
Type: application/octet-stream
Size: 26 bytes
Desc: not available
URL:

From cjfields at illinois.edu Fri Aug 14 10:20:21 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 14 Aug 2009 09:20:21 -0500
Subject: [Bioperl-l] bus error when indexing large file
In-Reply-To: <4A84B276.2040706@mail.rockefeller.edu>
References: <4A84B276.2040706@mail.rockefeller.edu>
Message-ID:

I can attempt to reproduce this (I have very similar specs). I'm
wondering if it has something to do with large file support. Have you
tried the perl packaged with Mac OS X? I think it's perl 5.8.8.

chris

On Aug 13, 2009, at 7:40 PM, Attila Gulyas-Kovacs wrote:

> Dear all,
>
> I can index the SwissProt database without problem but I get a bus
> error when I try to index the much larger TrEMBL database. Indexing
> failed with both the swissprot and fasta format (using
> Bio::Index::Swissprot or Bio::Index::Fasta, respectively).
> I broke up TrEMBL into multiple files ('chunks'), about the size of the SwissProt database. Then I could create separate indices for each chunk. But I got a bus error when I passed all chunks simultaneously to my script (below) to create a single index.
> Perl v5.10.0; Bioperl 1.6.0; Mac OS X 10.5.8; MacPro 10 GB RAM.
>
> What do you suggest?
>
> Attila
>
>
> #! /usr/bin/perl
> use warnings;
> use strict;
> use Bio::Index::Swissprot;
> my $index_file_name = shift;
> my $inx = Bio::Index::Swissprot->new(
>     -filename => $index_file_name,
>     -write_flag => 1);
> $inx->make_index(@ARGV);
>
> --
> Attila Gulyas-Kovacs
> Postdoctoral Associate
>
> Rockefeller University
> Gadsby Lab (Cardiac/Membrane Physiology)
> D.W. Bronk Building, Room 307 1230 York Avenue
> New York, NY, 10065
> Tel: (212)327-8617
> Fax: (212)327-7589
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From David.Messina at sbc.su.se Fri Aug 14 10:10:33 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 14 Aug 2009 16:10:33 +0200
Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence
Message-ID: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com>

Hi everyone,
I'm using Bio::AlignIO to read in a series of multiple alignments. Occasionally, an alignment will have a sequence which consists entirely of gaps (these are actually trimmed sub-alignments; that's why).

Each time I read in such an alignment, an error will be raised when the Bio::LocatableSeq object is created for the all-gap sequence (actually, the error comes from the superclass Bio::PrimarySeq).

To my way of thinking, an alignment is not invalid if it contains such all-gap sequences, so there shouldn't be an error. This could be done by having Bio::AlignIO::* pass the -nowarnonempty flag when creating the sequence objects.

Any thoughts on this?
Is there a better way to suppress the warning than changing the behavior of all the AlignIO modules? Dave From cjfields at illinois.edu Fri Aug 14 10:42:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 09:42:51 -0500 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Dave, Is this using bioperl-live? I recall this being a problem but I thought it was addressed in svn (and soon in the next point release). chris On Aug 14, 2009, at 9:10 AM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists > entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when > the > Bio::LocatableSeq object is created for the all-gap sequence > (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be > done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating > the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning > than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Fri Aug 14 10:44:42 2009 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 14 Aug 2009 16:44:42 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <716af09c0908140744i4447dffg205ec07daeaaa571@mail.gmail.com> Hi Dave, I have observed the same (with bioperl 1.52) for the same reason. It would be nice not to have these errors as also in my view an all-gaps sequence is a sequence. I also found that sometimes parsing such alignments fails when the all-gaps sequence is the last in the alignment (bug 2744, in Bio::LocatableSeq). Regards, Bernd On Fri, Aug 14, 2009 at 4:10 PM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when the > Bio::LocatableSeq object is created for the all-gap sequence (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Aug 14 11:12:35 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 14 Aug 2009 17:12:35 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Message-ID: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> > > Is this using bioperl-live? Sorry, should've said before. Yes, it's bioperl-live (r15927). I recall this being a problem but I thought it was addressed in svn (and > soon in the next point release). Hmm, the only recent somewhat related change I see (in Bio::AlignIO::*, anyway) is: ------------------------------------------------------------------------ r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 lines deprecate no_sequences/no_residues in main trunk (we can switch the version to 1.7 if deemed necessary) ------------------------------------------------------------------------ Perhaps this is what you were thinking of? Dave From cjfields at illinois.edu Fri Aug 14 11:31:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 10:31:49 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <168012.97676.qm@web30405.mail.mud.yahoo.com> References: <168012.97676.qm@web30405.mail.mud.yahoo.com> Message-ID: Yee Man, I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 64-bit) and on dev.open-bio.org (which is perl 5.8.8, appears to be 32-bit). The patch results in cleaning up warnings for 5.10.0 but results in similar warnings for 5.8.8 (linux or OS X). 
On OS X perl 5.8.8, this sometimes passes (note the first attempt fails, the second succeeds), so it's not entirely a 32-bit issue: http://gist.github.com/167860 OS X and perl 5.10.0, this always fails as the previous gist shows, but demonstrates similar behavior (multiple attempts to test get different responses): http://gist.github.com/167542 On linux, everything passes with or w/o the patched files (patched files have warnings as indicated above): Specs for all three perl executables (they vary a bit): http://gist.github.com/167883 chris On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: > Ah.. I find that the typemap can become as simple as this > ===================== > TYPEMAP > HMM * T_PTROBJ > ===================== > > Then the generated HMM.c will have a function called INT2PTR to do > the pointer conversion. I believe this should solve the warnings. > > Attached are the updated HMM.xs and typemap. Can someone with a 64- > bit machine give it a try? > > Thank you > Yee Man > --- On Thu, 8/13/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" > >, "BioPerl List" >> Date: Thursday, August 13, 2009, 5:31 PM >> (just to point out to everyone, Yee >> Man's contact information was in the POD) >> >> Yee Man, >> >> I have the output in the below link: >> >> http://gist.github.com/167542 >> >> There are similar problems popping up on 32- and 64-bit >> perl 5.10.0, Mac OS X 10.5. Haven't had time to debug >> it unfortunately. >> >> I think we should seriously consider spinning this code off >> into it's own distribution for CPAN. It's >> unfortunately bit-rotting away in bioperl-ext. If you >> want to continue supporting it I can help set that up. >> >> chris >> >> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >> >>> Hi >>> >>> So is this an HMM only problem? Or does >> it apply to other bioperl-ext modules? 
>>> >>> What exactly are the compilation errors >> for HMM? I believe my implementation is just a simple one >> based on Rabiner's paper. >>> >>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>> ~murphyk%2FBayes >>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>> >>> I don't think I did anything fancy that >> makes it machine dependent or non-ANSI C. >>> >>> Yee Man >>> >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Robert Buels" >>>> Cc: "Jonny Dalzell" , >> "BioPerl List" , >> "Yee Man Chan" >>>> Date: Thursday, August 13, 2009, 3:18 PM >>>> >>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: >>>> >>>>> Jonny Dalzell wrote: >>>>>> Is it ridiculous of me to expect ubuntu to >> take >>>> care of this for me? How do >>>>>> I go about compiling the HMM? >>>>> Yes. This is a very specialized thing >> that >>>> you're doing, and Ubuntu does not have the >> resources to >>>> package every single thing. >>>>> >>>>> Unfortunately, it looks like bioperl-ext >> package is >>>> not installable under Ubuntu 9.04 anyway, which is >> what I'm >>>> running. For others on this list, if >> somebody is >>>> interested in doing maintaining it, I'd be happy >> to help out >>>> by testing on Debian-based Linux platforms. >> We need to >>>> clarify this package's maintenance status: if >> there is >>>> nobody interested in maintaining it, I would >> recommend that >>>> bioperl-ext be removed from distribution. >> It's not in >>>> anybody's interest to have unmaintained software >> out there >>>> causing confusion. >>>> >>>> I have cc'd Yee Man Chan for this. If there >> isn't a >>>> response or the message bounces, we do one of two >> things: >>>> >>>> 1) consider it deprecated (probably safest). >>>> 2) spin it out into a separate module. 
>>>> >>>> Just tried to comile it myself and am getting >> errors (using >>>> 64bit perl 5.10), so I think, unless someone wants >> to take >>>> this on, option #1 is best. >>>> >>>>> So Jonny, in short, I would say "do not use >>>> bioperl-ext". >>>> >>>> In general, that's a safe bet. We're moving >> most of >>>> our C/C++ bindings to BioLib. >>>> >>>>> Step back. What are you trying to >>>> accomplish? Chris already recommended some >> alternative >>>> methods in his email of 8/11 on this >> subject. Perhaps >>>> we can guide you to some software that is >> actively >>>> maintained and will meet your needs. >>>>> >>>>> Rob >>>> >>>> Exactly. Lots of other (better supported!) >> options >>>> out there. HMMER, SeqAn, and others. >>>> >>>> chris >>>> >>> >>> >>> >> >> > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Aug 14 11:53:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 10:53:51 -0500 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> Message-ID: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> On Aug 14, 2009, at 10:12 AM, Dave Messina wrote: > Is this using bioperl-live? > > Sorry, should've said before. Yes, it's bioperl-live (r15927). > > > I recall this being a problem but I thought it was addressed in svn > (and soon in the next point release). 
>
> Hmm, the only recent somewhat related change I see (in Bio::AlignIO::*, anyway) is:
>
> ------------------------------------------------------------------------
> r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 lines
>
> deprecate no_sequences/no_residues in main trunk (we can switch the version to 1.7 if deemed necessary)
> ------------------------------------------------------------------------
>
>
> Perhaps this is what you were thinking of?
>
> Dave

Maybe not, then (for some reason I thought this was fixed within LocatableSeq). I know that it is possible to have an all-gap LocatableSeq; this works, but the default start/end/length aren't correct, which is part of Bernd's bug:

use Modern::Perl;
use Bio::LocatableSeq;

my $seq = Bio::LocatableSeq->new(
    -seq      => '-------------',
    -alphabet => 'dna',
);
say $seq->start;   # 1
say $seq->end;     # undef (?)
say $seq->length;  # 13, counts the gaps

The problem is, to fix all this relies on a whole slew of refactors for LocatableSeq and SimpleAlign. Some of this touches root components as well, so it'll need to be tried on a branch and will very likely result in some API changes (and thus may not be included in 1.6). I'll start a branch to get the process started.

chris

From jncline at gmail.com Fri Aug 14 15:41:21 2009
From: jncline at gmail.com (Jonathan Cline)
Date: Fri, 14 Aug 2009 14:41:21 -0500
Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl
In-Reply-To: <99E27D08408340B9B0611751A17DF266@NewLife>
References: <99E27D08408340B9B0611751A17DF266@NewLife>
Message-ID: <4A85BDE1.5020002@gmail.com>

Mark A. Jensen wrote:
> Sorry, I cut off the last script. The entire thing follows:
>

This is exactly what I was looking for - thanks. A method to modify Makefile.PL, install in Activestate, etc is great. Perhaps your method could also be improved for portability by using `cygpath` although few cygwin installs modify this beyond the default (to get rid of hardcoded "/cygdrive/x/").
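
A rough sketch of the `cygpath` idea, in Perl. The helper name `to_cygwin_path` is made up for illustration and is not part of the conversion script. It assumes `cygpath -u` is on the PATH when run under cygwin-perl; the regex fallback assumes the default /cygdrive prefix, so it is only correct for default installs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert a Windows path to its Cygwin form.
# Under Cygwin it asks `cygpath -u`, which honours a non-default
# cygdrive prefix. Elsewhere it falls back to the default /cygdrive
# layout (an assumption, only right for default installs).
sub to_cygwin_path {
    my ($win_path) = @_;

    if ($^O eq 'cygwin') {
        # List-form pipe open: no shell is involved, so the
        # backslashes in the path reach cygpath untouched.
        if (open my $fh, '-|', 'cygpath', '-u', $win_path) {
            my $out = <$fh>;
            close $fh;
            if (defined $out) {
                chomp $out;
                return $out if length $out;
            }
        }
    }

    # Fallback: flip the slashes and map "C:" to "/cygdrive/c".
    (my $path = $win_path) =~ tr{\\}{/};
    $path =~ s{^([A-Za-z]):}{'/cygdrive/' . lc($1)}e;
    return $path;
}

print to_cygwin_path('C:\\Perl\\bin\\perl'), "\n";
```

Off Cygwin, the fallback turns C:\Perl\bin\perl into /cygdrive/c/Perl/bin/perl; under Cygwin the result is whatever `cygpath` reports for the local mount table.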
I will definitely save your code for later.

I've implemented another workaround, which is to use Win32::Pipe and other Win32:: methods. This has problems of its own (support is not 100%) and an error-free implementation is not as easy as requiring Activestate Perl; however, it should work with both Activestate and cygwin-perl (and Unix).

## Jonathan Cline
## jcline at ieee.org
## Mobile: +1-805-617-0223
########################

> ----- Original Message ----- From: "Jonathan Cline"
> To:
> Cc:
> Sent: Friday, July 31, 2009 11:24 PM
> Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl
>
>
>> I recently mentioned working on Bio::Robotics for Tecan. Vendors being MS-Win specific, the vendor software allows third-party software communication through a named pipe (the literal filename is "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific and this pseudo-pipe is opened with sysopen() ). This is broken under cygwin-perl due to cygwin's method of handling paths -- the sysopen fails. However it works under ActiveState Perl and communication through the named pipe (to the robot hardware) is OK. The standard workaround is usually to use cygwin bash, and force the PATH to use ActiveState perl. (Typical MS Windows incompatibility problem.) The issue is: Perl module libraries for CPAN work under cygwin-perl (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN module use, or "make test", result in a bad list of incompatibility problems. Yet ActiveState Perl is required for communicating to the vendor application (unless there is some workaround to raw filesystem access in cygwin-perl that I haven't found in 2 days of working this).
>> The stand-alone scripts I have work fine to access the named pipe (using ActiveState Perl) since the standalone scripts have no module INC dependencies, no CPAN module test harness, etc etc.
>> >> This isn't specifically a Bio:: issue, though if anyone has >> suggestions please email. I could try msys and see if it handles the >> named-pipe-special-file better, if msys has an msys-perl distribution. >> >> -- >> ## Jonathan Cline >> ## jcline at ieee.org >> ## Mobile: +1-805-617-0223 >> ######################## >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From cjfields at illinois.edu Fri Aug 14 19:29:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 18:29:43 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring Message-ID: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> As we have pretty much everything in place for another point release (which I will start merging over this weekend into the 1.6 branch), I have gone ahead and made two branches for refactoring some of the more important pieces of bioperl code. Both refactors may require API changes; if so these will be part of a 1.7 release. 1) GFF - entail refactoring bioperl code to better handle GFF2/3. This is a large section of code, so small incremental changes may be merged to trunk over time (and thus may involve several branches). Included is refactoring of feature typing to be more consistent and lightweight, and will initially involve Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be deprecated in the process). See the following for additional details: http://www.bioperl.org/wiki/GFF_Refactor 2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI (SimpleAlign) and LocatableSeq. This is primarily to address significant bugs but will also entail cleaning up SimpleAlign methods (factoring out more utility-like methods into Bio::Align::AlignUtils or similar). This also may involve several branches. 
See the following for additional details: http://www.bioperl.org/wiki/Align_Refactor Any help/suggestions for the above two would be greatly appreciated! Robert Buels may be heading up the initial FeatureIO work; I will likely start on LocatableSeq/Align (Mark, wanna help?). chris From maj at fortinbras.us Fri Aug 14 19:45:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 19:45:01 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Hey Chris et al, I'm there on LocatableSeq, definitely. I do have one project to finish this weekend before I move to that: I'm planning to move Chase Miller's excellent NeXML read/write implementation into the trunk, complete with tests. If we can get it to pass the test suite, is there room in the point release for it? MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Friday, August 14, 2009 7:29 PM Subject: [Bioperl-l] GFF and LocatableSeq refactoring > As we have pretty much everything in place for another point release > (which I will start merging over this weekend into the 1.6 branch), I > have gone ahead and made two branches for refactoring some of the more > important pieces of bioperl code. Both refactors may require API > changes; if so these will be part of a 1.7 release. > > 1) GFF - entail refactoring bioperl code to better handle GFF2/3. > > This is a large section of code, so small incremental changes may be > merged to trunk over time (and thus may involve several branches). > Included is refactoring of feature typing to be more consistent and > lightweight, and will initially involve Bio::FeatureIO and > Bio::SeqFeature::Annotated (which may be deprecated in the process). 
> See the following for additional details: > > http://www.bioperl.org/wiki/GFF_Refactor > > 2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI > (SimpleAlign) and LocatableSeq. This is primarily to address > significant bugs but will also entail cleaning up SimpleAlign methods > (factoring out more utility-like methods into Bio::Align::AlignUtils > or similar). This also may involve several branches. See the > following for additional details: > > http://www.bioperl.org/wiki/Align_Refactor > > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will > likely start on LocatableSeq/Align (Mark, wanna help?). > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Fri Aug 14 19:50:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 Aug 2009 16:50:18 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: <4A85F83A.30800@cornell.edu> Chris Fields wrote: > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will likely > start on LocatableSeq/Align (Mark, wanna help?). Sure, I'll head up the gff_refactor branch work. If you're interested in what changes are being planned for Bio::SeqFeature::*, Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the implementation plan Chris and I developed just now on IRC, which is at http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan Now soliciting suggestions, comments, and assistance. 
Rob From cjfields at illinois.edu Fri Aug 14 21:03:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 20:03:41 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Mark, re: NeXML, yes, of course. There'll be an alpha release or two prior to core 1.6.1 (I need to test the Build.PL/Bio::Root::Build changes Sendu added in). chris On Aug 14, 2009, at 6:45 PM, Mark A. Jensen wrote: > Hey Chris et al, I'm there on LocatableSeq, definitely. I do have > one project to finish this weekend before I move to that: I'm > planning to move Chase Miller's > excellent NeXML read/write implementation into the trunk, complete > with tests. If we can get it to pass the test suite, is there room > in the point release for it? > MAJ > ----- Original Message ----- From: "Chris Fields" > > To: "BioPerl List" > Sent: Friday, August 14, 2009 7:29 PM > Subject: [Bioperl-l] GFF and LocatableSeq refactoring > > >> As we have pretty much everything in place for another point >> release (which I will start merging over this weekend into the 1.6 >> branch), I have gone ahead and made two branches for refactoring >> some of the more important pieces of bioperl code. Both refactors >> may require API changes; if so these will be part of a 1.7 release. >> 1) GFF - entail refactoring bioperl code to better handle GFF2/3. >> This is a large section of code, so small incremental changes may >> be merged to trunk over time (and thus may involve several >> branches). Included is refactoring of feature typing to be more >> consistent and lightweight, and will initially involve >> Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be >> deprecated in the process). See the following for additional >> details: >> http://www.bioperl.org/wiki/GFF_Refactor >> 2) Align/LocatableSeq - dealing with inconsistencies in >> Bio::AlignI (SimpleAlign) and LocatableSeq. 
This is primarily to >> address significant bugs but will also entail cleaning up >> SimpleAlign methods (factoring out more utility-like methods into >> Bio::Align::AlignUtils or similar). This also may involve several >> branches. See the following for additional details: >> http://www.bioperl.org/wiki/Align_Refactor >> Any help/suggestions for the above two would be greatly >> appreciated! Robert Buels may be heading up the initial FeatureIO >> work; I will likely start on LocatableSeq/Align (Mark, wanna help?). >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From maj at fortinbras.us Fri Aug 14 22:32:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 22:32:01 -0400 Subject: [Bioperl-l] on BP documentation Message-ID: <1F899AA92F94415186CB0B25306F1114@NewLife> Hi All -- Off-list, an old colleague of mine had this insightful, if damning, comment: >I guess that from my perspective, after doing this stuff for >about 10 years, I personally would prefer to see a "summer of >documentation" for the bio* languages (or at least bioperl, as that is >the only one I ever look at). From my own experiences, and from those >of many colleagues, the documentation for bioperl has gone from >mediocre to quite poor in the last few years. I largely think the >wikification of the docs are to blame for this. Even SeqIO is hard >to figure out now--it took me an hour the other day to figure out that >"desc" returns the full Fasta header, and I had to get that from the >module code + trial-and-error, instead of the online docs. There is >far too much inside baseball going on in the documentation scheme. >So I worry more about the constant adding of features at the expense >of documenting what is already there. This is just my 2 cents, and it >is disappointing to see a downward trend for bioperl in this regard. 
I would be really interested in all responses from the list users. I must agree that BP docs are rather a rat's nest and of varying quality, but taken in toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount of useful and sophisticated information available. I think there are approaches we can take to reorganize and standardize the accession of it to make it more useful and inviting. I disagree with my pal about the wikification, but I wager that the power of the wiki could be leveraged to greater advantage (right, Dan?). I think that what we all as developers love is to code, and detest is to document. Since BP is all-volunteer, and volunteers tend to do what they like -- the beauty of open source, btw -- documentation reorg and cleanup probably must devolve to the Core. I am willing to lead such an effort, which will take some time, and more time the fewer volunteers there are. First let's hear some thoughts, and 'let it all hang out', as they said in my mom's era. cheers Mark From cjfields at illinois.edu Fri Aug 14 23:41:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 22:41:10 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> On Aug 14, 2009, at 9:32 PM, Mark A. Jensen wrote: > Hi All -- > > Off-list, an old colleague of mine had this insightful, if damning, > comment: > >> I guess that from my perspective, after doing this stuff for >> about 10 years, I personally would prefer to see a "summer of >> documentation" for the bio* languages (or at least bioperl, as that >> is >> the only one I ever look at). From my own experiences, and from >> those >> of many colleagues, the documentation for bioperl has gone from >> mediocre to quite poor in the last few years. I largely think the >> wikification of the docs are to blame for this. 
Even SeqIO is hard >> to figure out now--it took me an hour the other day to figure out >> that >> "desc" returns the full Fasta header, and I had to get that from the >> module code + trial-and-error, instead of the online docs. There is >> far too much inside baseball going on in the documentation scheme. > >> So I worry more about the constant adding of features at the expense >> of documenting what is already there. This is just my 2 cents, and >> it >> is disappointing to see a downward trend for bioperl in this regard. > > I would be really interested in all responses from the list users. I > must agree > that BP docs are rather a rat's nest and of varying quality, but > taken in > toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount > of useful and sophisticated information available. I think there are > approaches we can take to reorganize and standardize the accession > of it to make it more useful and inviting. I disagree with my pal > about the > wikification, but I wager that the power of the wiki could be > leveraged > to greater advantage (right, Dan?). To me good documentation should be a combination of both wiki docs (HOWTOs, scraps, cookbook-y code) and inline POD. We can't forsake one for the other. If I had a preference, I would take more up-to- date POD over wiki (maybe adding a Status: for the methods), but a good HOWTO goes a long way in helping. It's just too hard to cover every use case. It's unfortunate that documentation is very poor for many modules, but at the same time it's also exceptionally hard to write documentation for modules one has had no part in developing. I think this is the main reason the docs are in the state they are in (not to point the finger of blame at anyone, I'm just as much to blame). > I think that what we all as developers love is to code, and detest > is to > document. 
> Since BP is all-volunteer, and volunteers tend to do what they like -- the beauty of open source, btw -- documentation reorg and cleanup probably must devolve to the Core. I am willing to lead such an effort, which will take some time, and more time the fewer volunteers there are. First let's hear some thoughts, and 'let it all hang out', as they said in my mom's era.
>
> cheers
> Mark

Two things:

1) Take advantage of the proposed restructuring effort (as well as some of the refactoring we are doing) to add decent documentation where possible. This means updating method docs and updating the HOWTO's as needed, or adding new HOWTO's (Jason has indicated this in the past).

2) Pinpoint areas where docs are desperately needed first.

Other wiki docs could also use updating. As an example, the above author's question on FASTA and desc() is actually answered in the FAQ, but the question doesn't make it easy to find:

http://www.bioperl.org/wiki/FAQ#I_would_like_to_make_my_own_custom_fasta_header_-_how_do_I_do_this.3F

chris

From David.Messina at sbc.su.se Sat Aug 15 03:49:59 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 15 Aug 2009 09:49:59 +0200
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
Message-ID: <628aabb70908150049h64f83b8ewb30d916f0534e40d@mail.gmail.com>

> To me good documentation should be a combination of both wiki docs (HOWTOs, scraps, cookbook-y code) and inline POD. We can't forsake one for the other.

I think this notion is already kinda there de facto (inside baseball? :)), but perhaps we should make clear the idea that:

- POD is the reference manual, with each method's capabilities described comprehensively and in detail.
- The wiki is tutorials (bptutorial, Jason's slides), use cases (HOWTOs and Scrapbook), and FAQ. And actually all the POD is accessible online from the wiki at doc.bioperl.org, too (although maybe a little hard to find -- it's under Developer--API Docs). > If I had a preference, I would take more up-to-date POD over wiki (maybe > adding a Status: for the methods), but a good HOWTO goes a long way in > helping. It's just too hard to cover every use case. > I'd agree with this, too, partly because I think the HOWTOs are in pretty good shape, covering the most common stuff pretty well, and partly because I think the reference manual has to be complete, both for a user coming to find out how to use it and for authors ensuring that their internal model of how the code works actually hangs together. Mark, one attack point for a documentation improvement effort would be to take a survey of the PODs and see how well they are fulfilling the role of a reference manual. But part of a good reference manual is knowing how to find what you're looking for, and indeed I think that's maybe the main overall problem with trying to document anything as big and complicated as BioPerl. So for me, the organization of our copious docs might benefit from some attention. The goal of providing a way to find information is better handled by the wiki, which does searching and cross-referencing much better than POD. To take your friend's FASTA header example, I might expect to be able to search for 'FASTA' or 'FASTA header' on the wiki and find something which guides me to the answer. A search for 'FASTA' gives a list of pointers, including the 'FASTA sequence format' page. That page almost gives the right answer (see the Note section), but perhaps it might be a nice place to say that in BioPerl, a FASTA sequence is a Bio::Seq, and that the header is $seq->desc and the seq is $seq->seq.
And there could be an equivalent page for the other common formats, breaking down how the format maps to an object. [...] it's also exceptionally hard to write documentation for modules one > has had no part in developing. I think this is the main reason the docs are > in the state they are in (not to point the finger of blame at anyone, I'm > just as much to blame). Absolutely, and maybe a first step would be to contact the authors of a module with out-of-date docs and ask for them to fix it, in the same way one would go to the author with a bug in their code. Core+volunteers will certainly be needed for organizing the effort and assessing the state of BioPerl documentation as a whole, but give authors the opportunity to take care of their code, too. Two things: > > 1) Take advantage of the proposed restructuring effort (as well as some of > the refactoring are doing) to add decent documentation where possible. This > means updating method docs and updating the HOWTO's as needed, or adding new > HOWTO's (Jason has indicated this in the past). > This is a great idea. > 2) Pinpoint areas where docs are desperately needed first. > > Other wiki docs could also use updating. As an example, the above author's > question on FASTA and desc() is actually answered in the FAQ, Absolutely. Maybe some of the FAQs could actually be added back to the relevant PODs? 
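As a rough illustration of folding a FAQ-style answer into method POD, an entry might look like the sketch below. The Title/Usage/Function/Returns/Args layout follows the usual BioPerl POD style; the package name My::Seq is hypothetical, and the Status field is only the idea floated earlier in this thread, not an existing convention.

```perl
package My::Seq;   # hypothetical class, for illustration only
use strict;
use warnings;

# Minimal constructor so the accessor below can be exercised.
sub new {
    my ($class, %args) = @_;
    return bless { desc => $args{-desc} }, $class;
}

=head2 desc

 Title   : desc
 Usage   : $seq->desc($text); my $text = $seq->desc;
 Function: Get/set the free-text description of the sequence.
           For FASTA input this is the header text after the ID.
 Returns : string
 Args    : optional new description (string)
 Status  : stable   (proposed field, an assumption of this sketch)

=cut

sub desc {
    my ($self, $value) = @_;
    $self->{desc} = $value if defined $value;   # act as setter when given a value
    return $self->{desc};
}

package main;

my $seq = My::Seq->new(-desc => 'example description');
print $seq->desc, "\n";
```

The point is only that the one sentence a FAQ reader is looking for ("this is the header text after the ID") can live in the Function field, where perldoc and doc.bioperl.org will both show it.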
Dave From David.Messina at sbc.su.se Sat Aug 15 04:00:50 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 Aug 2009 10:00:50 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> Message-ID: <628aabb70908150100ka8c21aahe2bf7d636fa94112@mail.gmail.com> > > I know that it is possible to have an all-gap LocatableSeq You can, but to avoid the "can't guess alphabet" error I'm getting you have to set the alphabet manually (which AlignIO does not). I'll start a branch to get the process started. Terrific! In the meantime, then, I'll just use the -nowarnonempty workaround in my local copy of AlignIO. Dave From bernd.web at gmail.com Sat Aug 15 07:17:44 2009 From: bernd.web at gmail.com (Bernd Web) Date: Sat, 15 Aug 2009 13:17:44 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> Hi >> Even SeqIO is hard >>to figure out now--it took me an hour the other day to figure out that >>"desc" returns the full Fasta header, and I had to get that from the >>module code + trial-and-error, instead of the online docs. I was a bit surprised about $seq->desc retrieving the entire FASTA header line. Actually, in Bioperl 1.52 at least, $seq->desc returns the description only, so without the ID. Thus, to get the entire FASTA header line, $seq->id . " " . $seq->desc would be needed. For the modules I use (mainly related to sequences, such as SeqIO, SimpleAlign), I'd be happy to contribute on docs, checking docs, or examples.
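To make the id/desc split concrete, here is a minimal sketch, assuming BioPerl (Bio::SeqIO) is installed. The sample record and variable names are made up for illustration, and the comments describe the behaviour I see in 1.52 rather than anything the docs promise.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical one-record FASTA input, held in memory for the example.
my $fasta = ">seq1 example description text\nACGTACGT\n";
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";

my $in  = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');
my $seq = $in->next_seq;

my $id     = $seq->display_id;     # first word after '>'
my $desc   = $seq->desc;           # rest of the header line, without the ID
my $header = $id . ' ' . $desc;    # rebuild the full header (minus the '>')

print "id:     $id\n";
print "desc:   $desc\n";
print "header: $header\n";
```

display_id is used here for the ID; $seq->id should give the same value on a plain sequence object.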
Regards, Bernd From sanjaysingh765 at gmail.com Sat Aug 15 09:38:18 2009 From: sanjaysingh765 at gmail.com (sanjay singh) Date: Sat, 15 Aug 2009 19:08:18 +0530 Subject: [Bioperl-l] BLINK PARSER Message-ID: Hi, I want to submit query to NCBI'S BLINK and parsed the result for the best hit. is there anyone have script to do so.i would be very grateful if someone would like to share it with me. regards sanjay -- Happy moments , praise God. Difficult moments, seek God. Quiet moments, worship God. Painful moments, trust God. Every moment, thank God Sanjay Kumar Singh Bose Institute 93\1,A.P.C.Road Kolkata-700 009 West Bengal India From jimhu at tamu.edu Sat Aug 15 11:01:15 2009 From: jimhu at tamu.edu (Jim Hu) Date: Sat, 15 Aug 2009 10:01:15 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? Message-ID: Over on the Gbrowse list, Don Gilbert explained to me why genbank2gff3.pl is having problems with prokaryotic genomes. Has anyone written an alternative? Jim Hu ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From cjfields at illinois.edu Sat Aug 15 11:27:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 10:27:01 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? In-Reply-To: References: Message-ID: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> We (bioperl devs and users) would be very interested to have something like this included. I ran into a similar problem with genbank2gff3 a year ago with some of our work here on Archaea. I managed to get enough data out to get gbrowse up-and-running, but it required quite a bit of hand-editing. In fact, seeing as we're refactoring GFF and other aspects of Features in bioperl, this may be the best time to add something in. 
chris On Aug 15, 2009, at 10:01 AM, Jim Hu wrote: > Over on the Gbrowse list, Don Gilbert explained to me why > genbank2gff3.pl is having problems with prokaryotic genomes. Has > anyone written an alternative? > > Jim Hu > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 15 11:55:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 10:55:44 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> Message-ID: On Aug 15, 2009, at 6:17 AM, Bernd Web wrote: > Hi > >>> Even SeqIO is hard >>> to figure out now--it took me an hour the other day to figure out >>> that >>> "desc" returns the full Fasta header, and I had to get that from the >>> module code + trial-and-error, instead of the online docs. > I was a bit surprised about $seq->desc retrieving the entire FASTA > header line > Actually, in Bioperl 1.52 at least $seq->desc returns the description > only, so without the ID. Thus, to get the entire FASTA header line > $seq->id . " " $seq->desc would be needed. Odd, not seeing where a change was made that would cause this behavior. Can you post an example? > For the modules I use (mainly related to sequences, such as SeqIO, > SimpleAlign), I'd be happy to contribute on docs, checking docs, or > examples. > > Regards, > Bernd Would be nice to have an Align/SimpleAlign HOWTO, but seeing as we want to refactor large chunks of that code, it might be slightly premature. 
That is, unless we want to document what behavior we expect to see as a sort of ROADMAP (maybe as part of the refactoring page). That could then be converted over to a HOWTO. Feel free to chip in on this in any way possible. The more documentation the better. chris From rmb32 at cornell.edu Sat Aug 15 12:44:03 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 09:44:03 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <85143.35343.qm@web30404.mail.mud.yahoo.com> References: <85143.35343.qm@web30404.mail.mud.yahoo.com> Message-ID: <4A86E5D3.3030906@cornell.edu> The usual procedure for developing code is to exchange code via commits to a version control system. Yee, do you know how to use Subversion? Does Yee need a commit bit? Rob Yee Man Chan wrote: > Hi Chris > > I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..) > > Please let me know if it works for you. > > Sorry for the bug... > Yee Man > > --- On Fri, 8/14/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" >> Date: Friday, August 14, 2009, 8:31 AM >> Yee Man, >> >> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >> appears to be 32-bit). The patch results in cleaning >> up warnings for 5.10.0 but results in similar warnings for >> 5.8.8 (linux or OS X). 
>> >> On OS X perl 5.8.8, this sometimes passes (note the first >> attempt fails, the second succeeds), so it's not entirely a >> 32-bit issue: >> >> http://gist.github.com/167860 >> >> OS X and perl 5.10.0, this always fails as the previous >> gist shows, but demonstrates similar behavior (multiple >> attempts to test get different responses): >> >> http://gist.github.com/167542 >> >> On linux, everything passes with or w/o the patched files >> (patched files have warnings as indicated above): >> >> Specs for all three perl executables (they vary a bit): >> >> http://gist.github.com/167883 >> >> chris >> >> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >> >>> Ah.. I find that the typemap can become as simple as >> this >>> ===================== >>> TYPEMAP >>> HMM * T_PTROBJ >>> ===================== >>> >>> Then the generated HMM.c will have a function called >> INT2PTR to do the pointer conversion. I believe this should >> solve the warnings. >>> Attached are the updated HMM.xs and typemap. Can >> someone with a 64-bit machine give it a try? >>> Thank you >>> Yee Man >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Yee Man Chan" >>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>> Date: Thursday, August 13, 2009, 5:31 PM >>>> (just to point out to everyone, Yee >>>> Man's contact information was in the POD) >>>> >>>> Yee Man, >>>> >>>> I have the output in the below link: >>>> >>>> http://gist.github.com/167542 >>>> >>>> There are similar problems popping up on 32- and >> 64-bit >>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >> to debug >>>> it unfortunately. >>>> >>>> I think we should seriously consider spinning this >> code off >>>> into it's own distribution for CPAN. It's >>>> unfortunately bit-rotting away in >> bioperl-ext. If you >>>> want to continue supporting it I can help set that >> up. 
>>>> chris >>>> >>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>> >>>>> Hi >>>>> >>>>> So is this an HMM only >> problem? Or does >>>> it apply to other bioperl-ext modules? >>>>> What exactly are the >> compilation errors >>>> for HMM? I believe my implementation is just a >> simple one >>>> based on Rabiner's paper. >>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>> >>>>> I don't think I did >> anything fancy that >>>> makes it machine dependent or non-ANSI C. >>>>> Yee Man >>>>> >>>>> --- On Thu, 8/13/09, Chris Fields >>>> wrote: >>>>>> From: Chris Fields >>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>> package on WinVista? >>>>>> To: "Robert Buels" >>>>>> Cc: "Jonny Dalzell" , >>>> "BioPerl List" , >>>> "Yee Man Chan" >>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>> >>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >> wrote: >>>>>>> Jonny Dalzell wrote: >>>>>>>> Is it ridiculous of me to expect >> ubuntu to >>>> take >>>>>> care of this for me? How do >>>>>>>> I go about compiling the HMM? >>>>>>> Yes. This is a very specialized >> thing >>>> that >>>>>> you're doing, and Ubuntu does not have >> the >>>> resources to >>>>>> package every single thing. >>>>>>> Unfortunately, it looks like >> bioperl-ext >>>> package is >>>>>> not installable under Ubuntu 9.04 anyway, >> which is >>>> what I'm >>>>>> running. For others on this list, >> if >>>> somebody is >>>>>> interested in doing maintaining it, I'd be >> happy >>>> to help out >>>>>> by testing on Debian-based Linux >> platforms. >>>> We need to >>>>>> clarify this package's maintenance status: >> if >>>> there is >>>>>> nobody interested in maintaining it, I >> would >>>> recommend that >>>>>> bioperl-ext be removed from distribution. 
>>>> It's not in >>>>>> anybody's interest to have unmaintained >> software >>>> out there >>>>>> causing confusion. >>>>>> >>>>>> I have cc'd Yee Man Chan for this. >> If there >>>> isn't a >>>>>> response or the message bounces, we do one >> of two >>>> things: >>>>>> 1) consider it deprecated (probably >> safest). >>>>>> 2) spin it out into a separate module. >>>>>> >>>>>> Just tried to comile it myself and am >> getting >>>> errors (using >>>>>> 64bit perl 5.10), so I think, unless >> someone wants >>>> to take >>>>>> this on, option #1 is best. >>>>>> >>>>>>> So Jonny, in short, I would say "do >> not use >>>>>> bioperl-ext". >>>>>> >>>>>> In general, that's a safe bet. We're >> moving >>>> most of >>>>>> our C/C++ bindings to BioLib. >>>>>> >>>>>>> Step back. What are you trying >> to >>>>>> accomplish? Chris already >> recommended some >>>> alternative >>>>>> methods in his email of 8/11 on this >>>> subject. Perhaps >>>>>> we can guide you to some software that is >>>> actively >>>>>> maintained and will meet your needs. >>>>>>> Rob >>>>>> Exactly. Lots of other (better >> supported!) >>>> options >>>>>> out there. HMMER, SeqAn, and >> others. >>>>>> chris >>>>>> >>>>> >>>>> >>>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From maj at fortinbras.us Sat Aug 15 13:40:26 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 15 Aug 2009 13:40:26 -0400 Subject: [Bioperl-l] BLINK PARSER In-Reply-To: References: Message-ID: <34DBCBEA5E2D49A892E5077AA780BA4E@NewLife> Hi Sanjay- I'm not sure BioPerl has an interface specifically for BLINK (I will be corrected if I'm wrong, so stay tuned). If you can obtain the "raw" blast output for the protein you're interested in ( doing [BLINK] then [Other Views: BLAST] then [Format:Show: Alignment as Plain text] ) that text can be parsed using the Bio::SearchIO tools, and you can use Bio::Search::Tiling to obtain the 'best' hsps. This may not be too helpful, I'm afraid, but it is where I would start. Mark ----- Original Message ----- From: "sanjay singh" To: Sent: Saturday, August 15, 2009 9:38 AM Subject: [Bioperl-l] BLINK PARSER > Hi, > I want to submit query to NCBI'S BLINK and parsed the result for the best > hit. is there anyone have script to do so.i would be very grateful if > someone would like to share it with me. > regards > sanjay > > -- > Happy moments , praise God. > Difficult moments, seek God. > Quiet moments, worship God. > Painful moments, trust God. > Every moment, thank God > > Sanjay Kumar Singh > Bose Institute > 93\1,A.P.C.Road > Kolkata-700 009 > West Bengal > India > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sat Aug 15 15:11:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 14:11:48 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A86E5D3.3030906@cornell.edu> References: <85143.35343.qm@web30404.mail.mud.yahoo.com> <4A86E5D3.3030906@cornell.edu> Message-ID: <8B7B3664-A0E2-4E66-82D6-982096F4C75E@illinois.edu> I'm not sure, but it makes more sense to commit these changes directly. Yee, need us to set you up with a commit bit? 
If so, fill out the information on this page: http://www.bioperl.org/wiki/SVN_Account_Request and forward it to support at open-bio.org. I'll sponsor you. chris On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: > The usual procedure for developing code is to exchange code via > commits to a version control system. Yee, do you know how to use > Subversion? Does Yee need a commit bit? > > Rob > > Yee Man Chan wrote: >> Hi Chris >> I find that there is a memory access bug in my code. Attached is >> the fixed HMM.xs. This file together with the simpler typemap >> should fix all problems. (I hope..) >> Please let me know if it works for you. >> Sorry for the bug... >> Yee Man >> --- On Fri, 8/14/09, Chris Fields wrote: >>> From: Chris Fields >>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >>> WinVista? >>> To: "Yee Man Chan" >>> Cc: "Robert Buels" , "Jonny Dalzell" >> >, "BioPerl List" >>> Date: Friday, August 14, 2009, 8:31 AM >>> Yee Man, >>> >>> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >>> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >>> appears to be 32-bit). The patch results in cleaning >>> up warnings for 5.10.0 but results in similar warnings for >>> 5.8.8 (linux or OS X). >>> >>> On OS X perl 5.8.8, this sometimes passes (note the first >>> attempt fails, the second succeeds), so it's not entirely a >>> 32-bit issue: >>> >>> http://gist.github.com/167860 >>> >>> OS X and perl 5.10.0, this always fails as the previous >>> gist shows, but demonstrates similar behavior (multiple >>> attempts to test get different responses): >>> >>> http://gist.github.com/167542 >>> >>> On linux, everything passes with or w/o the patched files >>> (patched files have warnings as indicated above): >>> >>> Specs for all three perl executables (they vary a bit): >>> >>> http://gist.github.com/167883 >>> >>> chris >>> >>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >>> >>>> Ah.. 
I find that the typemap can become as simple as >>> this >>>> ===================== >>>> TYPEMAP >>>> HMM * T_PTROBJ >>>> ===================== >>>> >>>> Then the generated HMM.c will have a function called >>> INT2PTR to do the pointer conversion. I believe this should >>> solve the warnings. >>>> Attached are the updated HMM.xs and typemap. Can >>> someone with a 64-bit machine give it a try? >>>> Thank you >>>> Yee Man >>>> --- On Thu, 8/13/09, Chris Fields >>> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >>> package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >>> "Jonny Dalzell" , >>> "BioPerl List" >>>>> Date: Thursday, August 13, 2009, 5:31 PM >>>>> (just to point out to everyone, Yee >>>>> Man's contact information was in the POD) >>>>> >>>>> Yee Man, >>>>> >>>>> I have the output in the below link: >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> There are similar problems popping up on 32- and >>> 64-bit >>>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >>> to debug >>>>> it unfortunately. >>>>> >>>>> I think we should seriously consider spinning this >>> code off >>>>> into it's own distribution for CPAN. It's >>>>> unfortunately bit-rotting away in >>> bioperl-ext. If you >>>>> want to continue supporting it I can help set that >>> up. >>>>> chris >>>>> >>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>>> >>>>>> Hi >>>>>> >>>>>> So is this an HMM only >>> problem? Or does >>>>> it apply to other bioperl-ext modules? >>>>>> What exactly are the >>> compilation errors >>>>> for HMM? I believe my implementation is just a >>> simple one >>>>> based on Rabiner's paper. 
>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>> ~murphyk%2FBayes >>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>> >>>>>> I don't think I did >>> anything fancy that >>>>> makes it machine dependent or non-ANSI C. >>>>>> Yee Man >>>>>> >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >>> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Robert Buels" >>>>>>> Cc: "Jonny Dalzell" , >>>>> "BioPerl List" , >>>>> "Yee Man Chan" >>>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>>> >>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >>> wrote: >>>>>>>> Jonny Dalzell wrote: >>>>>>>>> Is it ridiculous of me to expect >>> ubuntu to >>>>> take >>>>>>> care of this for me? How do >>>>>>>>> I go about compiling the HMM? >>>>>>>> Yes. This is a very specialized >>> thing >>>>> that >>>>>>> you're doing, and Ubuntu does not have >>> the >>>>> resources to >>>>>>> package every single thing. >>>>>>>> Unfortunately, it looks like >>> bioperl-ext >>>>> package is >>>>>>> not installable under Ubuntu 9.04 anyway, >>> which is >>>>> what I'm >>>>>>> running. For others on this list, >>> if >>>>> somebody is >>>>>>> interested in doing maintaining it, I'd be >>> happy >>>>> to help out >>>>>>> by testing on Debian-based Linux >>> platforms. >>>>> We need to >>>>>>> clarify this package's maintenance status: >>> if >>>>> there is >>>>>>> nobody interested in maintaining it, I >>> would >>>>> recommend that >>>>>>> bioperl-ext be removed from distribution. >>>>> It's not in >>>>>>> anybody's interest to have unmaintained >>> software >>>>> out there >>>>>>> causing confusion. >>>>>>> >>>>>>> I have cc'd Yee Man Chan for this. >>> If there >>>>> isn't a >>>>>>> response or the message bounces, we do one >>> of two >>>>> things: >>>>>>> 1) consider it deprecated (probably >>> safest). 
>>>>>>> 2) spin it out into a separate module. >>>>>>> >>>>>>> Just tried to comile it myself and am >>> getting >>>>> errors (using >>>>>>> 64bit perl 5.10), so I think, unless >>> someone wants >>>>> to take >>>>>>> this on, option #1 is best. >>>>>>> >>>>>>>> So Jonny, in short, I would say "do >>> not use >>>>>>> bioperl-ext". >>>>>>> >>>>>>> In general, that's a safe bet. We're >>> moving >>>>> most of >>>>>>> our C/C++ bindings to BioLib. >>>>>>> >>>>>>>> Step back. What are you trying >>> to >>>>>>> accomplish? Chris already >>> recommended some >>>>> alternative >>>>>>> methods in his email of 8/11 on this >>>>> subject. Perhaps >>>>>>> we can guide you to some software that is >>>>> actively >>>>>>> maintained and will meet your needs. >>>>>>>> Rob >>>>>>> Exactly. Lots of other (better >>> supported!) >>>>> options >>>>>>> out there. HMMER, SeqAn, and >>> others. >>>>>>> chris >>>>>>> >>>>>> >>>>>> >>>>> >>>> __________________________________________________ >>>> Do You Yahoo!? >>>> Tired of spam? Yahoo! 
Mail has the best spam >>> protection around >>>> http://mail.yahoo.com >>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu From hlapp at gmx.net Sat Aug 15 15:41:56 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 15:41:56 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: On Aug 14, 2009, at 11:41 PM, Chris Fields wrote: > I would take more up-to-date POD over wiki (maybe adding a Status: > for the methods), but a good HOWTO goes a long way in helping. It's > just too hard to cover every use case. I'd very much second this. An API documentation should arguably be written by the developer(s) and hence I would expect to find in the PODs. Use-cases, however, and how to solve those in BioPerl can and should be contributed by everyone, and the wiki is just way better at facilitating this. As for the FASTA example, I can understand - I've heard repeatedly from people that one of the things that they are missing is documentation for every SeqIO format we support (such as GenBank, UniProt, FASTA, etc) about where to find a particular piece of the format in the object model. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 15 15:53:31 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 15 Aug 2009 15:53:31 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: ----- Original Message ----- From: "Hilmar Lapp" ... > As for the FASTA example, I can understand - I've heard repeatedly > from people that one of the things that they are missing is > documentation for every SeqIO format we support (such as GenBank, > UniProt, FASTA, etc) about where to find a particular piece of the > format in the object model. .... This is the right thread for list lurkers to contribute their betes noires such as this one. I encourage ALL to post these issues and help create our list of action items. MAJ From hlapp at gmx.net Sat Aug 15 16:09:14 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:09:14 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote: > I'm planning to move Chase Miller's excellent NeXML read/write > implementation into the trunk, complete with tests. If we can get it > to pass the test suite, is there room in the point release for it? We've in the past stayed away from adding new features to stable branches with the exception of new methods in existing classes and that didn't do anything complicated. I'm not sure I remember everything but I think the NeXML support does exceed that level, doesn't it? Can it be rolled into its own pre- release that is a drop-in to an existing 1.6.x installation for those who want to go there? 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 15 16:12:35 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:12:35 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A85F83A.30800@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> Message-ID: Great! Two suggestions: > ? deprecate the get_Annotations(Str) method in favor of > get_annotation(Str), which adheres better to standard perl method > naming Yes, but also is then inconsistent with existing BioPerl naming, with the method name indicating what type of object you get back (Bio::AnnotationI in this case; see also e.g., get_SeqFeatures() in Bio::SeqI). > ? finally, split Bio::FeatureIO modules off into their own CPAN > distribution Wouldn't one start with this? -hilmar On Aug 14, 2009, at 7:50 PM, Robert Buels wrote: > Chris Fields wrote: >> Any help/suggestions for the above two would be greatly >> appreciated! Robert Buels may be heading up the initial FeatureIO >> work; I will likely start on LocatableSeq/Align (Mark, wanna help?). > > Sure, I'll head up the gff_refactor branch work. If you're > interested in what changes are being planned for Bio::SeqFeature::*, > Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the > implementation plan Chris and I developed just now on IRC, which is at > > http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan > > Now soliciting suggestions, comments, and assistance. 
> > Rob > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From rmb32 at cornell.edu Sat Aug 15 16:24:35 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 13:24:35 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> Message-ID: <4A871983.4010702@cornell.edu> Hilmar Lapp wrote: > I'm not sure I remember everything but I think the NeXML support does > exceed that level, doesn't it? Can it be rolled into its own pre-release > that is a drop-in to an existing 1.6.x installation for those who want > to go there? So split it out into its own CPAN dist. Rob From maj at fortinbras.us Sat Aug 15 16:36:47 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 15 Aug 2009 16:36:47 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> Message-ID: <307089ED92AD46539EEF45EE2D8F5A81@NewLife> Yes, I'd say the Nexml support exceeds the 'complicated' test. There are no modifications to existing modules (except for the addition of annotation attributes to members of the Bio::PopGen model, which are don't-cares to anything out there currently). 
The manifest of a NeXML drop-in would look like Bio/NexmlIO.pm Bio/Nexml/Factory.pm Bio/SeqIO/nexml.pm Bio/AlignIO/nexml.pm Bio/TreeIO/nexml.pm and, if I get it completed, support for arbitrary characters via Bio::PopGen Bio/PopGen/IO/nexml.pm (all based on hacks of Chase's code, btw; we thought it would round out the package nicely...) Of course, the big dependency that not everyone will need or want is Rutger's Bio::Phylo, so the Nexml support will have to be optional even in 1.7, I think. I am adding run-time checks for Bio::Phylo in the modules so they die relatively gracefully and informatively, rather than just barf. Also, the tests will have appropriate skip blocks. I do want to get the code into bioperl-live, however, unless there's a gotcha there I'm not seeing-- cheers MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Mark A. Jensen" Cc: "Chris Fields" ; "BioPerl List" Sent: Saturday, August 15, 2009 4:09 PM Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring > > On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote: > >> I'm planning to move Chase Miller's excellent NeXML read/write >> implementation into the trunk, complete with tests. If we can get it to pass >> the test suite, is there room in the point release for it? > > > We've in the past stayed away from adding new features to stable branches > with the exception of new methods in existing classes and that didn't do > anything complicated. > > I'm not sure I remember everything but I think the NeXML support does exceed > that level, doesn't it? Can it be rolled into its own pre- release that is a > drop-in to an existing 1.6.x installation for those who want to go there? 
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From hlapp at gmx.net Sat Aug 15 16:49:22 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:49:22 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <307089ED92AD46539EEF45EE2D8F5A81@NewLife> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> Message-ID: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote: > I do want to get the code into bioperl-live, however, unless there's > a gotcha there I'm not seeing-- That sounds great to me, though it may make some of Chris' hair stand on end if he wants this to go into a separate module from the start :) Maybe a phylogenetics module can be carved out that this would become part of? Though I recall someone saying recently that Bio::Species and by extension Bio::SeqIO is dependent on Bio::Tree::Node, so maybe that's not realistic to split out. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 15 17:07:30 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 15 Aug 2009 17:07:30 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> Message-ID: <659CA35CE3AD464AA516D18B313311BE@NewLife> I'm all for an attempt to split out phylogenetic stuff, it seems natural, and think in terms of a phylo package dependent upon a sequence package, and if necessary vice versa -- although if the Bio::Species - Bio::Tree::Node connection is relatively loose, perhaps we can refactor to make some attributes/methods optional features that carp when the phylo package is not installed. (Roles, anyone?) However, probably 1.6.x doesn't sound like the place to do that! I myself wouldn't have any problem waiting till 1.7 for 'official' Nexml support--but I hope Chase will chime in on that. What does Chris think? MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Mark A. Jensen" Cc: "Chris Fields" ; "BioPerl List" Sent: Saturday, August 15, 2009 4:49 PM Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring > > On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote: > >> I do want to get the code into bioperl-live, however, unless there's a >> gotcha there I'm not seeing-- > > > That sounds great to me, though it may make some of Chris' hair stand on end > if he wants this to go into a separate module from the start :) Maybe a > phylogenetics module can be carved out that this would become part of? Though > I recall someone saying recently that Bio::Species and by extension > Bio::SeqIO is dependent on Bio::Tree::Node, so maybe that's not realistic to > split out. 
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From rmb32 at cornell.edu Sat Aug 15 17:23:40 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 14:23:40 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> Message-ID: <4A87275C.5040300@cornell.edu> Hilmar Lapp wrote: >> ? deprecate the get_Annotations(Str) method in favor of >> get_annotation(Str), which adheres better to standard perl method naming > > Yes, but also is then inconsistent with existing BioPerl naming, with > the method name indicating what type of object you get back > (Bio::AnnotationI in this case; see also e.g., get_SeqFeatures() in > Bio::SeqI). Blech. OK never mind about the method rename then. > >> ? finally, split Bio::FeatureIO modules off into their own CPAN >> distribution > > Wouldn't one start with this? Yeah....I've kind of been vacillating back and forth about whether it would be best to *start* with this, or to end with this. Probably makes more sense to start with it, since it gives more freedom to add dependencies on more CPAN stuff without worrying too much. Like...oh...I don't know...Moose? Thoughts on this? Rob From rmb32 at cornell.edu Sat Aug 15 17:25:51 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 14:25:51 -0700 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? In-Reply-To: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> Message-ID: <4A8727DF.7000204@cornell.edu> Chris Fields wrote: > In fact, seeing as we're refactoring GFF and other aspects of Features > in bioperl, this may be the best time to add something in. 
Reading that thread, it sounds like most of the issues revolve around when and how to use the unflattener. Perhaps just adding another command line switch or two to the script would be appropriate? Editorializing a bit, it's really disheartening that Genbank stores features in such a lossy way. Rob From cjfields at illinois.edu Sat Aug 15 22:05:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 21:05:41 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <241652.96493.qm@web30404.mail.mud.yahoo.com> References: <241652.96493.qm@web30404.mail.mud.yahoo.com> Message-ID: I'm still seeing the same errors on Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as well as perl 5.8.8 on dev.open-bio.org). I'm wondering if this is a problem with my local perl build. I'm very tempted to push the HMM-related code into a separate distribution (bioperl-hmm) and make a CPAN release out of it so it gets wider testing via CPAN testers; it would just require a minimum bioperl 1.6 installation for Bio::Tools::HMM and any related modules. Yee, would that be okay with you? chris On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: > > I just committed HMM.xs and typemap to SVN. Can you test it to > confirm it works in 64-bit machines? > > Thanks > Yee Man > > --- On Sat, 8/15/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "Yee Man Chan" , "BioPerl List" > > >> Date: Saturday, August 15, 2009, 12:11 PM >> I'm not sure, but it makes more sense >> to commit these changes directly. Yee, need us to set >> you up with a commit bit? If so, fill out the >> information on this page: >> >> http://www.bioperl.org/wiki/SVN_Account_Request >> >> and forward it to support at open-bio.org. >> I'll sponsor you. 
>> >> chris >> >> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: >> >>> The usual procedure for developing code is to exchange >> code via commits to a version control system. Yee, do >> you know how to use Subversion? Does Yee need a commit bit? >>> >>> Rob >>> >>> Yee Man Chan wrote: >>>> Hi Chris >>>> I find that there is a memory >> access bug in my code. Attached is the fixed HMM.xs. This >> file together with the simpler typemap should fix all >> problems. (I hope..) >>>> Please let me know if it works >> for you. >>>> Sorry for the bug... >>>> Yee Man >>>> --- On Fri, 8/14/09, Chris Fields >> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>>> Date: Friday, August 14, 2009, 8:31 AM >>>>> Yee Man, >>>>> >>>>> I tested this out locally (perl 5.8.8 32-bit, >> perl 5.10.0 >>>>> 64-bit) and on dev.open-bio.org (which is perl >> 5.8.8, >>>>> appears to be 32-bit). The patch results >> in cleaning >>>>> up warnings for 5.10.0 but results in similar >> warnings for >>>>> 5.8.8 (linux or OS X). >>>>> >>>>> On OS X perl 5.8.8, this sometimes passes >> (note the first >>>>> attempt fails, the second succeeds), so it's >> not entirely a >>>>> 32-bit issue: >>>>> >>>>> http://gist.github.com/167860 >>>>> >>>>> OS X and perl 5.10.0, this always fails as the >> previous >>>>> gist shows, but demonstrates similar behavior >> (multiple >>>>> attempts to test get different responses): >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> On linux, everything passes with or w/o the >> patched files >>>>> (patched files have warnings as indicated >> above): >>>>> >>>>> Specs for all three perl executables (they >> vary a bit): >>>>> >>>>> http://gist.github.com/167883 >>>>> >>>>> chris >>>>> >>>>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan >> wrote: >>>>> >>>>>> Ah.. 
I find that the typemap can become as >> simple as >>>>> this >>>>>> ===================== >>>>>> TYPEMAP >>>>>> HMM * T_PTROBJ >>>>>> ===================== >>>>>> >>>>>> Then the generated HMM.c will have a >> function called >>>>> INT2PTR to do the pointer conversion. I >> believe this should >>>>> solve the warnings. >>>>>> Attached are the updated HMM.xs and >> typemap. Can >>>>> someone with a 64-bit machine give it a try? >>>>>> Thank you >>>>>> Yee Man >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Yee Man Chan" >>>>>>> Cc: "Robert Buels" , >>>>> "Jonny Dalzell" , >>>>> "BioPerl List" >>>>>>> Date: Thursday, August 13, 2009, 5:31 >> PM >>>>>>> (just to point out to everyone, Yee >>>>>>> Man's contact information was in the >> POD) >>>>>>> >>>>>>> Yee Man, >>>>>>> >>>>>>> I have the output in the below link: >>>>>>> >>>>>>> http://gist.github.com/167542 >>>>>>> >>>>>>> There are similar problems popping up >> on 32- and >>>>> 64-bit >>>>>>> perl 5.10.0, Mac OS X 10.5. >> Haven't had time >>>>> to debug >>>>>>> it unfortunately. >>>>>>> >>>>>>> I think we should seriously consider >> spinning this >>>>> code off >>>>>>> into it's own distribution for >> CPAN. It's >>>>>>> unfortunately bit-rotting away in >>>>> bioperl-ext. If you >>>>>>> want to continue supporting it I can >> help set that >>>>> up. >>>>>>> chris >>>>>>> >>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man >> Chan wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> >>>>>>>> So is this >> an HMM only >>>>> problem? Or does >>>>>>> it apply to other bioperl-ext >> modules? >>>>>>>> What >> exactly are the >>>>> compilation errors >>>>>>> for HMM? I believe my implementation >> is just a >>>>> simple one >>>>>>> based on Rabiner's paper. 
>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>>>> ~murphyk%2FBayes >>>>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>>>> >>>>>>>> I don't >> think I did >>>>> anything fancy that >>>>>>> makes it machine dependent or non-ANSI >> C. >>>>>>>> Yee Man >>>>>>>> >>>>>>>> --- On Thu, 8/13/09, Chris Fields >> >>>>>>> wrote: >>>>>>>>> From: Chris Fields >>>>>>>>> Subject: Re: [Bioperl-l] >> Problems with >>>>> Bioperl-ext >>>>>>> package on WinVista? >>>>>>>>> To: "Robert Buels" >>>>>>>>> Cc: "Jonny Dalzell" , >>>>>>> "BioPerl List" , >>>>>>> "Yee Man Chan" >>>>>>>>> Date: Thursday, August 13, >> 2009, 3:18 PM >>>>>>>>> >>>>>>>>> On Aug 13, 2009, at 4:37 PM, >> Robert Buels >>>>> wrote: >>>>>>>>>> Jonny Dalzell wrote: >>>>>>>>>>> Is it ridiculous of me >> to expect >>>>> ubuntu to >>>>>>> take >>>>>>>>> care of this for me? How >> do >>>>>>>>>>> I go about compiling >> the HMM? >>>>>>>>>> Yes. This is a very >> specialized >>>>> thing >>>>>>> that >>>>>>>>> you're doing, and Ubuntu does >> not have >>>>> the >>>>>>> resources to >>>>>>>>> package every single thing. >>>>>>>>>> Unfortunately, it looks >> like >>>>> bioperl-ext >>>>>>> package is >>>>>>>>> not installable under Ubuntu >> 9.04 anyway, >>>>> which is >>>>>>> what I'm >>>>>>>>> running. For others on >> this list, >>>>> if >>>>>>> somebody is >>>>>>>>> interested in doing >> maintaining it, I'd be >>>>> happy >>>>>>> to help out >>>>>>>>> by testing on Debian-based >> Linux >>>>> platforms. >>>>>>> We need to >>>>>>>>> clarify this package's >> maintenance status: >>>>> if >>>>>>> there is >>>>>>>>> nobody interested in >> maintaining it, I >>>>> would >>>>>>> recommend that >>>>>>>>> bioperl-ext be removed from >> distribution. >>>>>>> It's not in >>>>>>>>> anybody's interest to have >> unmaintained >>>>> software >>>>>>> out there >>>>>>>>> causing confusion. 
>>>>>>>>> >>>>>>>>> I have cc'd Yee Man Chan for >> this. >>>>> If there >>>>>>> isn't a >>>>>>>>> response or the message >> bounces, we do one >>>>> of two >>>>>>> things: >>>>>>>>> 1) consider it deprecated >> (probably >>>>> safest). >>>>>>>>> 2) spin it out into a separate >> module. >>>>>>>>> >>>>>>>>> Just tried to comile it myself >> and am >>>>> getting >>>>>>> errors (using >>>>>>>>> 64bit perl 5.10), so I think, >> unless >>>>> someone wants >>>>>>> to take >>>>>>>>> this on, option #1 is best. >>>>>>>>> >>>>>>>>>> So Jonny, in short, I >> would say "do >>>>> not use >>>>>>>>> bioperl-ext". >>>>>>>>> >>>>>>>>> In general, that's a safe >> bet. We're >>>>> moving >>>>>>> most of >>>>>>>>> our C/C++ bindings to BioLib. >>>>>>>>> >>>>>>>>>> Step back. What are >> you trying >>>>> to >>>>>>>>> accomplish? Chris >> already >>>>> recommended some >>>>>>> alternative >>>>>>>>> methods in his email of 8/11 >> on this >>>>>>> subject. Perhaps >>>>>>>>> we can guide you to some >> software that is >>>>>>> actively >>>>>>>>> maintained and will meet your >> needs. >>>>>>>>>> Rob >>>>>>>>> Exactly. Lots of other >> (better >>>>> supported!) >>>>>>> options >>>>>>>>> out there. HMMER, SeqAn, >> and >>>>> others. >>>>>>>>> chris >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >> __________________________________________________ >>>>>> Do You Yahoo!? >>>>>> Tired of spam? Yahoo! 
Mail has the >> best spam >>>>> protection around >>>>>> http://mail.yahoo.com >>>>> >> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>> >>> >>> --Robert Buels >>> Bioinformatics Analyst, Sol Genomics Network >>> Boyce Thompson Institute for Plant Research >>> Tower Rd >>> Ithaca, NY 14853 >>> Tel: 503-889-8539 >>> rmb32 at cornell.edu >>> http://www.sgn.cornell.edu >> >> > > > From cjfields at illinois.edu Sat Aug 15 22:49:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 21:49:25 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <659CA35CE3AD464AA516D18B313311BE@NewLife> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> Message-ID: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> On Aug 15, 2009, at 4:07 PM, Mark A. Jensen wrote: > I'm all for an attempt to split out phylogenetic stuff, it > seems natural, and think in terms of a phylo package > dependent upon a sequence package, and if necessary > vice versa -- although if the Bio::Species - Bio::Tree::Node > connection is relatively loose, perhaps we can refactor to > make some attributes/methods optional features that carp > when the phylo package is not installed. (Roles, anyone?) I'm pretty sure they're linked very tightly (Species is-a Bio::Taxon is-a Bio::Tree::Node). This may be something Sendu needs to chime in on; he refactored much of that code prior to 1.5.2. As a suggestion, maybe we can use a combined strategy: fall back to a very simple Bio::Species container class if a bioperl-phylo isn't installed, but utilize Bio::Taxon when it is. > However, probably 1.6.x doesn't sound like the place to > do that! 
I myself wouldn't have any problem waiting till > 1.7 for 'official' Nexml support--but I hope Chase will chime > in on that. What does Chris think? > MAJ Robert's suggestion of a separate distribution makes sense; it may be one avenue of slowly migrating out phylo-specific code into it's own distribution. Not sure about calling it bioperl-phylo (which might be confused with Rutger's Bio::Phylo). chris From cjfields at illinois.edu Sat Aug 15 22:47:36 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 21:47:36 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? In-Reply-To: <4A8727DF.7000204@cornell.edu> References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> <4A8727DF.7000204@cornell.edu> Message-ID: <81C3E545-4F0E-4B1F-9F06-398D1EE7A3CF@illinois.edu> On Aug 15, 2009, at 4:25 PM, Robert Buels wrote: > Chris Fields wrote: > > In fact, seeing as we're refactoring GFF and other aspects of > Features > > in bioperl, this may be the best time to add something in. > > Reading that thread, it sounds like most of the issues revolve > around when and how to use the unflattener. Perhaps just adding > another command line switch or two to the script would be appropriate? > > Editorializing a bit, it's really disheartening that Genbank stores > features in such a lossy way. > > Rob Just remembered: NCBI does supply GFF3 files for bacterial genomes, but I'm not sure how well they correspond to the GFF3 specification. For example: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Aquifex_aeolicus/NC_000918.gff A quick glance looks okay, but they don't include FASTA sequence. I think much of the problem with NCBI/GenBank has to do with lack of curation on how submissions are made (lots of inconsistencies). I'm not sure how easy they will be to deal with, but the only way we can deal with that is looking at examples of problematic data (IIRC the Sulfolobus solfataricus genome GB file was a mess, so maybe that's worth a look). 
chris

From cjfields at illinois.edu Sun Aug 16 01:38:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 00:38:46 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <846546.73578.qm@web30404.mail.mud.yahoo.com>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com>
Message-ID: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>

Yee,

I took the liberty of making a few simple changes to Bio::Tools::HMM in svn to point out the problem and possible solutions. Feel free to revert these as needed.

I'm seeing two errors, which appear randomly when running 'make test'. The first is easily fixable, the second, I'm not so sure. I'll let you make the decisions on both.

1) There is an assumption in the module that, when adding floating points, you will always get 1.0. You may run into problems: see 'perldoc -q long decimals'. Lines like this (two places in the module):

    ...
    if ($sum != 1.0) {
        $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
    }
    ...

won't work as expected (note I added a simple diagnostic, just print out the 'bad' sum). With perl 5.8.8, this appears to work fine, but this is what I get with perl 5.10 (64-bit):

    pyrimidine1:HMM cjfields$ make test
    PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
    Baum-Welch Training
    ===================
    Initial Probability Array:
    0.499978 0.500022
    Transition Probability Matrix:
    0.499978 0.500022
    0.499978 0.500022
    Emission Probability Matrix:
    0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
    0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
    Log Probability of sequence 1: -521.808
    Log Probability of sequence 2: -426.057
    Statistical Training
    ====================
    Initial Probability Array:
    1 0
    Transition Probability Matrix:

    ------------- EXCEPTION -------------
    MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
    STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
    STACK toplevel test.pl:82
    -------------------------------------
    make: *** [test_dynamic] Error 255

I'm assuming this needs to simply be rounded up to 1.0. That could be accomplished with something like 'if (sprintf("%.2f", $sum) != 1.0) {...}'

2) The second error is a little stranger. I have been randomly getting this:

    pyrimidine1:HMM cjfields$ make test
    PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
    Baum-Welch Training
    ===================
    S should be monotonic increasing!
    make: *** [test_dynamic] Error 255

When I add strict and warnings pragmas to Bio::Tools::HMM (with a little additional cleanup to get things running), I get an additional warning (arrow):

    pyrimidine1:HMM cjfields$ make test
    PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
    Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188.    <----
    Baum-Welch Training
    ===================
    S should be monotonic increasing!
    make: *** [test_dynamic] Error 255

So something is not being converted as expected.
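On point 1, an alternative to rounding with sprintf is to compare the sum against 1.0 within an explicit tolerance. The sketch below only illustrates that idea and is not the code in Bio::Tools::HMM; the helper name sums_to_one and the default tolerance of 1e-6 are assumptions chosen for the example. One thing to keep in mind with the sprintf("%.2f", ...) form is that it also accepts a sum such as 0.996, since that rounds to 1.00.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::HMM): true if the
# probabilities in the array ref sum to 1.0 within $tol.
sub sums_to_one {
    my ($probs, $tol) = @_;
    $tol = 1e-6 unless defined $tol;    # assumed default tolerance
    my $sum = 0;
    $sum += $_ for @$probs;
    # Compare the distance from 1.0 instead of testing exact equality,
    # so accumulated rounding error does not trigger a false failure.
    return abs($sum - 1.0) <= $tol ? 1 : 0;
}

# Demo: ten values of 0.1 are accepted even if their floating-point
# sum is not exactly 1.0; a row that is clearly off is rejected.
for my $row ([ (0.1) x 10 ], [ 0.5, 0.4 ]) {
    printf "%-50s %s\n", join(' ', @$row),
        sums_to_one($row) ? 'ok' : 'does not sum to 1.0';
}
```

The tolerance makes the accepted error explicit, so it can be tightened or loosened in one place if the check turns out to be too strict or too lax on a given perl build.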
chris On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote: > When are you going to release 1.6? Maybe let me work on it before it > releases. If it doesn't resolve the problem, then we can think about > other alternatives. > > Also, please show me the latest errors you have for 5.10.0. > > Thanks > Yee Man > > --- On Sat, 8/15/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "BioPerl List" > > >> Date: Saturday, August 15, 2009, 7:05 PM >> I'm still seeing the same errors on >> Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl >> (v5.8.8) passes fine now (as well as perl 5.8.8 on >> dev.open-bio.org). >> >> I'm wondering if this is a problem with my local perl >> build. I'm very tempted to push the HMM-related code >> into a separate distribution (bioperl-hmm) and make a CPAN >> release out of it so it gets wider testing via CPAN testers; >> it would just require a minimum bioperl 1.6 installation for >> Bio::Tools::HMM and any related modules. Yee, would >> that be okay with you? >> >> chris >> >> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: >> >>> >>> I just committed HMM.xs and typemap to SVN. Can you >> test it to confirm it works in 64-bit machines? >>> >>> Thanks >>> Yee Man >>> >>> --- On Sat, 8/15/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Robert Buels" >>>> Cc: "Yee Man Chan" , >> "BioPerl List" >>>> Date: Saturday, August 15, 2009, 12:11 PM >>>> I'm not sure, but it makes more sense >>>> to commit these changes directly. Yee, need >> us to set >>>> you up with a commit bit? If so, fill out >> the >>>> information on this page: >>>> >>>> http://www.bioperl.org/wiki/SVN_Account_Request >>>> >>>> and forward it to support at open-bio.org. >>>> I'll sponsor you. 
From abhishek.vit at gmail.com Sun Aug 16 04:06:49 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Sun, 16 Aug 2009 04:06:49 -0400 Subject: [Bioperl-l] About binning data for histograms Message-ID: Hi All After a lot of look up on forums I could google, I am finally posting my question here. I think it may not be appropriate for this mailing list. I apologize for this first up. The question is regarding dynamic binning of data points for histogram plots. So I have many hashes, each having a "numerical" coverage data obtained from Next generation sequencing data analysis. Now each hash may have couple of hundred to thousands entry "contig_name => coverage". What I want to do is to plot a histogram for each hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N has to be binned according to the data size). I am using Chart::Gnuplot for this but I am not able to figure out how to bin the data points to fit nicely on a screen. Is there any smart/quick method to do this. Any pointers will help a great deal.
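One way to approach the binning question above is to derive the number of bins from the number of data points. The sketch below uses Sturges' rule (k = ceil(log2(n)) + 1) with equal-width bins; it is only one of several common rules and may not suit strongly skewed coverage data. The function name bin_coverage and the sample hash are made up for illustration, and the result is a plain list of bin records that could then be handed to whatever plotting module is in use.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);
use List::Util qw(min max);

# Hypothetical helper: takes a hash ref of contig_name => coverage and
# returns an array ref of equal-width bins, each a hash with the keys
# from, to and count. The bin count follows Sturges' rule.
sub bin_coverage {
    my ($cov) = @_;
    my @values = values %$cov;
    my $n = @values;
    return [] unless $n;

    my ($lo, $hi) = (min(@values), max(@values));
    my $k     = ceil(log($n) / log(2)) + 1;    # Sturges' rule
    my $width = ($hi - $lo) / $k;
    $width = 1 if $width == 0;                 # all values identical

    my @bins = map {
        +{ from => $lo + $_ * $width, to => $lo + ($_ + 1) * $width, count => 0 }
    } 0 .. $k - 1;

    for my $v (@values) {
        my $i = floor(($v - $lo) / $width);
        $i = $k - 1 if $i >= $k;               # put the maximum in the last bin
        $bins[$i]{count}++;
    }
    return \@bins;
}

# Demo with made-up coverage values.
my %coverage = (c1 => 3, c2 => 5, c3 => 8, c4 => 12, c5 => 40, c6 => 41);
for my $bin (@{ bin_coverage(\%coverage) }) {
    printf "%6.1f - %6.1f : %d\n", @{$bin}{qw(from to count)};
}
```

For the cumulative form asked about (count of contigs with coverage above N), the per-bin counts can be summed from the highest bin downwards.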

Best Regards,
-Abhi

From bix at sendu.me.uk Sun Aug 16 05:21:11 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 16 Aug 2009 10:21:11 +0100
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To:
References:
Message-ID: <4A87CF87.7030803@sendu.me.uk>

Abhishek Pratap wrote:
> I am using Chart::Gnuplot for this but I am not able to figure out how
> to bin the data points to fit nicely on a screen. Is there any
> smart/quick method to do this.

http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width

Like it says, it depends on the data, but it's worth trying them out to
see if one of them gives you anything sensible.

From sdavis2 at mail.nih.gov Sun Aug 16 07:48:23 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 16 Aug 2009 07:48:23 -0400
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To:
References:
Message-ID: <264855a00908160448i2691fc08t472fc0d83afbb356@mail.gmail.com>

On Sun, Aug 16, 2009 at 4:06 AM, Abhishek Pratap wrote:

> Hi All
>
> After a lot of look up on forums I could google, I am finally posting
> my question here. I think it may not be appropriate for this mailing
> list. I apologize for this first up. The question is regarding dynamic
> binning of data points for histogram plots.
>
> So I have many hashes, each having a "numerical" coverage data
> obtained from Next generation sequencing data analysis. Now each hash
> may have couple of hundred to thousands entry "contig_name =>
> coverage". What I want to do is to plot a histogram for each
> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N
> has to be binned according to the data size).
>
> I am using Chart::Gnuplot for this but I am not able to figure out how
> to bin the data points to fit nicely on a screen. Is there any
> smart/quick method to do this.
>
> Any pointers will help a great deal.
>

Hi, Abhi. You could use R, but you got that already. ; ) However, you
might look here for a perl solution.

http://search.cpan.org/~whizdog/GDGraph-histogram-1.1/lib/GD/Graph/histogram.pm

Sean

From cjfields at illinois.edu Sun Aug 16 08:53:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 07:53:29 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <217259.7083.qm@web30408.mail.mud.yahoo.com>
References: <217259.7083.qm@web30408.mail.mud.yahoo.com>
Message-ID: <05D89C95-261C-47B5-A4C6-794D36DD5FB8@illinois.edu>

That worked! Thanks Yee Man!

chris

ps - let me know how you want to deal with a release.

On Aug 16, 2009, at 4:36 AM, Yee Man Chan wrote:

> Hi Chris
>
> Thanks for your suggestions. I think it is indeed better to check
> sum to 1.0 using sprintf. I fixed this in the newly committed HMM.pm
>
> I also fixed codes that will lead to warnings with use warnings.
>
> So now the only problem left is that "monotonic increasing" error.
> For that part of the code, I was trying to perform an expectation
> maximization step. Theoretically, the expectation should
> monotonically increase in every step. But I suppose this is not
> necessarily true when double precision floating point numbers are
> involved. I don't know why I used a 1e-100 tolerance for this.
> Therefore I "fixed" it by using the same tolerance to terminate the
> maximization step (ie .000001). I suppose this "fix" will make it
> much more unlikely to throw exception with your 5.10.0 perl.
>
> Can you give that a try again and see if it works now.
>
> Thank you
> Yee Man
>
> --- On Sat, 8/15/09, Chris Fields wrote:
>
>> From: Chris Fields
>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>> To: "Yee Man Chan"
>> Cc: "Robert Buels", "BioPerl List"
>> Date: Saturday, August 15, 2009, 10:38 PM
>> Yee,
>>
>> I took the liberty of making a few simple changes to
>> Bio::Tools::HMM in svn to point out the problem and possible
>> solutions. Feel free to revert these as needed.
>>
>> I'm seeing two errors, which appear randomly when running
>> 'make test'. The first is easily fixable, the second,
>> I'm not so sure. I'll let you make the decisions on both.
>>
>> 1) There is an assumption in the module that, when
>> adding floating points, you will always get 1.0. You
>> may run into problems: see 'perldoc -q long decimals'.
>> Lines like this (two places in the module):
>> ...
>> if ($sum != 1.0) {
>>     $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
>> }
>> ...
>>
>> won't work as expected (note I added a simple diagnostic,
>> just print out the 'bad' sum). With perl 5.8.8, this
>> appears to work fine, but this is what I get with perl 5.10
>> (64-bit):
>>
>> pyrimidine1:HMM cjfields$ make test
>> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
>> Baum-Welch Training
>> ===================
>> Initial Probability Array:
>> 0.499978 0.500022
>> Transition Probability Matrix:
>> 0.499978 0.500022
>> 0.499978 0.500022
>> Emission Probability Matrix:
>> 0.133333 0.143333
>> 0.163333 0.123333
>> 0.143333 0.293333
>> 0.133333 0.143333
>> 0.163333 0.123333
>> 0.143333 0.293333
>>
>> Log Probability of sequence 1: -521.808
>> Log Probability of sequence 2: -426.057
>>
>> Statistical Training
>> ====================
>> Initial Probability Array:
>> 1 0
>> Transition Probability Matrix:
>>
>> ------------- EXCEPTION -------------
>> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
>>
>> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
>> STACK toplevel test.pl:82
>> -------------------------------------
>>
>> make: *** [test_dynamic] Error 255
>>
>> I'm assuming this needs to simply be rounded up to
>> 1.0. That could be accomplished with something like
>> 'if (sprintf("%.2f", $sum) != 1.0) {...}'
>>
>> 2) The second error is a little stranger.
>> I have been randomly getting this:
>>
>> pyrimidine1:HMM cjfields$ make test
>> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
>> Baum-Welch Training
>> ===================
>> S should be monotonic increasing!
>> make: *** [test_dynamic] Error 255
>>
>> When I add strict and warnings pragmas to Bio::Tools::HMM
>> (with a little additional cleanup to get things running), I
>> get an additional warning (arrow):
>>
>> pyrimidine1:HMM cjfields$ make test
>> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
>> Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
>> Baum-Welch Training
>> ===================
>> S should be monotonic increasing!
>> make: *** [test_dynamic] Error 255
>>
>> So something is not being converted as expected.
>>
>> chris
>>
>> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote:
>>
>>> When are you going to release 1.6? Maybe let me work on it
>>> before it releases. If it doesn't resolve the problem, then we
>>> can think about other alternatives.
>>>
>>> Also, please show me the latest errors you have for 5.10.0.
>>>
>>> Thanks
>>> Yee Man
>>>
>>> --- On Sat, 8/15/09, Chris Fields wrote:
>>>
>>>> From: Chris Fields
>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>> To: "Yee Man Chan"
>>>> Cc: "Robert Buels", "BioPerl List"
>>>> Date: Saturday, August 15, 2009, 7:05 PM
>>>> I'm still seeing the same errors on Mac OS X for 64-bit perl
>>>> 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as
>>>> well as perl 5.8.8 on dev.open-bio.org).
>>>>
>>>> I'm wondering if this is a problem with my local perl build.
>>>> I'm very tempted to push the HMM-related code into a separate
>>>> distribution (bioperl-hmm) and make a CPAN release out of it so
>>>> it gets wider testing via CPAN testers; it would just require a
>>>> minimum bioperl 1.6 installation for Bio::Tools::HMM and any
>>>> related modules. Yee, would that be okay with you?
>>>>
>>>> chris
>>>>
>>>> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:
>>>>
>>>>> I just committed HMM.xs and typemap to SVN. Can you test it to
>>>>> confirm it works in 64-bit machines?
>>>>>
>>>>> Thanks
>>>>> Yee Man
>>>>>
>>>>> --- On Sat, 8/15/09, Chris Fields wrote:
>>>>>
>>>>>> From: Chris Fields
>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>>>> To: "Robert Buels"
>>>>>> Cc: "Yee Man Chan", "BioPerl List"
>>>>>> Date: Saturday, August 15, 2009, 12:11 PM
>>>>>> I'm not sure, but it makes more sense to commit these changes
>>>>>> directly. Yee, need us to set you up with a commit bit? If
>>>>>> so, fill out the information on this page:
>>>>>>
>>>>>> http://www.bioperl.org/wiki/SVN_Account_Request
>>>>>>
>>>>>> and forward it to support at open-bio.org. I'll sponsor you.
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
>>>>>>
>>>>>>> The usual procedure for developing code is to exchange code
>>>>>>> via commits to a version control system. Yee, do you know
>>>>>>> how to use Subversion? Does Yee need a commit bit?
>>>>>>>
>>>>>>> Rob
>>>>>>>
>>>>>>> Yee Man Chan wrote:
>>>>>>>> Hi Chris
>>>>>>>> I find that there is a memory access bug in my code.
>>>>>>>> Attached is the fixed HMM.xs. This file together with the
>>>>>>>> simpler typemap should fix all problems. (I hope..)
>>>>>>>> Please let me know if it works for you.
>>>>>>>> Sorry for the bug...
>>>>>>>> Yee Man
>>>>>>>> --- On Fri, 8/14/09, Chris Fields wrote:
>>>>>>>>> From: Chris Fields
>>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>>>>>>> To: "Yee Man Chan"
>>>>>>>>> Cc: "Robert Buels", "Jonny Dalzell", "BioPerl List"
>>>>>>>>> Date: Friday, August 14, 2009, 8:31 AM
>>>>>>>>> Yee Man,
>>>>>>>>>
>>>>>>>>> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0
>>>>>>>>> 64-bit) and on dev.open-bio.org (which is perl 5.8.8,
>>>>>>>>> appears to be 32-bit). The patch results in cleaning up
>>>>>>>>> warnings for 5.10.0 but results in similar warnings for
>>>>>>>>> 5.8.8 (linux or OS X).
>>>>>>>>>
>>>>>>>>> On OS X perl 5.8.8, this sometimes passes (note the first
>>>>>>>>> attempt fails, the second succeeds), so it's not entirely a
>>>>>>>>> 32-bit issue:
>>>>>>>>>
>>>>>>>>> http://gist.github.com/167860
>>>>>>>>>
>>>>>>>>> OS X and perl 5.10.0, this always fails as the previous
>>>>>>>>> gist shows, but demonstrates similar behavior (multiple
>>>>>>>>> attempts to test get different responses):
>>>>>>>>>
>>>>>>>>> http://gist.github.com/167542
>>>>>>>>>
>>>>>>>>> On linux, everything passes with or w/o the patched files
>>>>>>>>> (patched files have warnings as indicated above):
>>>>>>>>>
>>>>>>>>> Specs for all three perl executables (they vary a bit):
>>>>>>>>>
>>>>>>>>> http://gist.github.com/167883
>>>>>>>>>
>>>>>>>>> chris
>>>>>>>>>
>>>>>>>>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote:
>>>>>>>>>
>>>>>>>>>> Ah..
>>>>>>>>>> I find that the typemap can become as simple as this
>>>>>>>>>> =====================
>>>>>>>>>> TYPEMAP
>>>>>>>>>> HMM * T_PTROBJ
>>>>>>>>>> =====================
>>>>>>>>>>
>>>>>>>>>> Then the generated HMM.c will have a function called
>>>>>>>>>> INT2PTR to do the pointer conversion. I believe this
>>>>>>>>>> should solve the warnings.
>>>>>>>>>> Attached are the updated HMM.xs and typemap. Can someone
>>>>>>>>>> with a 64-bit machine give it a try?
>>>>>>>>>> Thank you
>>>>>>>>>> Yee Man
>>>>>>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
>>>>>>>>>>> From: Chris Fields
>>>>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>>>>>>>>> To: "Yee Man Chan"
>>>>>>>>>>> Cc: "Robert Buels", "Jonny Dalzell", "BioPerl List"
>>>>>>>>>>> Date: Thursday, August 13, 2009, 5:31 PM
>>>>>>>>>>> (just to point out to everyone, Yee Man's contact
>>>>>>>>>>> information was in the POD)
>>>>>>>>>>>
>>>>>>>>>>> Yee Man,
>>>>>>>>>>>
>>>>>>>>>>> I have the output in the below link:
>>>>>>>>>>>
>>>>>>>>>>> http://gist.github.com/167542
>>>>>>>>>>>
>>>>>>>>>>> There are similar problems popping up on 32- and 64-bit
>>>>>>>>>>> perl 5.10.0, Mac OS X 10.5. Haven't had time to debug
>>>>>>>>>>> it unfortunately.
>>>>>>>>>>>
>>>>>>>>>>> I think we should seriously consider spinning this code
>>>>>>>>>>> off into its own distribution for CPAN. It's
>>>>>>>>>>> unfortunately bit-rotting away in bioperl-ext. If you
>>>>>>>>>>> want to continue supporting it I can help set that up.
>>>>>>>>>>> chris
>>>>>>>>>>>
>>>>>>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>>
>>>>>>>>>>>> So is this an HMM only problem? Or does it apply to
>>>>>>>>>>>> other bioperl-ext modules?
>>>>>>>>>>>>
>>>>>>>>>>>> What exactly are the compilation errors for HMM? I
>>>>>>>>>>>> believe my implementation is just a simple one based on
>>>>>>>>>>>> Rabiner's paper.
>>>>>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think I did anything fancy that makes it
>>>>>>>>>>>> machine dependent or non-ANSI C.
>>>>>>>>>>>> Yee Man
>>>>>>>>>>>>
>>>>>>>>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
>>>>>>>>>>>>> From: Chris Fields
>>>>>>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>>>>>>>>>>> To: "Robert Buels"
>>>>>>>>>>>>> Cc: "Jonny Dalzell", "BioPerl List", "Yee Man Chan"
>>>>>>>>>>>>> Date: Thursday, August 13, 2009, 3:18 PM
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote:
>>>>>>>>>>>>>> Jonny Dalzell wrote:
>>>>>>>>>>>>>>> Is it ridiculous of me to expect ubuntu to take care
>>>>>>>>>>>>>>> of this for me? How do I go about compiling the HMM?
>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>> This is a very specialized thing that you're doing,
>>>>>>>>>>>>>> and Ubuntu does not have the resources to package
>>>>>>>>>>>>>> every single thing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately, it looks like bioperl-ext package is
>>>>>>>>>>>>>> not installable under Ubuntu 9.04 anyway, which is
>>>>>>>>>>>>>> what I'm running. For others on this list, if
>>>>>>>>>>>>>> somebody is interested in maintaining it, I'd be
>>>>>>>>>>>>>> happy to help out by testing on Debian-based Linux
>>>>>>>>>>>>>> platforms. We need to clarify this package's
>>>>>>>>>>>>>> maintenance status: if there is nobody interested in
>>>>>>>>>>>>>> maintaining it, I would recommend that bioperl-ext be
>>>>>>>>>>>>>> removed from distribution. It's not in anybody's
>>>>>>>>>>>>>> interest to have unmaintained software out there
>>>>>>>>>>>>>> causing confusion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have cc'd Yee Man Chan for this. If there isn't a
>>>>>>>>>>>>> response or the message bounces, we do one of two
>>>>>>>>>>>>> things:
>>>>>>>>>>>>> 1) consider it deprecated (probably safest).
>>>>>>>>>>>>> 2) spin it out into a separate module.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just tried to compile it myself and am getting errors
>>>>>>>>>>>>> (using 64bit perl 5.10), so I think, unless someone
>>>>>>>>>>>>> wants to take this on, option #1 is best.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So Jonny, in short, I would say "do not use
>>>>>>>>>>>>>> bioperl-ext".
>>>>>>>>>>>>>
>>>>>>>>>>>>> In general, that's a safe bet. We're moving most of
>>>>>>>>>>>>> our C/C++ bindings to BioLib.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Step back. What are you trying to accomplish? Chris
>>>>>>>>>>>>>> already recommended some alternative methods in his
>>>>>>>>>>>>>> email of 8/11 on this subject. Perhaps we can guide
>>>>>>>>>>>>>> you to some software that is actively maintained and
>>>>>>>>>>>>>> will meet your needs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Rob
>>>>>>>>>>>>>
>>>>>>>>>>>>> Exactly. Lots of other (better supported!) options out
>>>>>>>>>>>>> there. HMMER, SeqAn, and others.
>>>>>>>>>>>>>
>>>>>>>>>>>>> chris
>>>>>>>
>>>>>>> --Robert Buels
>>>>>>> Bioinformatics Analyst, Sol Genomics Network
>>>>>>> Boyce Thompson Institute for Plant Research
>>>>>>> Tower Rd
>>>>>>> Ithaca, NY 14853
>>>>>>> Tel: 503-889-8539
>>>>>>> rmb32 at cornell.edu
>>>>>>> http://www.sgn.cornell.edu

From hlapp at gmx.net Sun Aug 16 11:07:39 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 16 Aug 2009 11:07:39 -0400
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com> <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
Message-ID: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net>

On Aug 16, 2009, at 1:38 AM, Chris Fields wrote:

> I'm assuming this needs to simply be rounded up to 1.0. That could
> be accomplished with something like 'if (sprintf("%.2f", $sum) !=
> 1.0) {...}'

Couldn't you just test for the absolute difference being smaller than
some reasonable epsilon? That might be more efficient (and more
explicit) than printing to a string.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sun Aug 16 11:13:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 16 Aug 2009 11:13:54 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
Message-ID:

On Aug 15, 2009, at 10:49 PM, Chris Fields wrote:

> Not sure about calling it bioperl-phylo (which might be confused
> with Rutger's Bio::Phylo).

Frankly, it seems to me that either is more powerful in combination
with the other, so I don't quite see how the name suggesting some
linkage isn't a Good Thing rather than bad.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at illinois.edu Sun Aug 16 11:42:50 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 10:42:50 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com> <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu> <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net>
Message-ID:

On Aug 16, 2009, at 10:07 AM, Hilmar Lapp wrote:

>
> On Aug 16, 2009, at 1:38 AM, Chris Fields wrote:
>
>> I'm assuming this needs to simply be rounded up to 1.0. That could
>> be accomplished with something like 'if (sprintf("%.2f", $sum) !=
>> 1.0) {...}'
>
> Couldn't you just test for the absolute difference being smaller
> than some reasonable epsilon? That might be more efficient (and more
> explicit) than printing to a string.
>
> -hilmar

Yes, either way is fine. Re: floating point and sprintf, acc. to the
perlfaq4, as perl doesn't have a round() function the sprintf() idiom
is suggested (and commonly used).

chris

From cjfields at illinois.edu Sun Aug 16 11:48:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 10:48:52 -0500
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
Message-ID:

On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote:

> On Aug 15, 2009, at 10:49 PM, Chris Fields wrote:
>
>> Not sure about calling it bioperl-phylo (which might be confused
>> with Rutger's Bio::Phylo).
>
> Frankly, it seems to me that either is more powerful in combination
> with the other, so I don't quite see how the name suggesting some
> linkage isn't a Good Thing rather than bad.
>
> -hilmar

I don't have a problem as long as there is some emphasis they are two
separate, but related, projects. There is quite a bit of crossover
between the two (particularly with the last few bioperl-related GSoC
projects), but I would rather not have to worry about users emailing
the list wondering why something in bioperl-phylo doesn't work when
they installed Bio::Phylo instead (or vice-versa). Maybe Bio::Phylo
could be added as a recommended module with bioperl-phylo to alleviate
that?

chris

From maj at fortinbras.us Sun Aug 16 12:59:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 16 Aug 2009 12:59:40 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
Message-ID: <44D32BE895F446A9917A5550485AB102@NewLife>

I see both points- I think Chris's suggestion is good. The nexml support
won't work without Bio::Phylo, but not everyone will need that support,
so if the install can be chatty about this that would be great-

----- Original Message -----
From: "Chris Fields"
To: "Hilmar Lapp"
Cc: "BioPerl List"; "Mark A. Jensen"; "chase Miller"
Sent: Sunday, August 16, 2009 11:48 AM
Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring

>
> On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote:
>
>> On Aug 15, 2009, at 10:49 PM, Chris Fields wrote:
>>
>>> Not sure about calling it bioperl-phylo (which might be confused with
>>> Rutger's Bio::Phylo).
>>
>> Frankly, it seems to me that either is more powerful in combination
>> with the other, so I don't quite see how the name suggesting some
>> linkage isn't a Good Thing rather than bad.
>>
>> -hilmar
>
> I don't have a problem as long as there is some emphasis they are two
> separate, but related, projects. There is quite a bit of crossover
> between the two (particularly with the last few bioperl-related GSoC
> projects), but I would rather not have to worry about users emailing
> the list wondering why something in bioperl-phylo doesn't work when
> they installed Bio::Phylo instead (or vice-versa). Maybe Bio::Phylo
> could be added as a recommended module with bioperl-phylo to
> alleviate that?
>
> chris
>

From rmb32 at cornell.edu Sun Aug 16 13:16:18 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 16 Aug 2009 10:16:18 -0700
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <44D32BE895F446A9917A5550485AB102@NewLife>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> <44D32BE895F446A9917A5550485AB102@NewLife>
Message-ID: <4A883EE2.3060101@cornell.edu>

Mark A. Jensen wrote:
> I see both points- I think Chris's suggestion is good. The nexml support
> won't work without Bio::Phylo, but not everyone will need that support,
> so if the install can be chatty about this that would be great-

Maybe the parts that have differing dependencies should be in
different distros then?

Rob

From jason at bioperl.org Sun Aug 16 13:25:08 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 16 Aug 2009 13:25:08 -0400
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To:
References:
Message-ID: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org>

For binning of a distribution see the perl module
Statistics::Descriptive -
http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm
function: frequency_distribution

I would also look at R histogram function for the plotting. This would
be one of the easiest ways - I would just make a perl script that
generates the correct R code that can be used to make the plots.

On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote:

> Hi All
>
> After a lot of look up on forums I could google, I am finally posting
> my question here. I think it may not be appropriate for this mailing
> list. I apologize for this first up. The question is regarding dynamic
> binning of data points for histogram plots.
>
> So I have many hashes, each having a "numerical" coverage data
> obtained from Next generation sequencing data analysis. Now each hash
> may have couple of hundred to thousands entry "contig_name =>
> coverage". What I want to do is to plot a histogram for each
> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N
> has to be binned according to the data size).
>
> I am using Chart::Gnuplot for this but I am not able to figure out how
> to bin the data points to fit nicely on a screen. Is there any
> smart/quick method to do this.
>
> Any pointers will help a great deal.
>
> Best Regards,
> -Abhi

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From abhishek.vit at gmail.com Sun Aug 16 13:34:54 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Sun, 16 Aug 2009 13:34:54 -0400
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org>
References: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org>
Message-ID:

Thanks All. I completely forgot and didn't realize that histogram
function in R could auto bin based on the data.

Cheers,
-Abhi

On Sun, Aug 16, 2009 at 1:25 PM, Jason Stajich wrote:
> For binning of a distribution see the perl module
> Statistics::Descriptive -
> http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm
> function: frequency_distribution
>
> I would also look at R histogram function for the plotting. This would
> be one of the easiest ways - I would just make a perl script that
> generates the correct R code that can be used to make the plots.
>
>
> On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote:
>
>> Hi All
>>
>> After a lot of look up on forums I could google, I am finally posting
>> my question here. I think it may not be appropriate for this mailing
>> list. I apologize for this first up. The question is regarding dynamic
>> binning of data points for histogram plots.
>>
>> So I have many hashes, each having a "numerical" coverage data
>> obtained from Next generation sequencing data analysis. Now each hash
>> may have couple of hundred to thousands entry "contig_name =>
>> coverage". What I want to do is to plot a histogram for each
>> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N
>> has to be binned according to the data size).
>>
>> I am using Chart::Gnuplot for this but I am not able to figure out how
>> to bin the data points to fit nicely on a screen. Is there any
>> smart/quick method to do this.
>>
>> Any pointers will help a great deal.
>>
>> Best Regards,
>> -Abhi
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>

From robert.bradbury at gmail.com Sun Aug 16 15:16:09 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sun, 16 Aug 2009 15:16:09 -0400
Subject: [Bioperl-l] Limit on sequence file size fetches?
Message-ID:

Hello,

I am trying to use get_sequence() to fetch the sequence NS_000198 for
the fungus *Podospora anserina* with the databases "GenBank" and when
that didn't work "Gene". This is a simple script which fetches the
sequence then writes out the fasta and genbank files from the data
structure.

The errors I got suggested that the system was running out of memory
which I thought was unlikely since I've got something like 3GB of main
memory and 9GB of swap space. After running strace on the script
(which takes a while) I determined that the brk() calls were generating
ENOMEM at ~3GB. This turns out to be due to the limit of the Linux
memory model I am using (3GB/1GB) on a Pentium IV (Prescott).

Now, I think the total genome size for the fungus is ~70MB but haven't
verified this so I "should" be able to fetch it unless Bioperl (or perl
itself) is doing extremely poor memory management (perhaps not
coalescing memory segments into one large sequence) as the reads take
place? [1].

Has anyone encountered this problem (fetching say large mammalian
chromosomes)? Does anyone know what the limits are for "fetching"
sequence files (on 32/64 bit machines)?
The reason I am using get_sequence and BioPerl is that I can't seem to
find the *Podospora anserina* sequence in a FTP database anywhere (so I
can't use "wget or ftp"). I haven't tested accessing the GenBank file
in a browser (I don't know what browsers would do with a HTML file that
large but suspect it would not be pretty).

Thanks in advance,
Robert Bradbury

1. The strace seems to indicate periodic brk() calls to expand the
process data segment size between which there are lots of read() calls
of size 4096, presumably reading the socket from NCBI. I don't know if
there is an easy way to trace perl's memory allocation/manipulation at
a higher level.

From jason at bioperl.org Sun Aug 16 15:22:35 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 16 Aug 2009 15:22:35 -0400
Subject: [Bioperl-l] Limit on sequence file size fetches?
In-Reply-To:
References:
Message-ID: <93672502-26EB-4C30-A37E-F3B593E57279@bioperl.org>

Robert -

Posting your script will help us replicate and diagnose - I am not sure
which GenBank fetch option you are using. I have a feeling it is
trying to do recursive calls to stitch together the pseudoscaffold. I
presume it works fine though if you request each chromosome scaffold
like CU607053,CU633438, ...

I guess posting it via a bugzilla bug is the best way unless you have a
git account and wanted to post it as a 'gist'.

-jason
--
Jason Stajich
jason at bioperl.org
http://fungalgenomes.org/

On Aug 16, 2009, at 3:16 PM, Robert Bradbury wrote:

> Hello,
>
> I am trying to use get_sequence() to fetch the sequence NS_000198
> for the fungus *Podospora anserina* with the databases "GenBank" and
> when that didn't work "Gene". This is a simple script which fetches
> the sequence then writes out the fasta and genbank files from the
> data structure.
>
> The errors I got suggested that the system was running out of memory
> which I thought was unlikely since I've got something like 3GB of
> main memory and 9GB of swap space.
After running strace on the script (which takes > a while) > I determined that the brk() calls were generating ENOMEM at ~3GB. > This > turns out to be due to the limit of the Linux memory model I am using > (3GB/1GB) on a Pentium IV (Prescott). > > Now, I think the total genome size for the fungus is ~70MB but haven't > verified this so I "should" be able to fetch it unless Bioperl (or > perl > itself) is doing extremely poor memory management (perhaps not > coalescing > memory segments into one large sequence) as the reads take place? [1]. > > Has anyone encountered this problem (fetching say large mammalian > chromosomes)? Does anyone know what the limits are for "fetching" > sequence > files (on 32/64 bit machines?. The reason I am using get_sequence and > BioPerl is that I can't seem to find the *Podospora anserina* > sequence in a > FTP database anywhere (so I can't use "wget or ftp"). I haven't > tested > accessing the GenBank file in a browser (I don't know what browsers > would do > with a HTML file that large but suspect it would not be pretty). > > Thanks in advance, > Robert Bradbury > > 1. The strace seems to indicate periodic brk() calls to expand the > process > data segment size between which there are lots of read() calls of > size 4096, > presumably reading the socket from NCBI. I don't know if there is > an easy > way to trace perl's memory allocation/manipulation at a higher level. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Aug 16 15:42:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 14:42:56 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A883EE2.3060101@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> <44D32BE895F446A9917A5550485AB102@NewLife> <4A883EE2.3060101@cornell.edu> Message-ID: <69B8C887-1C5E-47B4-9168-8509BB0A5528@illinois.edu> On Aug 16, 2009, at 12:16 PM, Robert Buels wrote: > Mark A. Jensen wrote: >> I see both points- I think Chris's suggestion is good. The nexml >> support >> won't work without Bio::Phylo, but not everyone will need that >> support, >> so if the install can be chatty about this that would be great- > > Maybe the parts that have differing dependencies should be in > different distros then? > > Rob I'm guessing large chunks of that code would have Bio::Root::Root as a base, so I think maintaining related code split into two distributions too problematic. Simple to indicate that Bio::Phylo is required only for NeXML (so listing it as a 'recommends') and keep everything NeXML- related and requiring Bio::Root::Root in one spot. It's possible something inheriting from Bio::Phylo could go there, but that's up to Rutger. chris From maj at fortinbras.us Mon Aug 17 08:43:33 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 17 Aug 2009 08:43:33 -0400 Subject: [Bioperl-l] new NeXML I/O modules Message-ID: Hi All- I'm pleased to announce that my Google Summer of Code student Chase Miller and I have successfully migrated his modules for NeXML I/O into bioperl-live. NeXML (http://www.nexml.org) is Rutger Vos' highly flexible, highly annotable standard for evolutionary data exchange that is catching on in the evolutionary DB world. We hope these modules will help move that process along. I also want to say that Chase has been a terrific student and collaborator. He not only learned the complexities of BioPerl IO from scratch, but also grokked Rutger's Bio::Phylo internals, and became familiar with and applied modern OO concepts. He also wrote tests (which pass!), complete POD, and a HOWTO (at http://www.bioperl.org/wiki/HOWTO:Nexml) to accompany this work. Best of all, he finished! (Well, as much as anything is ever finished around here.) I for one hope he will continue to use his commit bit for good and not evil. cheers, Mark From deequan at gmail.com Mon Aug 17 09:06:44 2009 From: deequan at gmail.com (David Quan) Date: Mon, 17 Aug 2009 09:06:44 -0400 Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl? Message-ID: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> Hello there, I've been browsing around bioperl documentation and have used a blast parser, but am wondering if it is possible to use the start and end information for a hit to trace back to a gene in genbank and extract the sequence for that gene? I have not been able to find elements that would work in such a way. Hints and recommendations for elements that would be capable of behaving in such a way would be greatly appreciated. Thanks very much. David N.
Quan -- Love of country is, at heart, trust in a nation's people, faith in their better nature, esteem for their best hopes, understanding for the magnificence and the distinctiveness and the huge, infinitely shaded cultural palette of their simple humanity. --Bradley Burston From akarger at CGR.Harvard.edu Mon Aug 17 09:04:29 2009 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 17 Aug 2009 09:04:29 -0400 Subject: [Bioperl-l] on BP documentation References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > > From: "Hilmar Lapp" > ... > > As for the FASTA example, I can understand - I've heard > repeatedly > > from people that one of the things that they are missing is > > documentation for every SeqIO format we support (such as > GenBank, > > UniProt, FASTA, etc) about where to find a particular piece of > the > > format in the object model. > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help > create > our list of action items. > MAJ I wish you the best of luck on this ambitious and crucial project. I teach intro Perl classes to biologists and always tell them that Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble. I was trying to remember specific examples of where I've gotten lost, and unfortunately can't give any. But I can tell you that often I've run into trouble because the particular method I'm looking for is three parent classes away from the module I'm actually looking at. The deobfuscator helps some, but only for people who know about that. Do you think you could automate a tool that would add the following to the bottom of each module? 
=head2 Inherited methods =over 4 =item desc See Bio::Seq::Basic =back This would make browsing through the docs on bioperl.org more fun too. -Amir Karger From cjfields at illinois.edu Mon Aug 17 10:06:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:06:15 -0500 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: Congrats Chase! chris On Aug 17, 2009, at 7:43 AM, Mark A. Jensen wrote: > Hi All- > > I'm pleased to announce that my Google Summer of Code student > Chase Miller and I have successfully migrated his modules for > NeXML I/O into bioperl-live. NeXML (http://www.nexml.org) is > Rutger Vos' highly flexible, highly annotable standard for > evolutionary data exchange, that is catching on in the > evolutionary DB world. We hope these modules will help move that > process along. > > I also want to say that Chase has been a terrific student and > collaborator. He learned the not only the complexities of BioPerl > IO from scratch, but also grokked Rutger's Bio::Phylo internals, > and became familiar with and applied modern OO concepts. He also > wrote tests (which pass!), complete POD, and a HOWTO (at > http://www.bioperl.org/wiki/HOWTO:Nexml) to accompany this > work. Best of all, he finished! (Well, as much as anything is > ever finished around here.) I for one hope he will continue to > use his commit bit for good and not evil. > > cheers, > Mark > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 10:22:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:22:26 -0500 Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl? 
In-Reply-To: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> References: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> Message-ID: <74D10663-5770-43DA-ABDB-27FA5D532497@illinois.edu>

That's possible, yes. Use the hit information with Bio::DB::GenBank to pull the sequence out, as in the example below. Note that strand is different than BioPerl's -1/0/1; efetch strand: 1 = normal (default), 2 = comp.

================================
use Bio::DB::GenBank;

my $factory = Bio::DB::GenBank->new(-format    => 'genbank',
                                    -seq_start => $seqstart,
                                    -seq_stop  => $seqend,
                                    -strand    => $strand, # 1=plus, 2=minus
                                    );
# should be UID, use get_Seq_by_acc() for accessions
my $seq = $factory->get_Seq_by_id($id);
================================

This pulls everything into a Bio::Seq, though, so you'll need to push it out to a SeqIO output stream. You can also use Bio::DB::EUtilities to get the raw sequence via efetch, something like (untested):

================================
use Bio::DB::EUtilities;

my $fetcher = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -rettype => 'gb');
# loop: for each hit/HSP, grab sequence...
$fetcher->set_parameters(-id        => $id,       # UID or accession
                         -seq_start => $seqstart, # hit start
                         -seq_stop  => $seqend,   # hit end
                         -strand    => $strand    # 1=plus, 2=minus
                         );
# then get raw content
$fetcher->get_Response(-file => ">$id.gb");
================================

You could probably plug into Ensembl similarly if the db versions match; see: http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences

chris

On Aug 17, 2009, at 8:06 AM, David Quan wrote: > Hello there, > > I've been browsing around bioperl documentation and have used > a blast parser, but am wondering if it is possible to use the start > and end information for a hit to trace back to a gene in genbank and > extract the sequence for that gene? I have not been able to find > elements that would work in such a way.
Hints and recommendations for > elements that would be capable of behaving in such a way would be > greatly appreciated. Thanks very much. > > David N. Quan > > -- > Love of country is, at heart, trust in a nation's people, faith in > their better nature, esteem for their best hopes, understanding for > the magnificence and the distinctiveness and the huge, infinitely > shaded cultural palette of their simple humanity. --Bradley Burston > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 10:47:31 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:47:31 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> Message-ID: On Aug 17, 2009, at 8:04 AM, Amir Karger wrote: >> -----Original Message----- >> From: Mark A. Jensen [mailto:maj at fortinbras.us] >> >> From: "Hilmar Lapp" >> ... >>> As for the FASTA example, I can understand - I've heard >> repeatedly >>> from people that one of the things that they are missing is >>> documentation for every SeqIO format we support (such as >> GenBank, >>> UniProt, FASTA, etc) about where to find a particular piece of >> the >>> format in the object model. >> >> This is the right thread for list lurkers to contribute their betes >> noires >> such as this one. I encourage ALL to post these issues and help >> create >> our list of action items. >> MAJ > > I wish you the best of luck on this ambitious and crucial project. I > teach intro Perl classes to biologists and always tell them that > Bioperl > is amazingly useful, but only if you can figure out how to use it. 
If > what you want to do isn't in the howtos, you can be in big trouble. > > I was trying to remember specific examples of where I've gotten lost, > and unfortunately can't give any. But I can tell you that often I've > run > into trouble because the particular method I'm looking for is three > parent classes away from the module I'm actually looking at. The > deobfuscator helps some, but only for people who know about that. Do > you > think you could automate a tool that would add the following to the > bottom of each module? > > =head2 Inherited methods > > =over 4 > > =item desc > > See Bio::Seq::Basic > > =back > > This would make browsing through the docs on bioperl.org more fun too. > > -Amir Karger For many modules this is already in place, but yes this could be improved. One of the problems I suggest we avoid when doing this is placing these interspersed within code. It has been demonstrated that doing so actually slows down the perl interpreter slightly; it has to slog through lots of POD to find the code at the compilation step. This occurs only on initial compilation, but it is significant enough that the overall recommendation by most perl brethren (and in Perl Best Practices) has been to place any POD after an __END__ marker. This way the compiler doesn't have to look at it at all, but perldoc can still find it. Also, according to PBP, although the inline POD would seemingly be easier to take care of, apparently the opposite is true in most cases (though it can come down to styling differences). Interspersed POD is much harder to maintain in a consistent state, tends to be choppier, and can be laid out in odd ways due to being scattered throughout the file. I know this can come down to a difference in style, but the arguments do make sense enough to me that in Biome I am pushing to have all docs after the __END__ marker.
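The "Inherited methods" listing Amir asks for could probably be generated rather than written by hand. What follows is only a minimal sketch, assuming the class hierarchy is already loaded: the My::Base and My::Child classes and the helper names own_methods and inherited_methods_pod are hypothetical, made up for illustration, and are not part of BioPerl. It walks the method resolution order with the core mro::get_linear_isa and builds a POD section that could be appended after a module's __END__ marker.

```perl
use strict;
use warnings;
use mro ();   # core since Perl 5.10; provides mro::get_linear_isa

# Toy class hierarchy (hypothetical) so the sketch runs standalone.
{
    package My::Base;
    sub new  { return bless {}, shift }
    sub desc { return 'a description' }
}
{
    package My::Child;
    our @ISA = ('My::Base');
    sub seq { return 'ACGT' }
}

# Names of the subs found directly in a package's symbol table, sorted.
sub own_methods {
    my ($class) = @_;
    no strict 'refs';
    my $stash = \%{"${class}::"};
    return grep { defined &{"${class}::$_"} } sort keys %$stash;
}

# Build a POD section listing the methods a class inherits but does
# not define itself, each pointing at the nearest parent that has it.
sub inherited_methods_pod {
    my ($class) = @_;
    my %seen = map { $_ => 1 } own_methods($class);
    my @isa  = @{ mro::get_linear_isa($class) };
    shift @isa;                         # drop the class itself
    my $pod = "=head2 Inherited methods\n\n=over 4\n\n";
    for my $parent (@isa) {
        for my $method (own_methods($parent)) {
            next if $seen{$method}++;   # nearest definition wins
            $pod .= "=item $method\n\nSee $parent\n\n";
        }
    }
    return $pod . "=back\n\n";
}

print inherited_methods_pod('My::Child');
```

Subs imported into a package (for example via Exporter) also live in its symbol table, so a real tool would need to filter those out before writing the list into a module's documentation.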
Lincoln already practices this within bioperl and Bio::Graphics, and I plan on moving much of my documentation similarly within my code in BioPerl. The additional comments in the PBP chapter "Documentation" are well worth reading if you can get your hands on it. chris From rmb32 at cornell.edu Mon Aug 17 11:21:08 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:21:08 -0700 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: <4A897564.2090203@cornell.edu> Hurrah! GSoC strikes again! Rob From rmb32 at cornell.edu Mon Aug 17 11:45:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:45:18 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <474354.59886.qm@web30408.mail.mud.yahoo.com> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> Message-ID: <4A897B0E.7060208@cornell.edu> Yee Man Chan wrote: > As to the release, my thinking is that I do understand that your desire to maintain a high level of quality in BioPerl code base. So if the HMM doesn't meet that standard, I am ok with it being spinned off. We're not pushing to spin it off because of code quality, we're pushing to spin it off because we're spinning everything off. The plan is to break BioPerl up into many discrete distributions on CPAN with the dependencies between them well-known and codified. This will make maintenance of BioPerl *much* easier in the long run. So this means that the plan of action should be 1.) get the code so that it's working on all platforms, 2.) create a CPAN distribution for it and put it on CPAN, 3.) remove it from bioperl-ext Also, doing a search for bioperl-ext on CPAN brings to light a couple of issues that probably need to be dealt with. To wit: 1.) there is an ancient version of bioperl-ext that probably needs to be removed, it's under ~birney's account. Thoughts on this? 2.)
Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend on bioperl-ext, which suggests that these really need to be split off, each with the Bio::Ext::Modules they depend on. Bio::Tools::HMM could be the first case of this: * make a dir in the repos called Bio-Tools-HMM alongside bioperl-live, having trunk/, and branches/ subdirs * move Bio::Tools::HMM out of bioperl-live into that * move Bio::Ext::HMM stuff out of bioperl-ext into that * repeat with Bio::Tools::dpAlign and pSW, which would probably go together into a Bio-Tools-Align distro, I think Sounds like this is moving along nicely. Rob From rmb32 at cornell.edu Mon Aug 17 11:48:10 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:48:10 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897B0E.7060208@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> Message-ID: <4A897BBA.2070204@cornell.edu> Also, I volunteer to make this branch and module machinery and such if you want. I just don't want to step on any ongoing development you guys are going in the bioperl-ext trunk. If you want me to do it, just say the word, either here or in #bioperl. Rob Robert Buels wrote: > Yee Man Chan wrote: >> As to the release, my thinking is that I do understand that your >> desire to maintain a high level of quality in BioPerl code base. So if >> the HMM doesn't meet that standard, I am ok with it being spinned off. > > We're not pushing to spin it off because of code quality, we're pushing > to spin it off because we're spinning everything off. The plan is to > break BioPerl up into many discrete distributions on CPAN with the > dependencies between them well-known and codified. This will make > maintenance of BioPerl *much* easier in the long run. > > So this means that the plan of action should be > 1.) get the code so that it's working on all platforms, > 2.) 
create a CPAN distribution for it and put it on CPAN, > 3.) remove it from bioperl-ext > > Also, doing a search for bioperl-ext on CPAN brings to light a couple of > issues that probably need to be dealt with. To wit: > > > 1.) there is an ancient version of bioperl-ext that probably needs to be > removed, it's under ~birney's account. Thoughts on this? > > 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend on > bioperl-ext, which suggests that these really need to be split off, each > with the Bio::Ext::Modules they depend on. Bio::Tools::HMM could be the > first case of this: > * make a dir in the repos called Bio-Tools-HMM alongside > bioperl-live, having trunk/, and branches/ subdirs > * move Bio::Tools::HMM out of bioperl-live into that > * move Bio::Ext::HMM stuff out of bioperl-ext into that > * repeat with Bio::Tools::dpAlign and pSW, which would probably > go together into a Bio-Tools-Align distro, I think > > Sounds like this is moving along nicely. > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Mon Aug 17 12:58:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 11:58:24 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897B0E.7060208@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> Message-ID: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> On Aug 17, 2009, at 10:45 AM, Robert Buels wrote: > Yee Man Chan wrote: >> As to the release, my thinking is that I do understand that your >> desire to maintain a high level of quality in BioPerl code base. 
So >> if the HMM doesn't meet that standard, I am ok with it being >> spinned off. > > We're not pushing to spin it off because of code quality, we're > pushing to spin it off because we're spinning everything off. The > plan is to break BioPerl up into many discrete distributions on CPAN > with the dependencies between them well-known and codified. This > will make maintenance of BioPerl *much* easier in the long run. > > So this means that the plan of action should be > 1.) get the code so that it's working on all platforms, > 2.) create a CPAN distribution for it and put it on CPAN, > 3.) remove it from bioperl-ext > > Also, doing a search for bioperl-ext on CPAN brings to light a > couple of issues that probably need to be dealt with. To wit: > > > 1.) there is an ancient version of bioperl-ext that probably needs > to be removed, it's under ~birney's account. Thoughts on this? This subject just recently popped up on perl.module.authors, more in relation to abandonware, but a similar thing. Andreas has indicated there is an abandoned flag that can be set so it's worth looking into, but using it requires another release. I have been in contact with that group on ideas for the split; libwin32 did the same thing, so I'll contact Jan Dubois on the matter for some pointers. > 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend > on bioperl-ext, which suggests that these really need to be split > off, each with the Bio::Ext::Modules they depend on. > Bio::Tools::HMM could be the first case of this: > * make a dir in the repos called Bio-Tools-HMM alongside bioperl- > live, having trunk/, and branches/ subdirs > * move Bio::Tools::HMM out of bioperl-live into that > * move Bio::Ext::HMM stuff out of bioperl-ext into that > * repeat with Bio::Tools::dpAlign and pSW, which would probably > go together into a Bio-Tools-Align distro, I think > > Sounds like this is moving along nicely. > > Rob Yes, that's essentially the idea.
The more significant impact of this (both here and in core) is allowing updates to be made as needed, and not be blocked due to issues in unrelated modules. We have been waiting years for fixes to pSW, Staden::read, Align w/o progress, which has hindered overall releases of bioperl-ext. Similar problems exist in bp-core. Re: bioperl-ext, BioLib has rendered some of those implementations obsolete. I would rather do that incrementally (individual implementations) vs. wait for a full-blown bioperl-ext release, so splitting these up makes that possible. chris From robert.bradbury at gmail.com Mon Aug 17 13:14:57 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 17 Aug 2009 13:14:57 -0400 Subject: [Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers Message-ID: One of the questions facing people working in bioinformatics is "How do we present information so that it can be effectively interpreted by non-informatics specialists?" Now, my expertise lies in computer science (esp. O.S. & databases) and as a second vocation the biology of aging (DNA damage & repair, to a lesser extent cancer and pathologies of aging, etc.). Now by my estimate there are perhaps 5 people in the world who are able to effectively discuss computer science X aging (gerontology) [3]. There are perhaps several dozen people where those areas, esp aging, may overlap with DNA damage & repair. But then there is a wider audience of perhaps a few hundred members of AGE, and maybe a thousand or so who are members of the scientific subgroup of GSA. But most of those individuals are "old school" scientists who know relatively little about bioinformatics. So one has barriers to presenting bioinformatics information in ways that they can use usefully. 
I have found in my limited experience that homology graphs of conserved protein domains, such as those displayed in HomoloGene or those in Ensembl (including phylogeny graphs) can be quite useful in reaching interesting conclusions. For example, double strand break repair processes, which may involve 8-10 relatively conserved proteins, may have a critical role in the mechanisms of aging. In particular two of those proteins, WRN & DCLRE1C (Artemis), contain complementary exonuclease activities which chew up the DNA in order to prepare the strands for ligation. Of course, programmers may appreciate better than gerontologists the significance of deleting random bytes from instruction sequences in one's code. At the recent AGE meeting in June several discussions arose as to possible differences in "aging" in yeast, *C. elegans* and mammals [1]. A quick database search showed that *C. elegans* seems to be lacking the exonuclease domain on the WRN homologue and may be missing a DCLRE1C homologue entirely (which if true would lead to conclusions that aging in *C. elegans* may be fundamentally different from aging in vertebrates). Explaining this to researchers can best be done using pictures. I've been through PubMed and have several papers (NAR / BMC Bioinformatics) regarding programs to do homology comparisons and phylogeny trees. However, these seem to lean towards producing less condensed bioinformatics-ish information. I do not know, however, whether the outputs from databases like PubMed HomoloGene or Ensembl have been packaged in tools that might be part of BioPerl. I am interested in programs that can be run on a regular basis to draw "pretty pictures" that can be used for publication and/or internet browsing. In particular I'm interested in running such programs on species of interest to various gerontological communities [2] which involves subsets of databases which seem to be scattered around the world. Thanks. 1.
Of course there has been lots of discussion and rationalization over the last 15+ years about how "aging" is largely the same in more complex and simpler organisms -- in part to justify sequencing some organisms and in part to justify funding research at certain laboratories. A closer examination based on some of the complete and emerging genome sequences may suggest this is a very swampy discussion. 2. For example, nematode DNA repair gene comparisons would be interesting to nematode researchers, insect DNA repair gene comparisons to insect researchers, both to invertebrate researchers, etc. 3. The recently published textbooks *Aging of the Genome* by Jan Vijg and the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg *et al*, go a long way towards moving these areas from the stacks of research libraries into areas for more general discussion. Both volumes deal extensively with the ~150 DNA repair genes. From cjfields at illinois.edu Mon Aug 17 13:15:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 12:15:46 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897BBA.2070204@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <4A897BBA.2070204@cornell.edu> Message-ID: I say go for it if Yee Man is okay with the idea. It gets the code out there that much faster. This also doesn't depend on core being split up (only need a 'requires' bioperl 1.6.0). chris On Aug 17, 2009, at 10:48 AM, Robert Buels wrote: > Also, I volunteer to make this branch and module machinery and such > if you want. I just don't want to step on any ongoing development > you guys are going in the bioperl-ext trunk. > > If you want me to do it, just say the word, either here or in > #bioperl. 
> > Rob > > Robert Buels wrote: >> Yee Man Chan wrote: >>> As to the release, my thinking is that I do understand that >>> your desire to maintain a high level of quality in BioPerl code >>> base. So if the HMM doesn't meet that standard, I am ok with it >>> being spinned off. >> We're not pushing to spin it off because of code quality, we're >> pushing to spin it off because we're spinning everything off. The >> plan is to break BioPerl up into many discrete distributions on >> CPAN with the dependencies between them well-known and codified. >> This will make maintenance of BioPerl *much* easier in the long run. >> So this means that the plan of action should be >> 1.) get the code so that it's working on all platforms, >> 2.) create a CPAN distribution for it and put it on CPAN, >> 3.) remove it from bioperl-ext >> Also, doing a search for bioperl-ext on CPAN brings to light a >> couple of issues that probably need to be dealt with. To wit: >> 1.) there is an ancient version of bioperl-ext that probably needs >> to be removed, it's under ~birney's account. Thoughts on this? >> 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend >> on bioperl-ext, which suggests that these really need to be split >> off, each with the Bio::Ext::Modules they depend on. >> Bio::Tools::HMM could be the first case of this: >> * make a dir in the repos called Bio-Tools-HMM alongside bioperl- >> live, having trunk/, and branches/ subdirs >> * move Bio::Tools::HMM out of bioperl-live into that >> * move Bio::Ext::HMM stuff out of bioperl-ext into that >> * repeat with Bio::Tools::dpAlign and pSW, which would probably >> go together into a Bio-Tools-Align distro, I think >> Sounds like this is moving along nicely. 
>> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From chmille4 at gmail.com Mon Aug 17 14:44:09 2009 From: chmille4 at gmail.com (Chase Miller) Date: Mon, 17 Aug 2009 14:44:09 -0400 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A897564.2090203@cornell.edu> References: <4A897564.2090203@cornell.edu> Message-ID: <991fb8210908171144t3f7107f0ldaf02dfdc762ae27@mail.gmail.com> Thanks! It was a great experience. I couldn't have done it without Mark who was a fantastic mentor. cheers, Chase On Mon, Aug 17, 2009 at 11:21 AM, Robert Buels wrote: > Hurrah! GSoC strikes again! > > Rob > From rmb32 at cornell.edu Mon Aug 17 16:32:14 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 13:32:14 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> Message-ID: <4A89BE4E.7090901@cornell.edu> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro at Bio-Tools-HMM in the repo. The tests are not passing, I think that some bugs need to be fixed in the logic of things. Yee Man, could you have a look? 
To download the newly repackaged code: svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM perl Build.PL; ./Build test Please check that things are compiling OK, check the test logic, upgrade the tests to use Test::More, and get the tests to the point where they are passing. At that point, it should be ready for CPAN, but we need to decide how we want to coordinate that with releases of bioperl-live and bioperl-ext. Rob From rmb32 at cornell.edu Mon Aug 17 16:45:42 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 13:45:42 -0700 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: <4A89C176.3050109@cornell.edu> Mark A. Jensen wrote: > wrote tests (which pass!), complete POD, and a HOWTO (at The tests for this are depending on Bio::Phylo and fail if it's not installed. Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a "recommended" module, or what? Gotta clarify our dependencies. Rob From cjfields at illinois.edu Mon Aug 17 16:54:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 15:54:05 -0500 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A89C176.3050109@cornell.edu> References: <4A89C176.3050109@cornell.edu> Message-ID: On Aug 17, 2009, at 3:45 PM, Robert Buels wrote: > Mark A. Jensen wrote: >> wrote tests (which pass!), complete POD, and a HOWTO (at > > The tests for this are depending on Bio::Phylo and fail if it's not > installed. Are we going to add Bio::Phylo as a bioperl dependency, > or band-aid it as a "recommended" module, or what? > > Gotta clarify our dependencies. > > Rob 'recommends', should skip all tests as a 'pass' with message that 'Bio::Phylo is required' or somesuch. chris From maj at fortinbras.us Mon Aug 17 16:55:19 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 17 Aug 2009 16:55:19 -0400 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A89C176.3050109@cornell.edu> References: <4A89C176.3050109@cornell.edu> Message-ID: <3D65CA5234EB4BDF892F280D575FB01D@NewLife> I meant to add a skip tests on a runtime check for bio::phylo. Gotta do that. It's necessary only for these modules. ----- Original Message ----- From: "Robert Buels" To: "Mark A. Jensen" Cc: "BioPerl List" ; "Rutger Vos" ; "Chase Miller" Sent: Monday, August 17, 2009 4:45 PM Subject: Re: [Bioperl-l] new NeXML I/O modules > Mark A. Jensen wrote: >> wrote tests (which pass!), complete POD, and a HOWTO (at > > The tests for this are depending on Bio::Phylo and fail if it's not installed. > Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a > "recommended" module, or what? > > Gotta clarify our dependencies. > > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Aug 17 17:22:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 16:22:00 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A89BE4E.7090901@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> <4A89BE4E.7090901@cornell.edu> Message-ID: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> Still seeing that odd warning popping up: cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at / Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185. Have you tried using Yee Man's original Makefile.PL to see if it works better? There appear to be some differences in the compilation, including a linking warning popping up. 
chris On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro > at Bio-Tools-HMM in the repo. The tests are not passing, I think > that some bugs need to be fixed in the logic of things. > > Yee Man, could you have a look? To download the newly repackaged > code: > > svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ > bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM > > perl Build.PL; ./Build test > > Please check that things are compiling OK, check the test logic, > upgrade the tests to use Test::More, and get the tests to the point > where they are passing. > > At that point, it should be ready for CPAN, but we need to decide > how we want to coordinate that with releases of bioperl-live and > bioperl-ext. > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 17:28:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 16:28:05 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> <4A89BE4E.7090901@cornell.edu> <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> Message-ID: <45F9C6D1-7DD7-4227-B7B9-3FBAF7513B35@illinois.edu> Take that back. Yes the 'FL' warning is still there, but no tests are run b/c (simply put) there are no regression tests (no use of Test or Test::More). If you run './Build test --verbose' you can see the run, but no test output. That should be easy to fix, though. chris On Aug 17, 2009, at 4:22 PM, Chris Fields wrote: > Still seeing that odd warning popping up: > > cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose > t/001_basics.t .. 
Argument "FL" isn't numeric in numeric lt (<) at / > Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line > 185. > > Have you tried using Yee Man's original Makefile.PL to see if it > works better? There appear to be some differences in the > compilation, including a linking warning popping up. > > chris > > On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > >> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro >> at Bio-Tools-HMM in the repo. The tests are not passing, I think >> that some bugs need to be fixed in the logic of things. >> >> Yee Man, could you have a look? To download the newly repackaged >> code: >> >> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ >> bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM >> >> perl Build.PL; ./Build test >> >> Please check that things are compiling OK, check the test logic, >> upgrade the tests to use Test::More, and get the tests to the point >> where they are passing. >> >> At that point, it should be ready for CPAN, but we need to decide >> how we want to coordinate that with releases of bioperl-live and >> bioperl-ext. >> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 18:26:19 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 17:26:19 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <419432.62970.qm@web30403.mail.mud.yahoo.com> References: <419432.62970.qm@web30403.mail.mud.yahoo.com> Message-ID: <227EADF3-D769-413D-B1BF-22C919C8D097@illinois.edu> Yee Man, Will look into that. I do recall that disappearing last night, so I'll go look at the commit log. 
I have committed some regression tests using Bio::Root::Test. This'll need to be extensively tested b/c we're comparing floating point numbers, though I do use our custom float_is() test to run these (so we only compare first six signif). These are passing for me on 64bit perl 5.10.0; I may try these on a local 64bit linux (I need to set up bioperl on it first). chris On Aug 17, 2009, at 5:19 PM, Yee Man Chan wrote: > I believe this warnings should have been fixed with the latest Bio/ > Tools/HMM.pm. Are you sure you are using the lastest Bio/Tools/ > HMM.pm? I noticed that there are two pairs of "use strict" and "use > warnings" in this version. :P > > Yee Man > > --- On Mon, 8/17/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "BioPerl List" , "Yee Man Chan" > > >> Date: Monday, August 17, 2009, 2:22 PM >> Still seeing that odd warning popping >> up: >> >> cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose >> t/001_basics.t .. Argument "FL" isn't numeric in numeric lt >> (<) at >> /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm >> line 185. >> >> Have you tried using Yee Man's original Makefile.PL to see >> if it works better? There appear to be some >> differences in the compilation, including a linking warning >> popping up. >> >> chris >> >> On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: >> >>> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into >> a new distro at Bio-Tools-HMM in the repo. The tests >> are not passing, I think that some bugs need to be fixed in >> the logic of things. >>> >>> Yee Man, could you have a look? 
To download the >> newly repackaged code: >>> >>> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ >>> bioperl/Bio-Tools-HMM/trunk >> Bio-Tools-HMM >>> >>> perl Build.PL; ./Build test >>> >>> Please check that things are compiling OK, check the >> test logic, upgrade the tests to use Test::More, and get the >> tests to the point where they are passing. >>> >>> At that point, it should be ready for CPAN, but we >> need to decide how we want to coordinate that with releases >> of bioperl-live and bioperl-ext. >>> >>> Rob >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > From abhishek.vit at gmail.com Mon Aug 17 18:53:19 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 17 Aug 2009 18:53:19 -0400 Subject: [Bioperl-l] Error Copying Hashes Message-ID: Hi Guys I think this one should be appropriate here. I am trying to copy a hash (spaced out below for the sake of readability):

% { $OUTPUT->{$dir}->{'file'}->{$file}->{'additive'} } = %ADDITIVE_COUNT;
## where %ADDITIVE_COUNT is a simple hash (key/value), no references

I am getting this error:

Odd number of elements in hash assignment at ./assessCoverage.pl line 258

Looking at the dump of the hash, I see this:

$VAR1 = {
  '/local/seq/' => {
    'read_len' => 36,
    'file' => {
      's_3_sorted.txt' => {
        'additive' => {
          '8979/16384' => undef  #### I don't understand this behavior. Something unusual is going on ?????
        }
      }
    }
  }
}

From rmb32 at cornell.edu Mon Aug 17 19:00:00 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:00:00 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <360578.66990.qm@web30403.mail.mud.yahoo.com> References: <360578.66990.qm@web30403.mail.mud.yahoo.com> Message-ID: <4A89E0F0.8010307@cornell.edu> Yee Man Chan wrote: > I noticed that Bio/Tools/HMM.pm was removed from the trunk.
So I added it back in. I think you shouldn't get the warnings with this version. Please read my email above with instructions for checking out the new Bio-Tools-HMM component, where Bio::Tools::HMM has been moved. Please do not add the Bio::Tools::HMM module back into bioperl-live. I think you might be confused about the functions of 'svn add', 'svn commit', etc., because I don't see any actual addition of the module in the commit logs. Please read through the SVN manual at http://svnbook.red-bean.com/ if you need clarification. Rob From rmb32 at cornell.edu Mon Aug 17 19:30:07 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:30:07 -0700 Subject: [Bioperl-l] Error Copying Hashes In-Reply-To: References: Message-ID: <4A89E7FF.1020603@cornell.edu> Well for one thing, it looks like somewhere a hash is getting accidentally evaluated in scalar context. '8979/16384' is a typical result of doing, for example, my $x = %some_hash; This might not be the proximate cause of your problem; it would be better to post your whole script somewhere so people can look over it. That said, this isn't the right list for this; this list is specifically for discussing the BioPerl toolkit, not just perl that is used in biology. IRC is probably the quickest place to get perl help; try the #perl-help channel on the server irc.perl.org. Otherwise, you might try asking on a general perl mailing list; there seem to be some listed at http://perl-begin.org/mailing-lists/ Best of luck! Rob From abhishek.vit at gmail.com Mon Aug 17 19:33:41 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 17 Aug 2009 19:33:41 -0400 Subject: [Bioperl-l] Error Copying Hashes In-Reply-To: <4A89E7FF.1020603@cornell.edu> References: <4A89E7FF.1020603@cornell.edu> Message-ID: Ok great. Thanks for pointing me to the right places to post later.
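
[Archive note] A minimal sketch of the scalar-context behaviour Rob describes in this thread, assuming nothing about the original script (it was never posted, so this may not be the actual cause there). The names %additive_count and $output are illustrative stand-ins, not the poster's code. The exact value a hash yields in scalar context depends on the Perl version (older perls give a used/allocated bucket ratio such as '8979/16384', newer ones a key count), so the sketch does not rely on it.

```perl
use strict;
use warnings;

# Illustrative data; stands in for %ADDITIVE_COUNT from the question.
my %additive_count = ( chr1 => 10, chr2 => 20, chr3 => 30 );
my $output = {};

# List-context copy into a nested slot: the intermediate hashes are
# autovivified and every key/value pair is copied.
%{ $output->{dir}{file}{'s_3_sorted.txt'}{additive} } = %additive_count;

# Equivalent form: store a reference to a fresh shallow copy.
$output->{dir}{file}{'s_3_sorted.txt'}{additive_ref} = { %additive_count };

# What goes wrong when the source hash ends up in scalar context: the
# right-hand side collapses to a single value, which becomes the only
# key, with an undef value (normally with an "Odd number of elements
# in hash assignment" warning).
my %broken;
{
    no warnings 'misc';    # silence that warning for this demonstration
    %broken = ( scalar(%additive_count) );
}

printf "good copy has %d keys\n",
    scalar keys %{ $output->{dir}{file}{'s_3_sorted.txt'}{additive} };
printf "broken copy has %d key(s)\n", scalar keys %broken;
```

If a dump shows a single odd-looking key with an undef value, as in the question, a reasonable first step is to look for any place where the source hash is passed through scalar(), a logical operator, or a scalar assignment before the copy.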
best, -Abhi On Mon, Aug 17, 2009 at 7:30 PM, Robert Buels wrote: > Well for one thing, it looks like somewhere a hash is getting accidentally > evaluated in scalar context. '8979/16384' is a typical result of doing, for > example, my $x = %some_hash; This might not be the proximate cause of your > problem; it would be better to post your whole script somewhere so people > can look over it. > > That said, this isn't the right list for this; this list is specifically > for discussing the BioPerl toolkit, not just perl that is used in biology. > > IRC is probably the quickest place to get perl help; try the #perl-help > channel on the server irc.perl.org. > > Otherwise, you might try asking on a general perl mailing list; there seem > to be some listed at > http://perl-begin.org/mailing-lists/ > > Best of luck! > > Rob > From rmb32 at cornell.edu Mon Aug 17 19:42:21 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:42:21 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A87275C.5040300@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> Message-ID: <4A89EADD.9050509@cornell.edu> I'm digging into the second item on the implementation plan, having mostly finished splitting off Bio::FeatureIO (in a branch): * Rename some TypedSeqFeatureI methods as suggested in Hilmar's post Where Hilmar's post is at http://article.gmane.org/gmane.comp.lang.perl.bio.general/15846 Now, he refers to an interesting thing in there that I haven't heard discussed before, which is the concept of having the feature's source_tag be typed with an ontology term also, as source_term(). I can see how this might be a good idea, or it might be overkill. Anybody have thoughts on having feature _sources_ strongly typed with ontology terms?
Rob From Kevin.M.Brown at asu.edu Mon Aug 17 20:36:34 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 17 Aug 2009 17:36:34 -0700 Subject: [Bioperl-l] on BP documentation In-Reply-To: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> Message-ID: <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu> The deobfuscator does help, but even it is a little sparse on data for modules, especially information on what a method call actually returns. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger Sent: Monday, August 17, 2009 6:04 AM To: Mark A. Jensen; BioPerl List Subject: Re: [Bioperl-l] on BP documentation > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > > From: "Hilmar Lapp" > ... > > As for the FASTA example, I can understand - I've heard > repeatedly > > from people that one of the things that they are missing is > > documentation for every SeqIO format we support (such as > GenBank, > > UniProt, FASTA, etc) about where to find a particular piece of > the > > format in the object model. > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help > create > our list of action items. > MAJ I wish you the best of luck on this ambitious and crucial project. I teach intro Perl classes to biologists and always tell them that Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble. I was trying to remember specific examples of where I've gotten lost, and unfortunately can't give any.
But I can tell you that often I've run into trouble because the particular method I'm looking for is three parent classes away from the module I'm actually looking at. The deobfuscator helps some, but only for people who know about it. Do you think you could automate a tool that would add the following to the bottom of each module?

=head2 Inherited methods

=over 4

=item desc

See Bio::Seq::Basic

=back

This would make browsing through the docs on bioperl.org more fun too. -Amir Karger _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From sidd.basu at gmail.com Tue Aug 18 07:01:03 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Tue, 18 Aug 2009 06:01:03 -0500 Subject: [Bioperl-l] code reuse with moose In-Reply-To: References: <20090812022753.GA815@Macintosh-74.local> Message-ID: <20090818110102.GA27010@seinfeld> Putting this on the bioperl list; it makes more sense here. On Wed, 12 Aug 2009, Chris Fields wrote: > (BTW, this is re: the reimplementation of major chunks of BioPerl using > Moose, Biome: http://github.com/cjfields/biome/tree/) > > Locations should use a Role (specifically, Biome::Role::Range), so > start/end/strand should be attributes, not methods. With attributes the > best way to do this is probably with a builder, and lazily (start > requires end, and vice versa). Factor out the common code as Tomas > indicates. BTW, the $self->throw() is akin to BioPerl's $self->throw() > exception handling; it simply catches any exceptions and passes them to > the metaclass exception handling.
> > I've been thinking about making the Range role abstract for this very > reason (or defining very basic attributes); something like: > > ---------------------------- > > package Bio::Role::Range; > > requires qw(_build_start _build_end _build_strand); > > # also require other methods which need to be defined in implementation > > has 'start' => ( > isa => 'Int', > is => 'rw', > builder => '_build_start', > lazy => 1 > ); > > # same for end, strand (except strand has a different isa via > MooseX::Types) > .... > > package Bio::Location::Foo; > > with 'Bio::Role::Range'; > > sub _build_start { > # for location-specific start > } > > sub _build_end { > # for location-specific end > } > > sub _build_strand { > # for location-specific strand > } > > sub _common_build_method { > # factor out common code here, call from other builders > } > > ---------------------------- This plan makes things much clearer. Currently BioMe::Role::Location uses the 'requires' keyword, and the rest of the location modules consume that role, each providing its own implementation. At this point only BioMe::Location::Atomic has an attribute-based 'start' and 'end' implementation. I got a bit confused because in current bioperl 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic', and when I try to follow that path in BioMe it has to override that method. So, my question is: do all the location modules really need to inherit from each other? I am well aware of the original design ideas, but it would be better to have a flattened hierarchy if possible. One more thing: what about putting 'start', 'end' and the other common base attributes in BioMe::Role::Location instead of BioMe::Role::Range? I am not sure which would be correct from the bioperl point of view; just throwing out an idea. > > Also, I think the Coordinate-related stuff should be simplified down to a > trait or an attribute; they bring in way too much overhead in bioperl w/o > much added value.
You mean instead of having 'builder' method, having a specialized traits handling those. That sounds like even better. -siddhartha > > And now back to your regular Moose-related broadcast... > > chris > > On Aug 11, 2009, at 9:27 PM, Siddhartha Basu wrote: > > > Hi, > > In one my classes i have this boilerplate code block that is repeated > > all > > over .... > > > > sub start { > > my ( $self, $value ) = @_; > > $self->{'_start'} = $value if defined $value; > > > > ## -- from here > > $self->throw( "Only adjacent residues when location type " > > . "is IN-BETWEEN. Not [" > > . $self->{'_start'} > > . "] and [" > > . $self->{'_end'} > > . "]" ) > > if defined $self->{'_start'} > > && defined $self->{'_end'} > > && $self->location_type eq 'IN-BETWEEN' > > && ( $self->{'_end'} - 1 != $self->{'_start'} ); > > return $self->{'_start'}; > > ## -- here > > > > } > > > > then again .... > > > > sub end { > > my ( $self, $value ) = @_; > > > > $self->{'_end'} = $value if defined $value; > > > > #assume end is the same as start if not defined > > if ( !defined $self->{'_end'} ) { > > if ( !defined $self->{'_start'} ) { > > $self->warn('Calling end without a defined start > > position'); > > return; > > } > > $self->warn('Setting start equal to end'); > > $self->{'_end'} = $self->{'_start'}; > > } > > > > ## ---- > > > > $self->throw( "Only adjacent residues when location type " > > . "is IN-BETWEEN. Not [" > > . $self->{'_start'} > > . "] and [" > > . $self->{'_end'} > > . "]" ) > > if defined $self->{'_start'} > > && defined $self->{'_end'} > > && $self->location_type eq 'IN-BETWEEN' > > && ( $self->{'_end'} - 1 != $self->{'_start'} ); > > > > return $self->{'_end'}; > > #--------- > > } > > > > > > Is there any way moose can be used here for more code resuage. I > > thought > > about converted it to a type but still couldn't figure out how that > > can > > be done. 
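
[Archive note] The duplication in the quoted start()/end() methods can also be reduced without Moose, by moving the shared IN-BETWEEN check into one private helper. The sketch below is plain Perl rather than the role/builder approach suggested in the reply, and it rests on assumptions: the package name My::Location and the helper _check_in_between are hypothetical, die() stands in for BioPerl's $self->throw(), and the warnings emitted by the original end() are omitted. It is not code from Bio::Location or Biome.

```perl
use strict;
use warnings;

package My::Location;    # hypothetical stand-in for a location class

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { _location_type => $args{location_type} || 'EXACT' }, $class;
    $self->start( $args{start} ) if defined $args{start};
    $self->end( $args{end} )     if defined $args{end};
    return $self;
}

sub location_type { $_[0]{_location_type} }

# Shared check, written once: an IN-BETWEEN location must span two
# adjacent residues. die() stands in for $self->throw().
sub _check_in_between {
    my ($self) = @_;
    my ( $s, $e ) = @{$self}{qw(_start _end)};
    return unless defined $s && defined $e;
    return unless $self->location_type eq 'IN-BETWEEN';
    die "Only adjacent residues when location type is IN-BETWEEN. Not [$s] and [$e]\n"
        if $e - 1 != $s;
    return;
}

sub start {
    my ( $self, $value ) = @_;
    $self->{_start} = $value if defined $value;
    $self->_check_in_between;
    return $self->{_start};
}

sub end {
    my ( $self, $value ) = @_;
    $self->{_end} = $value if defined $value;

    # Simplified from the original: end falls back to start when unset.
    $self->{_end} = $self->{_start} if !defined $self->{_end};
    $self->_check_in_between;
    return $self->{_end};
}

package main;

my $ok  = My::Location->new( location_type => 'IN-BETWEEN', start => 5, end => 6 );
my $bad = eval { My::Location->new( location_type => 'IN-BETWEEN', start => 5, end => 8 ) };
print "ok spans ", $ok->start, "..", $ok->end, "\n";
print "bad location rejected: $@" unless $bad;
```

The same helper could serve as the body of a shared builder or an attribute trigger if the class is later moved to Moose; that step is left out here to keep the sketch free of non-core dependencies.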
> > > > > > thanks, > > -siddhartha > From deequan at gmail.com Fri Aug 14 15:02:06 2009 From: deequan at gmail.com (David Quan) Date: Fri, 14 Aug 2009 15:02:06 -0400 Subject: [Bioperl-l] bioperl capability Message-ID: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> Hello, I've been browsing around bioperl documentation and have used a blast parser, but am wondering if it is possible to use the start and end information for a hit to trace back to a gene in genbank and extract the sequence for that gene? I have not been able to find elements that would work in such a way. Recommendations for elements that would be capable of behaving in such a way would be greatly appreciated. Thanks very much. David N. Quan -- Love of country is, at heart, trust in a nation's people, faith in their better nature, esteem for their best hopes, understanding for the magnificence and the distinctiveness and the huge, infinitely shaded cultural palette of their simple humanity. --Bradley Burston From ymc at yahoo.com Fri Aug 14 22:57:15 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Fri, 14 Aug 2009 19:57:15 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? Message-ID: <85143.35343.qm@web30404.mail.mud.yahoo.com> Hi Chris I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..) Please let me know if it works for you. Sorry for the bug... Yee Man --- On Fri, 8/14/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" > Date: Friday, August 14, 2009, 8:31 AM > Yee Man, > > I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 > 64-bit) and on dev.open-bio.org (which is perl 5.8.8, > appears to be 32-bit).? 
The patch results in cleaning > up warnings for 5.10.0 but results in similar warnings for > 5.8.8 (linux or OS X). > > On OS X perl 5.8.8, this sometimes passes (note the first > attempt fails, the second succeeds), so it's not entirely a > 32-bit issue: > > http://gist.github.com/167860 > > OS X and perl 5.10.0, this always fails as the previous > gist shows, but demonstrates similar behavior (multiple > attempts to test get different responses): > > http://gist.github.com/167542 > > On linux, everything passes with or w/o the patched files > (patched files have warnings as indicated above): > > Specs for all three perl executables (they vary a bit): > > http://gist.github.com/167883 > > chris > > On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: > > > Ah.. I find that the typemap can become as simple as > this > > ===================== > > TYPEMAP > > HMM *    T_PTROBJ > > ===================== > > > > Then the generated HMM.c will have a function called > INT2PTR to do the pointer conversion. I believe this should > solve the warnings. > > > > Attached are the updated HMM.xs and typemap. Can > someone with a 64-bit machine give it a try? > > > > Thank you > > Yee Man > > --- On Thu, 8/13/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista? > >> To: "Yee Man Chan" > >> Cc: "Robert Buels" , > "Jonny Dalzell" , > "BioPerl List" > >> Date: Thursday, August 13, 2009, 5:31 PM > >> (just to point out to everyone, Yee > >> Man's contact information was in the POD) > >> > >> Yee Man, > >> > >> I have the output in the below link: > >> > >> http://gist.github.com/167542 > >> > >> There are similar problems popping up on 32- and > 64-bit > >> perl 5.10.0, Mac OS X 10.5. Haven't had time > to debug > >> it unfortunately. > >> > >> I think we should seriously consider spinning this > code off > >> into its own distribution for CPAN. It's > >> unfortunately bit-rotting away in > bioperl-ext.
If you > >> want to continue supporting it I can help set that > up. > >> > >> chris > >> > >> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > >> > >>> Hi > >>> > >>>? ???So is this an HMM only > problem? Or does > >> it apply to other bioperl-ext modules? > >>> > >>>? ???What exactly are the > compilation errors > >> for HMM? I believe my implementation is just a > simple one > >> based on Rabiner's paper. > >>> > >>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > >>> > >>>? ???I don't think I did > anything fancy that > >> makes it machine dependent or non-ANSI C. > >>> > >>> Yee Man > >>> > >>> --- On Thu, 8/13/09, Chris Fields > >> wrote: > >>> > >>>> From: Chris Fields > >>>> Subject: Re: [Bioperl-l] Problems with > Bioperl-ext > >> package on WinVista? > >>>> To: "Robert Buels" > >>>> Cc: "Jonny Dalzell" , > >> "BioPerl List" , > >> "Yee Man Chan" > >>>> Date: Thursday, August 13, 2009, 3:18 PM > >>>> > >>>> On Aug 13, 2009, at 4:37 PM, Robert Buels > wrote: > >>>> > >>>>> Jonny Dalzell wrote: > >>>>>> Is it ridiculous of me to expect > ubuntu to > >> take > >>>> care of this for me?? How do > >>>>>> I go about compiling the HMM? > >>>>> Yes.? This is a very specialized > thing > >> that > >>>> you're doing, and Ubuntu does not have > the > >> resources to > >>>> package every single thing. > >>>>> > >>>>> Unfortunately, it looks like > bioperl-ext > >> package is > >>>> not installable under Ubuntu 9.04 anyway, > which is > >> what I'm > >>>> running.? For others on this list, > if > >> somebody is > >>>> interested in doing maintaining it, I'd be > happy > >> to help out > >>>> by testing on Debian-based Linux > platforms. 
> >> We need to > >>>> clarify this package's maintenance status: > if > >> there is > >>>> nobody interested in maintaining it, I > would > >> recommend that > >>>> bioperl-ext be removed from distribution. > >> It's not in > >>>> anybody's interest to have unmaintained > software > >> out there > >>>> causing confusion. > >>>> > >>>> I have cc'd Yee Man Chan for this.? > If there > >> isn't a > >>>> response or the message bounces, we do one > of two > >> things: > >>>> > >>>> 1) consider it deprecated (probably > safest). > >>>> 2) spin it out into a separate module. > >>>> > >>>> Just tried to comile it myself and am > getting > >> errors (using > >>>> 64bit perl 5.10), so I think, unless > someone wants > >> to take > >>>> this on, option #1 is best. > >>>> > >>>>> So Jonny, in short, I would say "do > not use > >>>> bioperl-ext". > >>>> > >>>> In general, that's a safe bet.? We're > moving > >> most of > >>>> our C/C++ bindings to BioLib. > >>>> > >>>>> Step back.? What are you trying > to > >>>> accomplish?? Chris already > recommended some > >> alternative > >>>> methods in his email of 8/11 on this > >> subject.? Perhaps > >>>> we can guide you to some software that is > >> actively > >>>> maintained and will meet your needs. > >>>>> > >>>>> Rob > >>>> > >>>> Exactly.? Lots of other (better > supported!) > >> options > >>>> out there.? HMMER, SeqAn, and > others. > >>>> > >>>> chris > >>>> > >>> > >>> > >>> > >> > >> > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam?? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: HMM.xs Type: application/octet-stream Size: 5614 bytes Desc: not available URL: From ymc at yahoo.com Sat Aug 15 21:23:28 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Sat, 15 Aug 2009 18:23:28 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <8B7B3664-A0E2-4E66-82D6-982096F4C75E@illinois.edu> Message-ID: <241652.96493.qm@web30404.mail.mud.yahoo.com> I just committed HMM.xs and typemap to SVN. Can you test it to confirm it works in 64-bit machines? Thanks Yee Man --- On Sat, 8/15/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "Yee Man Chan" , "BioPerl List" > Date: Saturday, August 15, 2009, 12:11 PM > I'm not sure, but it makes more sense > to commit these changes directly.? Yee, need us to set > you up with a commit bit?? If so, fill out the > information on this page: > > http://www.bioperl.org/wiki/SVN_Account_Request > > and forward it to support at open-bio.org.? > I'll sponsor you. > > chris > > On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: > > > The usual procedure for developing code is to exchange > code via commits to a version control system.? Yee, do > you know how to use Subversion? Does Yee need a commit bit? > > > > Rob > > > > Yee Man Chan wrote: > >> Hi Chris > >>???I find that there is a memory > access bug in my code. Attached is the fixed HMM.xs. This > file together with the simpler typemap should fix all > problems. (I hope..) > >>???Please let me know if it works > for you. > >> Sorry for the bug... > >> Yee Man > >> --- On Fri, 8/14/09, Chris Fields > wrote: > >>> From: Chris Fields > >>> Subject: Re: [Bioperl-l] Problems with > Bioperl-ext package on WinVista? 
From ymc at yahoo.com Sun Aug 16 00:32:19 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sat, 15 Aug 2009 21:32:19 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To:
Message-ID: <846546.73578.qm@web30404.mail.mud.yahoo.com>

When are you going to release 1.6? Maybe let me work on it before the release. If that doesn't resolve the problem, then we can think about other alternatives.

Also, please show me the latest errors you have for 5.10.0.

Thanks
Yee Man

--- On Sat, 8/15/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels", "BioPerl List"
> Date: Saturday, August 15, 2009, 7:05 PM
>
> I'm still seeing the same errors on Mac OS X for 64-bit perl 5.10.0.
> Mac OS X, native perl (v5.8.8) passes fine now (as well as perl 5.8.8
> on dev.open-bio.org).
>
> I'm wondering if this is a problem with my local perl build. I'm very
> tempted to push the HMM-related code into a separate distribution
> (bioperl-hmm) and make a CPAN release out of it so it gets wider
> testing via CPAN testers; it would just require a minimum bioperl 1.6
> installation for Bio::Tools::HMM and any related modules. Yee, would
> that be okay with you?
>
> chris

From ymc at yahoo.com Sun Aug 16 05:36:59 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 16 Aug 2009 02:36:59 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
Message-ID: <217259.7083.qm@web30408.mail.mud.yahoo.com>

Hi Chris

Thanks for your suggestions. I think it is indeed better to check that the sum is 1.0 using sprintf. I fixed this in the newly committed HMM.pm.

I also fixed the code that led to warnings under "use warnings".

So now the only problem left is the "monotonic increasing" error. In that part of the code I was trying to perform an expectation maximization step. Theoretically, the expectation should monotonically increase in every step. But I suppose this is not necessarily true when double precision floating point numbers are involved. I don't know why I used a 1e-100 tolerance for this. Therefore I "fixed" it by using the same tolerance that terminates the maximization step (i.e. .000001). I suppose this "fix" will make it much less likely to throw an exception with your 5.10.0 perl.

Can you give it another try and see if it works now?

Thank you
Yee Man

--- On Sat, 8/15/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels", "BioPerl List"
> Date: Saturday, August 15, 2009, 10:38 PM
>
> Yee,
>
> I took the liberty of making a few simple changes to Bio::Tools::HMM
> in svn to point out the problem and possible solutions. Feel free to
> revert these as needed.
>
> I'm seeing two errors, which appear randomly when running 'make test'.
> The first is easily fixable, the second, I'm not so sure. I'll let you
> make the decisions on both.
>
> 1) There is an assumption in the module that, when adding floating
> points, you will always get 1.0. You may run into problems: see
> 'perldoc -q long decimals'. Lines like this (two places in the module):
>
>   ...
>   if ($sum != 1.0) {
>      $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
>   }
>   ...
>
> won't work as expected (note I added a simple diagnostic, just print
> out the 'bad' sum). With perl 5.8.8, this appears to work fine, but
> this is what I get with perl 5.10 (64-bit):
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Baum-Welch Training
> ===================
> Initial Probability Array:
> 0.499978    0.500022
> Transition Probability Matrix:
> 0.499978    0.500022
> 0.499978    0.500022
> Emission Probability Matrix:
> 0.133333    0.143333
> 0.163333    0.123333
> 0.143333    0.293333
> 0.133333    0.143333
> 0.163333    0.123333
> 0.143333    0.293333
>
> Log Probability of sequence 1: -521.808
> Log Probability of sequence 2: -426.057
>
> Statistical Training
> ====================
> Initial Probability Array:
> 1    0
> Transition Probability Matrix:
>
> ------------- EXCEPTION -------------
> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
>
> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
> STACK toplevel test.pl:82
> -------------------------------------
>
> make: *** [test_dynamic] Error 255
>
> I'm assuming this needs to simply be rounded up to 1.0. That could be
> accomplished with something like
> 'if (sprintf("%.2f", $sum) != 1.0) {...}'
>
> 2) The second error is a little stranger. I have been randomly getting
> this:
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Baum-Welch Training
> ===================
> S should be monotonic increasing!
> make: *** [test_dynamic] Error 255
>
> When I add strict and warnings pragmas to Bio::Tools::HMM (with a
> little additional cleanup to get things running), I get an additional
> warning (arrow):
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
> Baum-Welch Training
> ===================
> S should be monotonic increasing!
> make: *** [test_dynamic] Error 255
>
> So something is not being converted as expected.
>
> chris

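Both problems discussed in this exchange come down to comparing floating-point values exactly. As a rough illustration only, here is a minimal Perl sketch of a tolerance-based check. The helper names approx_equal and non_decreasing are hypothetical and are not part of Bio::Tools::HMM; the default tolerance of 1e-6 is borrowed from the ".000001" mentioned in the thread, and whether it suits a given model is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::HMM): true when $x and $y
# differ by no more than $tol. The default tolerance is an assumption.
sub approx_equal {
    my ($x, $y, $tol) = @_;
    $tol = 1e-6 unless defined $tol;
    return abs($x - $y) <= $tol;
}

# Hypothetical helper: true when the new value has not dropped below the
# previous one by more than $tol, so tiny rounding dips are tolerated.
sub non_decreasing {
    my ($prev, $curr, $tol) = @_;
    $tol = 1e-6 unless defined $tol;
    return $curr >= $prev - $tol;
}

# Adding 0.1 ten times typically does not give exactly 1.0, because 0.1
# has no exact binary floating-point representation.
my $sum = 0;
$sum += 0.1 for 1 .. 10;

print "exact comparison:    ", ($sum == 1.0            ? "equal" : "not equal"), "\n";
print "tolerant comparison: ", (approx_equal($sum, 1.0) ? "equal" : "not equal"), "\n";
```

For comparison, the sprintf("%.2f", $sum) test suggested in the thread accepts any sum that rounds to 1.00 (roughly 0.995 to 1.005), which is a much looser check than an explicit small tolerance.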
From ymc at yahoo.com Sun Aug 16 23:34:24 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 16 Aug 2009 20:34:24 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <05D89C95-261C-47B5-A4C6-794D36DD5FB8@illinois.edu>
Message-ID: <474354.59886.qm@web30408.mail.mud.yahoo.com>

Hi Chris

Good to hear that it is working, and thanks for testing.

As to the release, I do understand your desire to maintain a high level of quality in the BioPerl code base. So if the HMM doesn't meet that standard, I am OK with it being spun off.

So please pass around the updated code and test it extensively. If no one complains about the new code by the time of the release, I would think it should go into the next bioperl-ext release. If people uncover new errors with the new code and the errors can't be fixed in time, then it should be spun off.

What do you think?

Best Regards,
Yee Man

--- On Sun, 8/16/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels", "BioPerl List"
> Date: Sunday, August 16, 2009, 5:53 AM
>
> That worked! Thanks Yee Man!
>
> chris
>
> ps - let me know how you want to deal with a release.

> >> The first is easily fixable, the second, I'm not so sure.
> >> I'll let you make the decisions on both.
> >>
> >> 1)  There is an assumption in the module that, when
> >> adding floating points, you will always get 1.0.  You
> >> may run into problems: see 'perldoc -q long decimals'.
> >> Lines like this (two places in the module):
> >>    ...
> >>    if ($sum != 1.0) {
> >>        $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
> >>    }
> >>    ...
> >>
> >> won't work as expected (note I added a simple diagnostic,
> >> just print out the 'bad' sum).  With perl 5.8.8, this
> >> appears to work fine, but this is what I get with perl 5.10
> >> (64-bit):
> >>
> >> pyrimidine1:HMM cjfields$ make test
> >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> >> Baum-Welch Training
> >> ===================
> >> Initial Probability Array:
> >> 0.499978    0.500022
> >> Transition Probability Matrix:
> >> 0.499978    0.500022
> >> 0.499978    0.500022
> >> Emission Probability Matrix:
> >> 0.133333    0.143333
> >> 0.163333    0.123333
> >> 0.143333    0.293333
> >> 0.133333    0.143333
> >> 0.163333    0.123333
> >> 0.143333    0.293333
> >>
> >> Log Probability of sequence 1: -521.808
> >> Log Probability of sequence 2: -426.057
> >>
> >> Statistical Training
> >> ====================
> >> Initial Probability Array:
> >> 1    0
> >> Transition Probability Matrix:
> >>
> >> ------------- EXCEPTION -------------
> >> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
> >>
> >> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
> >> STACK toplevel test.pl:82
> >> -------------------------------------
> >>
> >> make: *** [test_dynamic] Error 255
> >>
> >> I'm assuming this needs to simply be rounded up to 1.0.
> >> That could be accomplished with something like
> >> 'if (sprintf("%.2f", $sum) != 1.0) {...}'
> >>
> >> 2) The second error is a little stranger.  I have been
> >> randomly getting this:
> >>
> >> pyrimidine1:HMM cjfields$ make test
> >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> >> Baum-Welch Training
> >> ===================
> >> S should be monotonic increasing!
> >> make: *** [test_dynamic] Error 255
> >>
> >> When I add strict and warnings pragmas to Bio::Tools::HMM
> >> (with a little additional cleanup to get things running), I
> >> get an additional warning (arrow):
> >>
> >> pyrimidine1:HMM cjfields$ make test
> >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> >> Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
> >> Baum-Welch Training
> >> ===================
> >> S should be monotonic increasing!
> >> make: *** [test_dynamic] Error 255
> >>
> >> So something is not being converted as expected.
> >>
> >> chris
> >>
> >> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote:
> >>
> >>> When are you going to release 1.6? Maybe let me work
> >>> on it before it releases. If it doesn't resolve the problem,
> >>> then we can think about other alternatives.
> >>>
> >>> Also, please show me the latest errors you have for 5.10.0.
> >>>
> >>> Thanks
> >>> Yee Man
> >>>
> >>> --- On Sat, 8/15/09, Chris Fields wrote:
> >>>
> >>>> From: Chris Fields
> >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>> To: "Yee Man Chan"
> >>>> Cc: "Robert Buels" , "BioPerl List"
> >>>> Date: Saturday, August 15, 2009, 7:05 PM
> >>>> I'm still seeing the same errors on
> >>>> Mac OS X for 64-bit perl 5.10.0.  Mac OS X, native perl
> >>>> (v5.8.8) passes fine now (as well as perl 5.8.8 on
> >>>> dev.open-bio.org).
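
The two floating-point pitfalls raised in this thread (testing a sum of probabilities with `!= 1.0`, and requiring a strictly monotonic increase between iterations) can be illustrated with a minimal sketch. It assumes a tolerance of 1e-6, the value mentioned above for terminating the maximization step. The helper names approx_equal and is_non_decreasing are hypothetical and are not part of Bio::Tools::HMM.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::HMM): true when two
# floating-point numbers agree to within an absolute tolerance.
sub approx_equal {
    my ($x, $y, $tol) = @_;
    $tol = 1e-6 unless defined $tol;    # assumed default tolerance
    return abs($x - $y) <= $tol;
}

# Hypothetical helper: accept a decrease no larger than $tol before
# treating it as a real violation of monotonicity.
sub is_non_decreasing {
    my ($prev, $curr, $tol) = @_;
    $tol = 1e-6 unless defined $tol;
    return $curr >= $prev - $tol;
}

# Repeated addition of 0.1 accumulates rounding error, so an exact
# comparison against 1.0 is fragile.
my $sum = 0;
$sum += 0.1 for 1 .. 10;

print "exact match\n"    if $sum == 1.0;
print "within 1e-6\n"    if approx_equal($sum, 1.0);

# The sprintf approach suggested in the thread: round, then compare.
print "rounds to 1.00\n" if sprintf("%.2f", $sum) == 1.0;
```

Whether the exact comparison succeeds depends on how perl was built (for example, with or without long doubles), which may be why the failure showed up on only some of the perls tested. A tolerance check behaves the same way on all of them.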
> >>>>
> >>>> I'm wondering if this is a problem with my local perl
> >>>> build.  I'm very tempted to push the HMM-related code
> >>>> into a separate distribution (bioperl-hmm) and make a CPAN
> >>>> release out of it so it gets wider testing via CPAN testers;
> >>>> it would just require a minimum bioperl 1.6 installation for
> >>>> Bio::Tools::HMM and any related modules.  Yee, would
> >>>> that be okay with you?
> >>>>
> >>>> chris
> >>>>
> >>>> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:
> >>>>
> >>>>> I just committed HMM.xs and typemap to SVN. Can you
> >>>>> test it to confirm it works in 64-bit machines?
> >>>>>
> >>>>> Thanks
> >>>>> Yee Man
> >>>>>
> >>>>> --- On Sat, 8/15/09, Chris Fields wrote:
> >>>>>
> >>>>>> From: Chris Fields
> >>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>> To: "Robert Buels"
> >>>>>> Cc: "Yee Man Chan" , "BioPerl List"
> >>>>>> Date: Saturday, August 15, 2009, 12:11 PM
> >>>>>> I'm not sure, but it makes more sense
> >>>>>> to commit these changes directly.  Yee, need us to set
> >>>>>> you up with a commit bit?  If so, fill out the
> >>>>>> information on this page:
> >>>>>>
> >>>>>> http://www.bioperl.org/wiki/SVN_Account_Request
> >>>>>>
> >>>>>> and forward it to support at open-bio.org.
> >>>>>> I'll sponsor you.
> >>>>>>
> >>>>>> chris
> >>>>>>
> >>>>>> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
> >>>>>>
> >>>>>>> The usual procedure for developing code is to exchange
> >>>>>>> code via commits to a version control system.  Yee, do
> >>>>>>> you know how to use Subversion? Does Yee need a commit bit?
> >>>>>>>
> >>>>>>> Rob
> >>>>>>>
> >>>>>>> Yee Man Chan wrote:
> >>>>>>>> Hi Chris
> >>>>>>>>      I find that there is a memory
> >>>>>>>> access bug in my code. Attached is the fixed HMM.xs.
> >>>>>>>> This file together with the simpler typemap should fix all
> >>>>>>>> problems. (I hope..)
> >>>>>>>>      Please let me know if it works for you.
> >>>>>>>> Sorry for the bug...
> >>>>>>>> Yee Man
> >>>>>>>> --- On Fri, 8/14/09, Chris Fields wrote:
> >>>>>>>>> From: Chris Fields
> >>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>>>>> To: "Yee Man Chan"
> >>>>>>>>> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List"
> >>>>>>>>> Date: Friday, August 14, 2009, 8:31 AM
> >>>>>>>>> Yee Man,
> >>>>>>>>>
> >>>>>>>>> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0
> >>>>>>>>> 64-bit) and on dev.open-bio.org (which is perl 5.8.8,
> >>>>>>>>> appears to be 32-bit).  The patch results in cleaning
> >>>>>>>>> up warnings for 5.10.0 but results in similar warnings for
> >>>>>>>>> 5.8.8 (linux or OS X).
> >>>>>>>>>
> >>>>>>>>> On OS X perl 5.8.8, this sometimes passes (note the first
> >>>>>>>>> attempt fails, the second succeeds), so it's not entirely a
> >>>>>>>>> 32-bit issue:
> >>>>>>>>>
> >>>>>>>>> http://gist.github.com/167860
> >>>>>>>>>
> >>>>>>>>> OS X and perl 5.10.0, this always fails as the previous
> >>>>>>>>> gist shows, but demonstrates similar behavior (multiple
> >>>>>>>>> attempts to test get different responses):
> >>>>>>>>>
> >>>>>>>>> http://gist.github.com/167542
> >>>>>>>>>
> >>>>>>>>> On linux, everything passes with or w/o the patched files
> >>>>>>>>> (patched files have warnings as indicated above):
> >>>>>>>>>
> >>>>>>>>> Specs for all three perl executables (they vary a bit):
> >>>>>>>>>
> >>>>>>>>> http://gist.github.com/167883
> >>>>>>>>>
> >>>>>>>>> chris
> >>>>>>>>>
> >>>>>>>>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote:
> >>>>>>>>>
> >>>>>>>>>> Ah.. I find that the typemap can become as simple as this
> >>>>>>>>>> =====================
> >>>>>>>>>> TYPEMAP
> >>>>>>>>>> HMM *    T_PTROBJ
> >>>>>>>>>> =====================
> >>>>>>>>>>
> >>>>>>>>>> Then the generated HMM.c will have a function called
> >>>>>>>>>> INT2PTR to do the pointer conversion. I believe this should
> >>>>>>>>>> solve the warnings.
> >>>>>>>>>> Attached are the updated HMM.xs and typemap. Can
> >>>>>>>>>> someone with a 64-bit machine give it a try?
> >>>>>>>>>> Thank you
> >>>>>>>>>> Yee Man
> >>>>>>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
> >>>>>>>>>>> From: Chris Fields
> >>>>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>>>>>>> To: "Yee Man Chan"
> >>>>>>>>>>> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List"
> >>>>>>>>>>> Date: Thursday, August 13, 2009, 5:31 PM
> >>>>>>>>>>> (just to point out to everyone, Yee
> >>>>>>>>>>> Man's contact information was in the POD)
> >>>>>>>>>>>
> >>>>>>>>>>> Yee Man,
> >>>>>>>>>>>
> >>>>>>>>>>> I have the output in the below link:
> >>>>>>>>>>>
> >>>>>>>>>>> http://gist.github.com/167542
> >>>>>>>>>>>
> >>>>>>>>>>> There are similar problems popping up on 32- and 64-bit
> >>>>>>>>>>> perl 5.10.0, Mac OS X 10.5.  Haven't had time to debug
> >>>>>>>>>>> it unfortunately.
> >>>>>>>>>>>
> >>>>>>>>>>> I think we should seriously consider spinning this code off
> >>>>>>>>>>> into its own distribution for CPAN.  It's
> >>>>>>>>>>> unfortunately bit-rotting away in bioperl-ext.  If you
> >>>>>>>>>>> want to continue supporting it I can help set that up.
> >>>>>>>>>>> chris
> >>>>>>>>>>>
> >>>>>>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi
> >>>>>>>>>>>>
> >>>>>>>>>>>>     So is this an HMM only problem? Or does
> >>>>>>>>>>>> it apply to other bioperl-ext modules?
> >>>>>>>>>>>>
> >>>>>>>>>>>>     What exactly are the compilation errors
> >>>>>>>>>>>> for HMM? I believe my implementation is just a simple one
> >>>>>>>>>>>> based on Rabiner's paper.
> >>>>>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg
> >>>>>>>>>>>>
> >>>>>>>>>>>>     I don't think I did anything fancy that
> >>>>>>>>>>>> makes it machine dependent or non-ANSI C.
> >>>>>>>>>>>> Yee Man
> >>>>>>>>>>>>
> >>>>>>>>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
> >>>>>>>>>>>>> From: Chris Fields
> >>>>>>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>>>>>>>>> To: "Robert Buels"
> >>>>>>>>>>>>> Cc: "Jonny Dalzell" , "BioPerl List" , "Yee Man Chan"
> >>>>>>>>>>>>> Date: Thursday, August 13, 2009, 3:18 PM
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote:
> >>>>>>>>>>>>>> Jonny Dalzell wrote:
> >>>>>>>>>>>>>>> Is it ridiculous of me to expect ubuntu to take
> >>>>>>>>>>>>>>> care of this for me?  How do
> >>>>>>>>>>>>>>> I go about compiling the HMM?
> >>>>>>>>>>>>>> Yes.  This is a very specialized thing that
> >>>>>>>>>>>>>> you're doing, and Ubuntu does not have the resources to
> >>>>>>>>>>>>>> package every single thing.
> >>>>>>>>>>>>>> Unfortunately, it looks like bioperl-ext package is
> >>>>>>>>>>>>>> not installable under Ubuntu 9.04 anyway, which is what I'm
> >>>>>>>>>>>>>> running.
> >>>>>>>>>>>>>> For others on this list, if somebody is
> >>>>>>>>>>>>>> interested in maintaining it, I'd be happy to help out
> >>>>>>>>>>>>>> by testing on Debian-based Linux platforms.  We need to
> >>>>>>>>>>>>>> clarify this package's maintenance status: if there is
> >>>>>>>>>>>>>> nobody interested in maintaining it, I would recommend that
> >>>>>>>>>>>>>> bioperl-ext be removed from distribution.  It's not in
> >>>>>>>>>>>>>> anybody's interest to have unmaintained software out there
> >>>>>>>>>>>>>> causing confusion.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have cc'd Yee Man Chan for this.  If there isn't a
> >>>>>>>>>>>>> response or the message bounces, we do one of two things:
> >>>>>>>>>>>>> 1) consider it deprecated (probably safest).
> >>>>>>>>>>>>> 2) spin it out into a separate module.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Just tried to compile it myself and am getting errors (using
> >>>>>>>>>>>>> 64bit perl 5.10), so I think, unless someone wants to take
> >>>>>>>>>>>>> this on, option #1 is best.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> So Jonny, in short, I would say "do not use
> >>>>>>>>>>>>>> bioperl-ext".
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In general, that's a safe bet.  We're moving most of
> >>>>>>>>>>>>> our C/C++ bindings to BioLib.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Step back.
> >>>>>>>>>>>>>> What are you trying to
> >>>>>>>>>>>>>> accomplish?  Chris already recommended some alternative
> >>>>>>>>>>>>>> methods in his email of 8/11 on this subject.  Perhaps
> >>>>>>>>>>>>>> we can guide you to some software that is actively
> >>>>>>>>>>>>>> maintained and will meet your needs.
> >>>>>>>>>>>>>> Rob
> >>>>>>>>>>>>> Exactly.  Lots of other (better supported!) options
> >>>>>>>>>>>>> out there.  HMMER, SeqAn, and others.
> >>>>>>>>>>>>> chris
> >>>>>>>
> >>>>>>> --Robert Buels
> >>>>>>> Bioinformatics Analyst, Sol Genomics Network
> >>>>>>> Boyce Thompson Institute for Plant Research
> >>>>>>> Tower Rd
> >>>>>>> Ithaca, NY  14853
> >>>>>>> Tel: 503-889-8539
> >>>>>>> rmb32 at cornell.edu
> >>>>>>> http://www.sgn.cornell.edu

From ymc at yahoo.com Mon Aug 17 18:19:27 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Mon, 17 Aug 2009 15:19:27 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu>
Message-ID: <419432.62970.qm@web30403.mail.mud.yahoo.com>

I believe these warnings should have been fixed with the latest Bio/Tools/HMM.pm. Are you sure you are using the latest Bio/Tools/HMM.pm? I noticed that there are two pairs of "use strict" and "use warnings" in this version. :P

Yee Man

--- On Mon, 8/17/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Robert Buels"
> Cc: "BioPerl List" , "Yee Man Chan"
> Date: Monday, August 17, 2009, 2:22 PM
> Still seeing that odd warning popping up:
>
> cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose
> t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185.
>
> Have you tried using Yee Man's original Makefile.PL to see
> if it works better?  There appear to be some
> differences in the compilation, including a linking warning
> popping up.
>
> chris
>
> On Aug 17, 2009, at 3:32 PM, Robert Buels wrote:
>
> > OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into
> > a new distro at Bio-Tools-HMM in the repo.  The tests
> > are not passing, I think that some bugs need to be fixed in
> > the logic of things.
> >
> > Yee Man, could you have a look?  To download the
> > newly repackaged code:
> >
> > svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM
> >
> > perl Build.PL; ./Build test
> >
> > Please check that things are compiling OK, check the
> > test logic, upgrade the tests to use Test::More, and get the
> > tests to the point where they are passing.
> >
> > At that point, it should be ready for CPAN, but we
> > need to decide how we want to coordinate that with releases
> > of bioperl-live and bioperl-ext.
> >
> > Rob

From ymc at yahoo.com Mon Aug 17 18:28:50 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Mon, 17 Aug 2009 15:28:50 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <45F9C6D1-7DD7-4227-B7B9-3FBAF7513B35@illinois.edu>
Message-ID: <360578.66990.qm@web30403.mail.mud.yahoo.com>

I noticed that Bio/Tools/HMM.pm was removed from the trunk. So I added it back in. I think you shouldn't get the warnings with this version.

Yee Man

--- On Mon, 8/17/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Chris Fields"
> Cc: "Robert Buels" , "BioPerl List" , "Yee Man Chan"
> Date: Monday, August 17, 2009, 2:28 PM
> Take that back.  Yes the 'FL'
> warning is still there, but no tests are run b/c (simply
> put) there are no regression tests (no use of Test or
> Test::More).  If you run './Build test --verbose' you
> can see the run, but no test output.  That should be
> easy to fix, though.
>
> chris
>
> On Aug 17, 2009, at 4:22 PM, Chris Fields wrote:
>
> > Still seeing that odd warning popping up:
> >
> > cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose
> > t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185.
> >
> > Have you tried using Yee Man's original Makefile.PL to
> > see if it works better?  There appear to be some
> > differences in the compilation, including a linking warning
> > popping up.
> >
> > chris
> >
> > On Aug 17, 2009, at 3:32 PM, Robert Buels wrote:
> >
> >> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off
> >> into a new distro at Bio-Tools-HMM in the repo.  The
> >> tests are not passing, I think that some bugs need to be
> >> fixed in the logic of things.
> >>
> >> Yee Man, could you have a look?  To download
> >> the newly repackaged code:
> >>
> >> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM
> >>
> >> perl Build.PL; ./Build test
> >>
> >> Please check that things are compiling OK, check
> >> the test logic, upgrade the tests to use Test::More, and get
> >> the tests to the point where they are passing.
> >>
> >> At that point, it should be ready for CPAN, but we
> >> need to decide how we want to coordinate that with releases
> >> of bioperl-live and bioperl-ext.
> >>
> >> Rob

From ymc at yahoo.com Mon Aug 17 20:24:24 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Mon, 17 Aug 2009 17:24:24 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A89E0F0.8010307@cornell.edu>
Message-ID: <62126.74727.qm@web30401.mail.mud.yahoo.com>

I get it now. So it is now spun off. Anyway, I updated the HMM.pm in Bio-Tools-HMM with the latest version. I think it should work.

Yee Man

--- On Mon, 8/17/09, Robert Buels wrote:

> From: Robert Buels
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Chris Fields" , "BioPerl List"
> Date: Monday, August 17, 2009, 4:00 PM
> Yee Man Chan wrote:
> > I noticed that Bio/Tools/HMM.pm was removed from the
> > trunk. So I added it back in. I think you shouldn't get the
> > warnings with this version.
>
> Please read my email above with instructions for checking
> out the new Bio-Tools-HMM component, where Bio::Tools::HMM
> has been moved.
> Please do not add the Bio::Tools::HMM
> module back into bioperl-live.
>
> I think you might be confused about the functions of 'svn
> add', 'svn commit', etc, because I don't see any actual
> addition of the module in the commit logs.  Please read
> through the SVN manual at http://svnbook.red-bean.com/ if you need
> clarification.
>
> Rob

From whs at eaglegenomics.com Tue Aug 18 05:14:48 2009
From: whs at eaglegenomics.com (Will Spooner)
Date: Tue, 18 Aug 2009 10:14:48 +0100
Subject: [Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers
In-Reply-To:
References:
Message-ID:

Hi Robert,

Speaking for Ensembl, the GeneTree display code is deeply embedded in the API and web code, and refactoring as a standalone package would be exceedingly difficult.

Jalview (http://www.jalview.org) may be a good alternative, albeit a Java one. There is code available for driving Jalview from the Ensembl database, and something similar for BioPerl seems reasonable.

Will

On 17 Aug 2009, at 18:14, Robert Bradbury wrote:

> One of the questions facing people working in bioinformatics is "How do we
> present information so that it can be effectively interpreted by
> non-informatics specialists?"
>
> Now, my expertise lies in computer science (esp. O.S. & databases) and as a
> second vocation the biology of aging (DNA damage & repair, to a lesser
> extent cancer and pathologies of aging, etc.).  Now by my estimate there are
> perhaps 5 people in the world who are able to effectively discuss computer
> science X aging (gerontology) [3].  There are perhaps several dozen people
> where those areas, esp aging, may overlap with DNA damage & repair.  But
> then there is a wider audience of perhaps a few hundred members of AGE, and
> maybe a thousand or so who are members of the scientific subgroup of GSA.
> But most of those individuals are "old school" scientists who know
> relatively little about bioinformatics.
> So one has barriers to presenting
> bioinformatics information in ways that they can use usefully.
>
> I have found in my limited experience that homology graphs of conserved
> protein domains, such as those displayed in HomoloGene or those in Ensembl
> (including phylogeny graphs) can be quite useful in reaching interesting
> conclusions.  For example, double strand break repair processes which may
> involve 8-10 relatively conserved proteins, may have a critical role in the
> mechanisms of aging.  In particular two of those proteins, WRN & DCLRE1C
> (Artemis) contain complementary exonuclease activities which chew up the DNA
> in order to prepare the strands for ligation.  Of course, programmers may
> appreciate better than gerontologists the significance of deleting random
> bytes from instruction sequences in one's code.  At the recent AGE meeting in
> June several discussions arose as to possible differences in "aging" in
> yeast, *C. elegans* and mammals [1].  A quick database search showed that
> *C. elegans* seems to be lacking the exonuclease domain on the WRN homologue
> and may be missing a DCLRE1C homologue entirely (which if true would lead to
> conclusions that aging in *C. elegans* may be fundamentally different from
> aging in vertebrates).  Explaining this to researchers can best be done
> using pictures.
>
> I've been through PubMed and have several papers (NAR / BMC Bioinformatics)
> regarding programs to do homology comparisons and phylogeny trees.  However
> these seem to lean towards producing less condensed bioinformatics-ish
> information.  I do not know however whether the outputs from databases like
> PubMed HomoloGene or Ensembl have been packaged in tools that might be part
> of BioPerl.  I am interested in programs that can be run on a regular basis
> to draw "pretty pictures" that can be used for publication and/or internet
> browsing.
> In particular I'm interested in running such programs on species
> of interest to various gerontological communities [2] which involves subsets
> of databases which seem to be scattered around the world.
>
> Thanks.
>
> 1. Of course there has been lots of discussion and rationalization over the
> last 15+ years about how "aging" is largely the same in more complex and
> simpler organisms -- in part to justify sequencing some organisms and in
> part to justify funding research at certain laboratories.  A closer
> examination based on some of the complete and emerging genome sequences may
> suggest this is a very swampy discussion.
> 2. For example, nematode DNA repair gene comparisons would be interesting to
> nematode researchers, insect DNA repair gene comparisons to insect
> researchers, both to invertebrate researchers, etc.
> 3. The recently published textbooks *Aging of the Genome* by Jan Vijg and
> the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg *et al*,
> go a long way towards moving these areas from the stacks of research
> libraries into areas for more general discussion.  Both volumes deal
> extensively with the ~150 DNA repair genes.

--
William Spooner
whs at eaglegenomics.com
http://www.eaglegenomics.com

From cjfields at illinois.edu Tue Aug 18 10:35:49 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 18 Aug 2009 09:35:49 -0500
Subject: [Bioperl-l] bioperl capability
In-Reply-To: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com>
References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com>
Message-ID: <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu>

I think I already answered this:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/20302/focus=20305

chris

On Aug 14, 2009, at 2:02 PM, David Quan wrote:

> Hello,
>
> I've been browsing around bioperl documentation and have used
> a blast parser, but am wondering if it is possible to use the start
> and end information for a hit to trace back to a gene in genbank and
> extract the sequence for that gene?  I have not been able to find
> elements that would work in such a way.  Recommendations for elements
> that would be capable of behaving in such a way would be greatly
> appreciated.  Thanks very much.
>
> David N. Quan
> --
> Love of country is, at heart, trust in a nation's people, faith in
> their better nature, esteem for their best hopes, understanding for
> the magnificence and the distinctiveness and the huge, infinitely
> shaded cultural palette of their simple humanity.
> --Bradley Burston

From David.Messina at sbc.su.se Tue Aug 18 10:42:09 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Aug 2009 16:42:09 +0200
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu>
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu>
Message-ID: <628aabb70908180742o4bf93d21tab0b90c328323efa@mail.gmail.com>

On Tue, Aug 18, 2009 at 02:36, Kevin Brown wrote:

> The obfuscator does help, but even it is a little sparse on data for
> modules. Especially information on the realities of the returned data
> from a method call.

Yep, sorry about that, Kevin. I'm way overdue in devoting a little attention to cleaning up those Deobfuscator bugs and -- just maybe -- putting a prettier face on it. Hoping to find some time in the near future for that.

Dave

From cjfields at illinois.edu Tue Aug 18 11:04:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 18 Aug 2009 10:04:40 -0500
Subject: [Bioperl-l] code reuse with moose
In-Reply-To: <20090818110102.GA27010@seinfeld>
References: <20090812022753.GA815@Macintosh-74.local> <20090818110102.GA27010@seinfeld>
Message-ID:

On Aug 18, 2009, at 6:01 AM, Siddhartha Basu wrote:

> Putting it in the bioperl list, makes more sense here,
>
> On Wed, 12 Aug 2009, Chris Fields wrote:
>
>> (BTW, this is re: the reimplementation of major chunks of BioPerl using
>> Moose, Biome: http://github.com/cjfields/biome/tree/)
>>
>> Locations should use a Role (specifically, Biome::Role::Range), so
>> start/end/strand should be attributes, not methods.
>> With attributes the
>> best way to do this is probably with a builder, and lazily (start
>> requires end, and vice versa).  Factor out the common code as Tomas
>> indicates.  BTW, the $self->throw() is akin to BioPerl's $self->throw()
>> exception handling; it simply catches any exceptions and passes them to
>> the metaclass exception handling.
>>
>> I've been thinking about making the Range role abstract for this very
>> reason (or defining very basic attributes); something like:
>>
>> ----------------------------
>>
>> package Bio::Role::Range;
>>
>> requires qw(_build_start _build_end _build_strand);
>>
>> # also require other methods which need to be defined in implementation
>>
>> has 'start' => (
>>     isa     => 'Int',
>>     is      => 'rw',
>>     builder => '_build_start',
>>     lazy    => 1
>> );
>>
>> # same for end, strand (except strand has a different isa via MooseX::Types)
>> ....
>>
>> package Bio::Location::Foo;
>>
>> with 'Bio::Role::Range';
>>
>> sub _build_start {
>>     # for location-specific start
>> }
>>
>> sub _build_end {
>>     # for location-specific end
>> }
>>
>> sub _build_strand {
>>     # for location-specific strand
>> }
>>
>> sub _common_build_method {
>>     # factor out common code here, call from other builders
>> }
>>
>> ----------------------------
>
> This plan makes things much clearer. Currently the
> BioMe::Role::Location has a 'requires' keyword and the rest of the
> location modules consume that role to have their own implementation. At
> this point only BioMe::Location::Atomic has an attribute-based 'start' and
> 'end' implementation. I got a bit confused because in current bioperl
> 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic' and when
> I am trying to follow that path in BioMe it has to override that
> method.
> So, my question is: do all the location modules really need to inherit
> from each other? I am totally aware of the original design ideas but
> it would be better to have a flattened hierarchy if possible.
Flattening with roles is always a good idea, yes. I wouldn't worry as much about the way it was originally implemented as the general API (and ways in which we can simplify it). > One more thing, what about putting the 'start', 'end' and the other > common base attributes in BioMe::Role::Location instead of > BioMe::Role::Range. I am not sure which would be correct from bioperl > stand of view, just throwing out an idea. That's a possibility. To me Locations are just Ranges with different behavior (hence the below comment...) >> Also, I think the Coordinate-related stuff should be simplified >> down to a >> trait or an attribute; they bring in way too much overhead in >> bioperl w/o >> much added value. > > You mean instead of having 'builder' method, having a specialized > traits handling those. That sounds like even better. > > -siddhartha Yes, that's essentially it. Location behavior could be changed by having CoordinatePolicy as a trait. Similarly, fuzziness for start/ end could also be thought of as a trait. In essence, you could probably role most behavior into attribute traits (which, in Moose, are just roles that are composed into the attribute meta class, Moose::Meta::Attribute). I had started up a Biome::Meta::Attribute class in case we were to go down this path, then we could start registering specific traits within that namespace. Just to note, it might be easier to try the simplest approach first and get tests passing, then layer in traits to see how they act performance-wise. My guess is they will speed things up, but you never know. Locations will be a performance bottleneck as they are used in generic Features. chris From cjfields at illinois.edu Tue Aug 18 11:10:08 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 10:10:08 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? 
In-Reply-To: <62126.74727.qm@web30401.mail.mud.yahoo.com> References: <62126.74727.qm@web30401.mail.mud.yahoo.com> Message-ID: Yee Man, Robert, All tests are passing; there was a small change in the expected floating point, but no warning now. Re: passing this on to CPAN, I think it needs a distinct version from BioPerl (something that should probably happen with any spinoffs). I foresee two options (and a possible conflict): 1) Use the same versioning scheme, starting with 1.6.1. 2) Use a simpler scheme a'la Bio::Graphics, which I suggest. Tripartite versions are a PITA, we'll only need to keep that in core. Conflict: Bio::Tools::HMM is currently part of the 1.6 branch (in 1.6.0). If this stays in 1.6.1 then we have two versions of the module floating out there. I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, and I could attempt to push the initial Bio-Ext-HMM release after core 1.6.1 is out. After that, I could then add Yee Man as PAUSE co- maintainer for those modules (which means Yee Man needs to sign up for a PAUSE account). Any objections to that? chris On Aug 17, 2009, at 7:24 PM, Yee Man Chan wrote: > I get it now. So it is now spinned off. Anyway, I updated the HMM.pm > in Bio-Tools-HMM with the latest version. I think it should work. > > Yee Man > > --- On Mon, 8/17/09, Robert Buels wrote: > >> From: Robert Buels >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Chris Fields" , "BioPerl List" > > >> Date: Monday, August 17, 2009, 4:00 PM >> Yee Man Chan wrote: >>> I noticed that Bio/Tools/HMM.pm was removed from the >> trunk. So I added it back in. I think you shouldn't get the >> warnings with this version. >> >> Please read my email above with instructions for checkout >> out the new Bio-Tools-HMM component, where Bio::Tools::HMM >> has been moved. Please do not add the Bio::Tools::HMM >> module back into bioperl-live. 
>> >> I think you might be confused about the functions of 'svn >> add', 'svn commit', etc, because I don't see any actual >> addition of the module in the commit logs. Please read >> through the SVN manual at http://svnbook.red-bean.com/ if you need >> clarification. >> >> Rob >> >> > > > From hlapp at gmx.net Tue Aug 18 11:46:55 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 18 Aug 2009 11:46:55 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A89EADD.9050509@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> Message-ID: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > I can see how this might be a good idea, or it might be overkill. > Anybody have thoughts on having feature _sources_ strongly typed > with ontology terms? It's how BioSQL and Chado would store it anyway. I'm not sure whether GFF3 requires it, possibly not. But when you make everything else ontology-typed, why exempt one property that also stands to benefit from more predictable values? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From rmb32 at cornell.edu Tue Aug 18 11:49:32 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 08:49:32 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <62126.74727.qm@web30401.mail.mud.yahoo.com> Message-ID: <4A8ACD8C.1060908@cornell.edu> Chris Fields wrote: > I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, and I > could attempt to push the initial Bio-Ext-HMM release after core 1.6.1 > is out. 
After that, I could then add Yee Man as PAUSE co-maintainer for > those modules (which means Yee Man needs to sign up for a PAUSE > account). Any objections to that? Sounds like a good plan to me, if Yee Man agreed with it. He would be the primary CPAN maintainer of the package. Maybe he should actually be the first uploader too? Then, it would show up under his PAUSE account at the outset, and he would get better attribution and visibility. Rob From cjfields at illinois.edu Tue Aug 18 12:34:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 11:34:00 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: On Aug 18, 2009, at 10:46 AM, Hilmar Lapp wrote: > > On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > >> I can see how this might be a good idea, or it might be overkill. >> Anybody have thoughts on having feature _sources_ strongly typed >> with ontology terms? > > It's how BioSQL and Chado would store it anyway. I'm not sure > whether GFF3 requires it, possibly not. Might be worth bringing up with Lincoln to get his thoughts. > But when you make everything else ontology-typed, why exempt one > property that also stands to benefit from more predictable values? > > -hilmar What I'm thinking as well. You can always implement it that way, and if we deem it too heavy-weight then revert back. Or have it evaluated lazily and get the benefits of both. That's the magic of doing this on a branch, it gives you much more latitude to try things out. 
chris From cain.cshl at gmail.com Tue Aug 18 14:28:05 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 18 Aug 2009 14:28:05 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: Hi Hilmar and all, Actually, Chado stores sources as a dbxref for the feature (where the db.name is "GFF_source") and the source can be any string, which is what the GFF3 spec indicates. I think the source was intended to be free text to allow the creator maximum flexibility when making the GFF; it also allows lots of flexibility when defining what features go into a particular track in GBrowse: you can have lots of gene features in your GFF, but you can segregate them according to what their source attributes are. Additionally, some applications (SynBrowse comes to mind) overload the source value and require them to conform to a certain syntax. So, what I'm trying to say is, source should probably just stay a simple string. Scott On Aug 18, 2009, at 11:46 AM, Hilmar Lapp wrote: > > On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > >> I can see how this might be a good idea, or it might be overkill. >> Anybody have thoughts on having feature _sources_ strongly typed >> with ontology terms? > > > It's how BioSQL and Chado would store it anyway. I'm not sure > whether GFF3 requires it, possibly not. > > But when you make everything else ontology-typed, why exempt one > property that also stands to benefit from more predictable values? 
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From marcelo011982 at gmail.com Tue Aug 18 14:34:17 2009
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Tue, 18 Aug 2009 15:34:17 -0300
Subject: [Bioperl-l] Genbank code from Blast results
Message-ID: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>

Hi all,

I was writing a script that takes some information from the results of
blastn files. Everything was OK, but I am having some difficulty picking
out the GenBank code number (the 'gb' below). I tried

$obj->each_accession_number
$hit->name

and some variations of these.

------------------------------
>gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 gmrtDrNS01 Glycine max cDNA 3', mRNA sequence /clone_end=3' /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853
          Length = 853

 Score = 1336 bits (674), Expect = 0.0
 Identities = 793/832 (95%), Gaps = 8/832 (0%)
 Strand = Plus / Minus

Query: 294858 aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt 294917
              |||||||||||| |||||| |||||||||||||||||  ||||||||||||||||||||
Sbjct: 853    aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc 794
----------------------------------------

But I still don't get it.
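One possible approach, sketched here under assumptions: the 'gb' value is not the hit name but sits in the rest of the definition line as a "/gb=..." tag, so a regular expression over that text should recover it. In BioPerl the text would presumably come from $hit->description, but the sketch below works on a plain string so it runs without BioPerl; the helper name tag_from_description is invented for illustration.

```perl
use strict;
use warnings;

# Return the value of a "/key=value" tag from a UniGene-style description
# line, or undef if the tag is absent. Hypothetical helper, not a BioPerl API.
sub tag_from_description {
    my ($desc, $key) = @_;
    return $desc =~ m{/\Q$key\E=(\S+)} ? $1 : undef;
}

# Description text copied from the BLAST report in the message above.
my $desc = "gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 "
         . "gmrtDrNS01 Glycine max cDNA 3', mRNA sequence /clone_end=3' "
         . "/gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853";

print tag_from_description($desc, 'gb'), "\n";
```

The same helper would work for the other tags on that line ('gi', 'ug', 'len'), since each value runs up to the next whitespace.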
thank you with regards Miwata From hlapp at gmx.net Tue Aug 18 16:01:18 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 18 Aug 2009 16:01:18 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: > Additionally, some applications (SynBrowse comes to mind) overload > the source value and require them to conform to a certain syntax. > > So, what I'm trying to say is, source should probably just stay a > simple string. I would rephrase that to source should probably retain the possibility of using made-up strings. You mention one example yourself, and there have been others in a recent thread on BioSQL [1], for why the option to have predictable, structured values with attached semantics could be very useful. -hilmar [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue Aug 18 17:46:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 16:46:25 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8ACD8C.1060908@cornell.edu> References: <62126.74727.qm@web30401.mail.mud.yahoo.com> <4A8ACD8C.1060908@cornell.edu> Message-ID: On Aug 18, 2009, at 10:49 AM, Robert Buels wrote: > Chris Fields wrote: >> I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, >> and I could attempt to push the initial Bio-Ext-HMM release after >> core 1.6.1 is out. 
After that, I could then add Yee Man as PAUSE >> co-maintainer for those modules (which means Yee Man needs to sign >> up for a PAUSE account). Any objections to that? > > > Sounds like a good plan to me, if Yee Man agreed with it. He would > be the primary CPAN maintainer of the package. Maybe he should > actually be the first uploader too? Then, it would show up under > his PAUSE account at the outset, and he would get better attribution > and visibility. > > Rob At the moment BIOPERLML is the primary maintainer. It's an 'umbrella' account for the bioperl group; a few others exist for stuff like DBI, Catalyst, etc I think. Anyone who's designated a co-maintainer can release code onto CPAN. Several of us can assign new co-maintainer status for modules, so the code doesn't get locked up if someone decides to abandon it. We simply designate another co-maintainer if someone decides to take it over. In fact, that's half the reason I would like to get the ext code out there again; either designate it as abandonware or set it up so that it can be reimplemented by someone with the tuits (maybe using biolib, for instance). We have recently moved Bio::Graphics over to LDS as the primary, though, so this is all a point up for debate. chris From rmb32 at cornell.edu Tue Aug 18 17:56:19 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 14:56:19 -0700 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <20090818174053.3f379c5elembark@wrkhors.com@wrkhors.com> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> <20090818174053.3f379c5elembark@wrkhors.com@wrkhors.com> Message-ID: <4A8B2383.1030207@cornell.edu> Steven, Could you CC Heath Bair on this? He's the YAPC::NA 2010 coordinator that started this thread. 
Rob Steven Lembark wrote: > On Fri, 26 Jun 2009 14:06:06 -0700 > Robert Buels wrote: > >> This is a really giant opportunity to expose some of the best >> technologists in the world to what we do in bioinformatics, and possibly >> to entice some of them to help us the heck out! ;-) > > OK, so I'm a few months behind on my email... > > One suggestion: Have them add a BioPerl track to the > conference in advance of getting any submissions for > it. The gent I spoke to in Pittsburgh seemed open to > the idea of a Bioinformatcs/BioPerl track in 2010. > > Opening things up a bit to include Bioinformatics > even beyond BioPerl would give people who are > marginally interested a chance to see what the > whole area is about (e.g., adapting the W-Curve > for use with Perl or how we analyzed Clostridia > using Perl for the bookkeeping). > > In the meantime you might want to see how many > people would be willing to give talks in the > track -- even recycled ones -- before the conference > submission period begins. And, yes, I'd volunteer to > give 1-2 talks. > > enjoi > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From jncline at gmail.com Tue Aug 18 23:06:19 2009 From: jncline at gmail.com (Jonathan Cline) Date: Tue, 18 Aug 2009 22:06:19 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> Message-ID: <4A8B6C2B.9030101@gmail.com> Chris Fields wrote: > > Your modules may or may not need the Bio* namespace (that's up to you, > actually); there are several non-bioperl modules that also share the > Bio* namespace, and I believe there are modules that aren't Bio* that > use BioPerl (Gbrowse comes to mind). 
If you're focusing on > interaction with robotics, Robotics::Bio::X might be a better > namespace for instance (b/c you could expand later into other possibly > non-bio robotics interfaces). Based on your & other opinions I have received, I am creating: Robotics.pm (high level hardware abstraction layer) Robotics::Tecan Robotics::Tecan::Genesis I'll post a release note when it's reached an interesting level of maturity (estimate a couple weeks from now) so anyone with the hardware can play with the package. It's currently working great, and I am adding functionality on a daily basis. ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## >> >> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >>>> Sent: Thursday, 30 July 2009 2:07 p.m. >>>> To: bioperl-l at lists.open-bio.org >>>> Cc: Jonathan Cline >>>> Subject: [Bioperl-l] Bio::Robotics namespace discussion >>>> >>>> I am writing a module for communication with biology robotics, as >>>> discussed recently on #bioperl, and I invite your comments. >>>> >>>> >>>> On Namespace: >>>> >>>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many >>>> s/w modules already called 'robots' (web spider robots, chat bots, www >>>> automate, etc) so I chose the longer name "robotics" to differentiate >>>> this module as manipulating real hardware. Bio::Robotics is the >>>> abstraction for generic robotics and Bio::Robotics::(vendor) is the >>>> manufacturer-specific implementation. Robot control is made more >>>> complex due to the very configurable nature of the work table >>>> (placement >>>> of equipment, type of equipment, type of attached arm, etc). The >>>> abstraction has to be careful not to generalize or assume too >>>> much. 
In >>>> some cases, the Bio::Robotics modules may expand to arbitrary >>>> equipment >>>> such as thermocyclers, tray holders, imagers, etc - that could be a >>>> future roadmap plan. From rmb32 at cornell.edu Wed Aug 19 00:13:53 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 21:13:53 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <829996.94283.qm@web30404.mail.mud.yahoo.com> References: <829996.94283.qm@web30404.mail.mud.yahoo.com> Message-ID: <4A8B7C01.5060502@cornell.edu> Yee Man Chan wrote: > I think it is better to keep Bio-Tools-HMM within the Bio-Ext package and then spin this whole Bio-Ext package out to CPAN. I am ok with Robert's arrangement to move the related pm files under Bio/Tools/ to the new Bio-Ext package. The long-term development plan is to factor *ALL* of Bioperl into individual distributions similar to Bio-Tools-HMM. It is actually much easier to maintain and release code in this "broken up" way. This means that the Bio-Ext package is going to go away, so it doesn't make sense to keep Bio-Tools-HMM in it. Chris, other core devs, do you agree with this? > I have a PAUSE already due to my other CPAN contributions. So there is no need to create a new one. My PAUSE account is UMVUE. Oh good, the next step would just be to coordinate when to do the release in concert with Bioperl 1.6.1, right? Rob From rmb32 at cornell.edu Wed Aug 19 00:37:49 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 21:37:49 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <190221.61009.qm@web30408.mail.mud.yahoo.com> References: <190221.61009.qm@web30408.mail.mud.yahoo.com> Message-ID: <4A8B819D.9070309@cornell.edu> Yee Man Chan wrote: > Is it going to be an arrangement similar to bioconductor? If so, I suppose then it makes sense. 
But you might want to develop scripts to automatically download and install new modules to make it user friendly. Yes, we are probably going to make a Task::BioPerl or something similar. > What do you mean by Bio-Ext is going away? I notice quite many people using dpAlign. So if Bio-Ext is going away, then at least dpAlign should become another spin off. By going away, I meant that everything in there is going to be spinned off. Except modules that are no longer maintainable, if there are any in there. Rob From deequan at gmail.com Wed Aug 19 00:39:35 2009 From: deequan at gmail.com (deequan) Date: Tue, 18 Aug 2009 21:39:35 -0700 (PDT) Subject: [Bioperl-l] bioperl capability In-Reply-To: <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> Message-ID: <25037707.post@talk.nabble.com> Howdy there, Yes, quite right. I apologize for the double posting. Moreover, I appreciate your assistance in trying to sort out what can and cannot be done with bioperl. 
To address the problem previously stated, I put together a remarkably
misbehaving script that has the following parts:

    # Some parsing:
    $q_start = $hsp->query->start;
    $q_end   = $hsp->query->end;
    $h_start = $hsp->hit->start;
    $h_end   = $hsp->hit->end;
    $length  = $hsp->query->seqlength();
    $id      = $hit->accession;
    print OUT "$id\t";

    my $seq;
    if ($h_start < $h_end) {
        # the bit per your recommendation
        my $begin  = $h_start - $q_start + 1;
        my $cease  = ($length - $q_end) + $h_end;
        my $strand = 1;
        my $factory = Bio::DB::GenBank->new(
            -format    => 'genbank',
            -seq_start => $begin,
            -seq_stop  => $cease,
            -strand    => $strand,    # 1 = plus, 2 = minus
        );
        $seq = $factory->get_Seq_by_acc($id);
    }
    else {
        # else assume backward, code not shown
    }

    # and some stuff to retrieve the sequence
    my $len    = $seq->length();
    my $string = $seq->subseq(1, $len);
    print OUT "length = $len\t";
    print OUT "seq = $string\n";

In your previous reply, you said the code accessing the seq object
created by get_Seq_by_acc would have to pass that object (here $seq) to a
SeqIO for basic IO purposes. Not seeing exactly how to go about that, I
tried some other functions in combination that seemed as though they
should work (length() and subseq()). Unfortunately, the program does not
even run to that point, as the script throws an exception:

------------- EXCEPTION -------------
MSG: acc CP000948 does not exist
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:182
STACK toplevel test.pl:36
-------------------------------------

Oddly, the record corresponding to this accession number can be found
here: http://www.ncbi.nlm.nih.gov/nuccore/169887498

Perhaps you'd be willing to offer another hint. Thank you for your
assistance thus far. And on behalf of all posters, thank you for sharing
your knowledge. 'Preciate.

David Q.
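The coordinate arithmetic in the forward-strand branch above can be checked in isolation, without any network access. The following is a small sketch only: the function name extended_hit_range and its named arguments are invented here, the clamp to position 1 is an added assumption that is not in the original script, and the reverse-strand case is left out because the original does not show it.

```perl
use strict;
use warnings;

# Given HSP coordinates on the query and the hit, return the (begin, end)
# window on the hit that would span the whole query, forward strand only.
sub extended_hit_range {
    my (%a) = @_;
    my $begin = $a{h_start} - $a{q_start} + 1;
    my $cease = ($a{q_length} - $a{q_end}) + $a{h_end};
    $begin = 1 if $begin < 1;    # assumption: never ask for positions before 1
    return ($begin, $cease);
}

# A 100-base query whose bases 11..90 align to hit positions 5011..5090.
my ($begin, $cease) = extended_hit_range(
    q_start  => 11,
    q_end    => 90,
    q_length => 100,
    h_start  => 5011,
    h_end    => 5090,
);
print "$begin..$cease\n";
```

With these example numbers the window is exactly as long as the query, which is a quick sanity check before handing the values to -seq_start and -seq_stop.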
Chris Fields-5 wrote: > > I think I already answered this: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/20302/focus=20305 > > chris > > -- View this message in context: http://www.nabble.com/bioperl-capability-tp25024929p25037707.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Wed Aug 19 01:28:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:29 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B819D.9070309@cornell.edu> References: <190221.61009.qm@web30408.mail.mud.yahoo.com> <4A8B819D.9070309@cornell.edu> Message-ID: <6ADF16A9-3D14-45F3-B972-98134B0A0DB1@illinois.edu> On Aug 18, 2009, at 11:37 PM, Robert Buels wrote: > Yee Man Chan wrote: >> Is it going to be an arrangement similar to bioconductor? If so, I >> suppose then it makes sense. But you might want to develop scripts >> to automatically download and install new modules to make it user >> friendly. > Yes, we are probably going to make a Task::BioPerl or something > similar. > >> What do you mean by Bio-Ext is going away? I notice quite many >> people using dpAlign. So if Bio-Ext is going away, then at least >> dpAlign should become another spin off. > By going away, I meant that everything in there is going to be > spinned off. Except modules that are no longer maintainable, if > there are any in there. > > Rob dpAlign could become another spinoff, yes, if it's used (and works fine). 
The problematic code dealt with pSW, alignment statistics, and staden io_lib support (the last of which is fairly bit-rotted now): http://bugzilla.open-bio.org/show_bug.cgi?id=2668 http://bugzilla.open-bio.org/show_bug.cgi?id=1857 http://bugzilla.open-bio.org/show_bug.cgi?id=2069 http://bugzilla.open-bio.org/show_bug.cgi?id=2074 http://bugzilla.open-bio.org/show_bug.cgi?id=2329 dpAlign has its own bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2384 chris From cjfields at illinois.edu Wed Aug 19 01:28:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:39 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B7C01.5060502@cornell.edu> References: <829996.94283.qm@web30404.mail.mud.yahoo.com> <4A8B7C01.5060502@cornell.edu> Message-ID: <1DA73AAB-EC4F-4F44-BBF2-CFF7B3E4A0BE@illinois.edu> On Aug 18, 2009, at 11:13 PM, Robert Buels wrote: > Yee Man Chan wrote: >> I think it is better to keep Bio-Tools-HMM within the Bio-Ext >> package and then spin this whole Bio-Ext package out to CPAN. I am >> ok with Robert's arrangement to move the related pm files under Bio/ >> Tools/ to the new Bio-Ext package. > > The long-term development plan is to factor *ALL* of Bioperl into > individual distributions similar to Bio-Tools-HMM. It is actually > much easier to maintain and release code in this "broken up" way. > > This means that the Bio-Ext package is going to go away, so it > doesn't make sense to keep Bio-Tools-HMM in it. Chris, other core > devs, do you agree with this? In general, though there will be a limit as to how small we can split these off. For instance, Bio::Tree/TreeIO will be messy to split up and makes sense to keep together. Others could be more easily split off. YMMV. >> I have a PAUSE already due to my other CPAN contributions. So there >> is no need to create a new one. My PAUSE account is UMVUE.
> Oh good, the next step would just be to coordinate when to do the > release in concert with Bioperl 1.6.1, right? > > Rob Yes. That should be easy enough to do; basically Bio::Tools::HMM will be removed from 1.6.1, then core will be released along with Bio::Ext::HMM (or Bio::Tools::HMM, either way it would double as the distribution name). chris From cjfields at illinois.edu Wed Aug 19 01:28:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:48 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A8B6C2B.9030101@gmail.com> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> <4A8B6C2B.9030101@gmail.com> Message-ID: <2F5111BE-A1F3-437F-AC6C-4AC3BE05E9EB@illinois.edu> On Aug 18, 2009, at 10:06 PM, Jonathan Cline wrote: > Chris Fields wrote: >> >> Your modules may or may not need the Bio* namespace (that's up to >> you, >> actually); there are several non-bioperl modules that also share the >> Bio* namespace, and I believe there are modules that aren't Bio* that >> use BioPerl (Gbrowse comes to mind). If you're focusing on >> interaction with robotics, Robotics::Bio::X might be a better >> namespace for instance (b/c you could expand later into other >> possibly >> non-bio robotics interfaces). > > Based on your & other opinions I have received, I am creating: > > Robotics.pm (high level hardware abstraction layer) > Robotics::Tecan > Robotics::Tecan::Genesis > > > I'll post a release note when it's reached an interesting level of > maturity (estimate a couple weeks from now) so anyone with the > hardware > can play with the package. It's currently working great, and I am > adding functionality on a daily basis. > > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## That's great to hear! Keep us updated, I'm sure there are a few potential users lurking about here. 
chris From scott at scottcain.net Wed Aug 19 09:15:12 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 19 Aug 2009 09:15:12 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net> Hilmar, The examples in that thread ought to go in the ninth column; using the Dbxref tag for references back to GenBank for example. The provenience stuff should go in the ninth column as well, though I don't know exactly how would be best. Scott On Aug 18, 2009, at 4:01 PM, Hilmar Lapp wrote: > > On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: > >> Additionally, some applications (SynBrowse comes to mind) overload >> the source value and require them to conform to a certain syntax. >> >> So, what I'm trying to say is, source should probably just stay a >> simple string. > > > I would rephrase that to source should probably retain the > possibility of using made-up strings. > > You mention one example yourself, and there have been others in a > recent thread on BioSQL [1], for why the option to have predictable, > structured values with attached semantics could be very useful. > > -hilmar > > [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From saikari78 at gmail.com Wed Aug 19 09:30:07 2009 From: saikari78 at gmail.com (saikari keitele) Date: Wed, 19 Aug 2009 14:30:07 +0100 Subject: [Bioperl-l] Pipeline for generating phylogenetic trees from list of species names Message-ID: Hi, Does anyone know of a simple pipeline for generating a phylogenetic tree from a list of species with bioperl? I've had a look at http://www.bioperl.org/wiki/HOWTO:PhylogeneticAnalysisPipeline#Distance_Distance_in_PHYLIP_.2B_NJ_Tree_in_PHYLIP but it isn't explicit about the crucial steps (at least given my level of knowledge). For each species, should I extract the longest sequence available for every protein and align it with the same protein sequences of the other species in the list? Would anyone have an example pipeline of the different steps to perform? Thank you very much. Saikari From ymc at yahoo.com Tue Aug 18 22:50:57 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Tue, 18 Aug 2009 19:50:57 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <829996.94283.qm@web30404.mail.mud.yahoo.com> I think it is better to keep Bio-Tools-HMM within the Bio-Ext package and then spin this whole Bio-Ext package out to CPAN. I am ok with Robert's arrangement to move the related pm files under Bio/Tools/ to the new Bio-Ext package. There aren't that many modules in Bio-Ext. Plus, based on Chris and Robert's comments, modules other than my dpAlign and HMM appear to be abandoned. Moving HMM out only makes users less likely to try it out. If need be, I can also be a co-maintainer of this spun-off Bio-Ext package. I have a PAUSE already due to my other CPAN contributions. So there is no need to create a new one. My PAUSE account is UMVUE.
Yee Man --- On Tue, 8/18/09, Chris Fields wrote: > From: Chris Fields > Subject: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "BioPerl List" > Date: Tuesday, August 18, 2009, 8:10 AM > Yee Man, Robert, > > All tests are passing; there was a small change in the > expected floating point, but no warning now. > > Re: passing this on to CPAN, I think it needs a distinct > version from BioPerl (something that should probably happen > with any spinoffs).? I foresee two options (and a > possible conflict): > > 1) Use the same versioning scheme, starting with 1.6.1. > 2) Use a simpler scheme a'la Bio::Graphics, which I > suggest.? Tripartite versions are a PITA, we'll only > need to keep that in core. > > Conflict: Bio::Tools::HMM is currently part of the 1.6 > branch (in 1.6.0).? If this stays in 1.6.1 then we have > two versions of the module floating out there. > > I think we should go ahead and remove Bio::Tools::HMM from > 1.6.1, and I could attempt to push the initial Bio-Ext-HMM > release after core 1.6.1 is out.? After that, I could > then add Yee Man as PAUSE co-maintainer for those modules > (which means Yee Man needs to sign up for a PAUSE > account).? Any objections to that? > > chris > > On Aug 17, 2009, at 7:24 PM, Yee Man Chan wrote: > > > I get it now. So it is now spinned off. Anyway, I > updated the HMM.pm in Bio-Tools-HMM with the latest version. > I think it should work. > > > > Yee Man > > > > --- On Mon, 8/17/09, Robert Buels > wrote: > > > >> From: Robert Buels > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista? > >> To: "Yee Man Chan" > >> Cc: "Chris Fields" , > "BioPerl List" > >> Date: Monday, August 17, 2009, 4:00 PM > >> Yee Man Chan wrote: > >>> I noticed that Bio/Tools/HMM.pm was removed > from the > >> trunk. So I added it back in. I think you > shouldn't get the > >> warnings with this version. 
> >> > >> Please read my email above with instructions for > checkout > >> out the new Bio-Tools-HMM component, where > Bio::Tools::HMM > >> has been moved.? Please do not add the > Bio::Tools::HMM > >> module back into bioperl-live. > >> > >> I think you might be confused about the functions > of 'svn > >> add', 'svn commit', etc, because I don't see any > actual > >> addition of the module in the commit logs.? > Please read > >> through the SVN manual at http://svnbook.red-bean.com/ if you need > >> clarification. > >> > >> Rob > >> > >> > > > > > > > > From ymc at yahoo.com Wed Aug 19 00:24:05 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Tue, 18 Aug 2009 21:24:05 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B7C01.5060502@cornell.edu> Message-ID: <190221.61009.qm@web30408.mail.mud.yahoo.com> Is it going to be an arrangement similar to bioconductor? If so, I suppose then it makes sense. But you might want to develop scripts to automatically download and install new modules to make it user friendly. What do you mean by Bio-Ext is going away? I notice quite many people using dpAlign. So if Bio-Ext is going away, then at least dpAlign should become another spin off. Yee Man --- On Tue, 8/18/09, Robert Buels wrote: > From: Robert Buels > Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Chris Fields" , "BioPerl List" > Date: Tuesday, August 18, 2009, 9:13 PM > Yee Man Chan wrote: > > I think it is better to keep Bio-Tools-HMM within the > Bio-Ext package and then spin this whole Bio-Ext package out > to CPAN. I am ok with Robert's arrangement to move the > related pm files under Bio/Tools/ to the new Bio-Ext > package. > > The long-term development plan is to factor *ALL* of > Bioperl into individual distributions similar to > Bio-Tools-HMM.? 
It is actually much easier to maintain > and release code in this "broken up" way. > > This means that the Bio-Ext package is going to go away, so > it doesn't make sense to keep Bio-Tools-HMM in it.? > Chris, other core devs, do you agree with this? > > > I have a PAUSE already due to my other CPAN > contributions. So there is no need to create a new one. My > PAUSE account is UMVUE. > Oh good, the next step would just be to coordinate when to > do the release in concert with Bioperl 1.6.1, right? > > Rob > > From ymc at yahoo.com Wed Aug 19 00:49:18 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Tue, 18 Aug 2009 21:49:18 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B819D.9070309@cornell.edu> Message-ID: <184595.94226.qm@web30407.mail.mud.yahoo.com> Good. That makes sense then. Please update me when all is set. Yee Man --- On Tue, 8/18/09, Robert Buels wrote: > From: Robert Buels > Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Chris Fields" , "BioPerl List" > Date: Tuesday, August 18, 2009, 9:37 PM > Yee Man Chan wrote: > > Is it going to be an arrangement similar to > bioconductor? If so, I suppose then it makes sense. But you > might want to develop scripts to automatically download and > install new modules to make it user friendly. > Yes, we are probably going to make a Task::BioPerl or > something similar. > > > What do you mean by Bio-Ext is going away? I notice > quite many people using dpAlign. So if Bio-Ext is going > away, then at least dpAlign should become another spin off. > By going away, I meant that everything in there is going to > be spinned off.? Except modules that are no longer > maintainable, if there are any in there. 
> > Rob > > From ymc at yahoo.com Wed Aug 19 05:01:39 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Wed, 19 Aug 2009 02:01:39 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <6ADF16A9-3D14-45F3-B972-98134B0A0DB1@illinois.edu> Message-ID: <884845.92813.qm@web30408.mail.mud.yahoo.com> I tried that sample script that reportedly caused the dpAlign "bug" but I can't reproduced it. All I get is a warning from LocatableSeq. ------------------------------------------- [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "-Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl --------------------- WARNING --------------------- MSG: In sequence ABC|9944760 residue count gives end value 101. Overriding value [104] with value 101 for Bio::LocatableSeq::end(). TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA --------------------------------------------------- Getting score for ABC|9944760 -> ABC|9986984 = 300 Getting score for ABC|9986984 -> ABC|9944760 = 303 ------------------------------------------ Does the test script crash in your machine? Yee Man --- On Tue, 8/18/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "Yee Man Chan" , "BioPerl List" > Date: Tuesday, August 18, 2009, 10:28 PM > On Aug 18, 2009, at 11:37 PM, Robert > Buels wrote: > > > Yee Man Chan wrote: > >> Is it going to be an arrangement similar to > bioconductor? If so, I suppose then it makes sense. But you > might want to develop scripts to automatically download and > install new modules to make it user friendly. > > Yes, we are probably going to make a Task::BioPerl or > something similar. > > > >> What do you mean by Bio-Ext is going away? I > notice quite many people using dpAlign. 
So if Bio-Ext is going away, then at least dpAlign should become another spin off.
> > By going away, I meant that everything in there is
> > going to be spun off. Except modules that are no
> > longer maintainable, if there are any in there.
> >
> > Rob
>
> dpAlign could become another spinoff, yes, if it's used
> (and works fine). The problematic code dealt with pSW,
> alignment statistics, and staden io_lib support (the latter of
> which is fairly bit-rotted now):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2668
> http://bugzilla.open-bio.org/show_bug.cgi?id=1857
> http://bugzilla.open-bio.org/show_bug.cgi?id=2069
> http://bugzilla.open-bio.org/show_bug.cgi?id=2074
> http://bugzilla.open-bio.org/show_bug.cgi?id=2329
>
> dpAlign has its own bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2384
>
> chris
>

From cjfields at illinois.edu Wed Aug 19 10:49:15 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 19 Aug 2009 09:49:15 -0500
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To: <884845.92813.qm@web30408.mail.mud.yahoo.com>
References: <884845.92813.qm@web30408.mail.mud.yahoo.com>
Message-ID:

I'll have a look. It's probably something that hasn't been updated to deal with LocatableSeq's pathological end point checking.

chris

On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote:

>
> I tried that sample script that reportedly caused the dpAlign "bug"
> but I can't reproduce it. All I get is a warning from LocatableSeq.
> -------------------------------------------
> [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "-Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl
>
> --------------------- WARNING ---------------------
> MSG: In sequence ABC|9944760 residue count gives end value 101.
> Overriding value [104] with value 101 for Bio::LocatableSeq::end().
> TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT > -GGG-CCGGCCC-AA > --------------------------------------------------- > Getting score for ABC|9944760 -> ABC|9986984 > = 300 > Getting score for ABC|9986984 -> ABC|9944760 > = 303 > ------------------------------------------ > > Does the test script crash in your machine? > > Yee Man > > --- On Tue, 8/18/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] >> Problems with Bioperl-ext package on WinVista? >> To: "Robert Buels" >> Cc: "Yee Man Chan" , "BioPerl List" > > >> Date: Tuesday, August 18, 2009, 10:28 PM >> On Aug 18, 2009, at 11:37 PM, Robert >> Buels wrote: >> >>> Yee Man Chan wrote: >>>> Is it going to be an arrangement similar to >> bioconductor? If so, I suppose then it makes sense. But you >> might want to develop scripts to automatically download and >> install new modules to make it user friendly. >>> Yes, we are probably going to make a Task::BioPerl or >> something similar. >>> >>>> What do you mean by Bio-Ext is going away? I >> notice quite many people using dpAlign. So if Bio-Ext is >> going away, then at least dpAlign should become another spin >> off. >>> By going away, I meant that everything in there is >> going to be spinned off. Except modules that are no >> longer maintainable, if there are any in there. >>> >>> Rob >> >> dpAlign could become another spinoff, yes, if it's used >> (and works fine). 
The problematic code dealt with pSW,
>> alignment statistics, and staden io_lib support (the latter of
>> which is fairly bit-rotted now):
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2668
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1857
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2069
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2074
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2329
>>
>> dpAlign has its own bug:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2384
>>
>> chris
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hlapp at gmx.net Wed Aug 19 18:19:25 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 19 Aug 2009 18:19:25 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net>
Message-ID: <4907C3F4-C503-4019-BBDA-153ED777276C@gmx.net>

Putting it into the ninth column is the equivalent of storing it in the {seqfeature,bioentry}_qualifier_value tables in BioSQL.

-hilmar

On Aug 19, 2009, at 9:15 AM, Scott Cain wrote:

> Hilmar,
>
> The examples in that thread ought to go in the ninth column; using
> the Dbxref tag for references back to GenBank for example. The
> provenience stuff should go in the ninth column as well, though I
> don't know exactly how would be best.
>
> Scott
>
>
>
> On Aug 18, 2009, at 4:01 PM, Hilmar Lapp wrote:
>
>>
>> On Aug 18, 2009, at 2:28 PM, Scott Cain wrote:
>>
>>> Additionally, some applications (SynBrowse comes to mind) overload
>>> the source value and require them to conform to a certain syntax.
>>> >>> So, what I'm trying to say is, source should probably just stay a >>> simple string. >> >> >> I would rephrase that to source should probably retain the >> possibility of using made-up strings. >> >> You mention one example yourself, and there have been others in a >> recent thread on BioSQL [1], for why the option to have >> predictable, structured values with attached semantics could be >> very useful. >> >> -hilmar >> >> [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jason at bioperl.org Wed Aug 19 20:55:22 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 19 Aug 2009 20:55:22 -0400 Subject: [Bioperl-l] Hi In-Reply-To: <3bc6bb240908191147j1c707206r4bd290addd2cd2f@mail.gmail.com> References: <3bc6bb240908191147j1c707206r4bd290addd2cd2f@mail.gmail.com> Message-ID: Please ask on the mailing list for these things, I am not really sure what you mean by subtract all taxonomy -- I suspect you mean extract all IDs, I think you should take a look at the example like http://bioperl.org/wiki/Module:Bio::DB::Taxonomy I think the example is basically what you want to do, except replace the nodeid with 7742 instead of 33090 -jason On Aug 19, 2009, at 2:47 PM, 
JingtaoLiu(TSU) wrote: > Hi Sir, > > Thank you for reading this. > I am working for BioChem Dept Texastate university. > I encounter a problem. > I need subtract all taxonomy IDs from vertebrates(taxon id is 7742) > how I can get all the leaf node of these? > > I referenced Bio::DB::Taxonomy, > but i have no clue about it. > Very appreciate for your help. > > Jingtao Liu -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From yannick.wurm at unil.ch Wed Aug 19 15:25:11 2009 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Wed, 19 Aug 2009 21:25:11 +0200 Subject: [Bioperl-l] Programmer job in Lausanne Switzerland Message-ID: <1D1F031E-29F1-4AE4-A225-D9B434ACE070@unil.ch> Dear list, my apologies if this is inappropriate for the list, but I thought it would be a good way to reach the kind of people we're looking for. We have a job opening for assembly and annotation of ant genomes in Lausanne Switzerland. http://www.isb-sib.ch/about-sib/jobs/details/91-sib-bioinformatician-at-sib--unil.html http://fourmidable.unil.ch/BioinformaticsEngineerLausanneAnts.pdf Kind regards, Yannick http://yannick.poulet.org From sidd.basu at gmail.com Thu Aug 20 06:03:07 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 20 Aug 2009 05:03:07 -0500 Subject: [Bioperl-l] Re: code reuse with moose In-Reply-To: References: <20090812022753.GA815@Macintosh-74.local> <20090818110102.GA27010@seinfeld> Message-ID: <20090820100304.GA1884@seinfeld> On Tue, 18 Aug 2009, Chris Fields wrote: > > On Aug 18, 2009, at 6:01 AM, Siddhartha Basu wrote: > > > Putting it in the bioperl list, makes more sense here, > > > > On Wed, 12 Aug 2009, Chris Fields wrote: > > > >> (BTW, this is re: the reimplementation of major chunks of BioPerl > >> using > >> Moose, Biome: http://github.com/cjfields/biome/tree/) > >> > >> Locations should use a Role (specifically, Biome::Role::Range), so > >> start/end/strand should be attributes, not methods. 
With attributes the
> >> best way to do this is probably with a builder, and lazily (start
> >> requires end, and vice versa). Factor out the common code as Tomas
> >> indicates. BTW, the $self->throw() is akin to BioPerl's $self->throw()
> >> exception handling; it simply catches any exceptions and passes them to
> >> the metaclass exception handling.
> >>
> >> I've been thinking about making the Range role abstract for this very
> >> reason (or defining very basic attributes); something like:
> >>
> >> ----------------------------
> >>
> >> package Bio::Role::Range;
> >>
> >> requires qw(_build_start _build_end _build_strand);
> >>
> >> # also require other methods which need to be defined in implementation
> >>
> >> has 'start' => (
> >>     isa     => 'Int',
> >>     is      => 'rw',
> >>     builder => '_build_start',
> >>     lazy    => 1
> >> );
> >>
> >> # same for end, strand (except strand has a different isa via MooseX::Types)
> >> ....
> >>
> >> package Bio::Location::Foo;
> >>
> >> with 'Bio::Role::Range';
> >>
> >> sub _build_start {
> >>     # for location-specific start
> >> }
> >>
> >> sub _build_end {
> >>     # for location-specific end
> >> }
> >>
> >> sub _build_strand {
> >>     # for location-specific strand
> >> }
> >>
> >> sub _common_build_method {
> >>     # factor out common code here, call from other builders
> >> }
> >>
> >> ----------------------------
> >
> > This plan makes things much clearer. Currently
> > BioMe::Role::Location has a 'requires' keyword and the rest of the
> > location modules consume that role to provide their own implementations. At
> > this point only BioMe::Location::Atomic has an attribute-based 'start' and
> > 'end' implementation. I got a bit confused because in current bioperl
> > 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic', and when
> > I try to follow that path in BioMe it has to override that
> > method.
> > So, my question is: do all the location modules really need to inherit
> > from each other?
I am totally aware about the origianl design ideas > > but > > it would be better to have a flatten hierarchy if possible. > > Flattening with roles is always a good idea, yes. I wouldn't worry as > much about the way it was originally implemented as the general API (and > ways in which we can simplify it). Thanks for clarifying that. > > > One more thing, what about putting the 'start', 'end' and the other > > common base attributes in BioMe::Role::Location instead of > > BioMe::Role::Range. I am not sure which would be correct from bioperl > > stand of view, just throwing out an idea. > > That's a possibility. To me Locations are just Ranges with different > behavior (hence the below comment...) > > >> Also, I think the Coordinate-related stuff should be simplified down > >> to a > >> trait or an attribute; they bring in way too much overhead in > >> bioperl w/o > >> much added value. > > > > You mean instead of having 'builder' method, having a specialized > > traits handling those. That sounds like even better. > > > > -siddhartha > > Yes, that's essentially it. Location behavior could be changed by > having CoordinatePolicy as a trait. Similarly, fuzziness for start/end > could also be thought of as a trait. In essence, you could probably role > most behavior into attribute traits (which, in Moose, are just roles that > are composed into the attribute meta class, Moose::Meta::Attribute). I > had started up a Biome::Meta::Attribute class in case we were to go down > this path, then we could start registering specific traits within that > namespace. > > Just to note, it might be easier to try the simplest approach first and > get tests passing, then layer in traits to see how they act > performance-wise. My guess is they will speed things up, but you never > know. Locations will be a performance bottleneck as they are used in > generic Features. That's seemed to be a saner approach. Will play around with the builder approach and get the tests passing at least. 
thanks, -siddhartha > > chris From ymc at yahoo.com Wed Aug 19 23:01:28 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Wed, 19 Aug 2009 20:01:28 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <191324.76414.qm@web30403.mail.mud.yahoo.com> I noticed that the $qalseq is a LocatableSeq with gaps. I don't think my program was written to support LocatableSeq with gaps. If I removed the gaps, then I would have the scores agree with each other which should be the desired outcome. --------------------- WARNING --------------------- MSG: In sequence ABC|9986984 residue count gives end value 104. Overriding value [101] with value 104 for Bio::LocatableSeq::end(). TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTTCGGGTCCGGCCCGAA --------------------------------------------------- Getting score for ABC|9944760 -> ABC|9986984 = 291 Getting score for ABC|9986984 -> ABC|9944760 = 291 Do you think I should check for this LocatableSeq type and give an error or should I remove the gaps if this is a LocatableSeq? Yee Man --- On Wed, 8/19/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "BioPerl List" > Date: Wednesday, August 19, 2009, 7:49 AM > I'll have a look.? It's probably > something that hasn't been updated to deal with > LocatableSeq's pathological end point checking. > > chris > > On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote: > > > > > I tried that sample script that reportedly caused the > dpAlign "bug" but I can't reproduced it. All I get is a > warning from LocatableSeq. 
> > ------------------------------------------- > > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl > "-Iblib/lib" "-Iblib/arch" > "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl > > > > --------------------- WARNING --------------------- > > MSG: In sequence ABC|9944760 residue count gives end > value 101. > > Overriding value [104] with value 101 for > Bio::LocatableSeq::end(). > > > TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA > > --------------------------------------------------- > > Getting score for ABC|9944760 -> ABC|9986984 > > = 300 > > Getting score for ABC|9986984 -> ABC|9944760 > > = 303 > > ------------------------------------------ > > > > Does the test script crash in your machine? > > > > Yee Man > > > > --- On Tue, 8/18/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was > Re: [Bioperl-l] Problems with Bioperl-ext package on > WinVista? > >> To: "Robert Buels" > >> Cc: "Yee Man Chan" , > "BioPerl List" > >> Date: Tuesday, August 18, 2009, 10:28 PM > >> On Aug 18, 2009, at 11:37 PM, Robert > >> Buels wrote: > >> > >>> Yee Man Chan wrote: > >>>> Is it going to be an arrangement similar > to > >> bioconductor? If so, I suppose then it makes > sense. But you > >> might want to develop scripts to automatically > download and > >> install new modules to make it user friendly. > >>> Yes, we are probably going to make a > Task::BioPerl or > >> something similar. > >>> > >>>> What do you mean by Bio-Ext is going away? > I > >> notice quite many people using dpAlign. So if > Bio-Ext is > >> going away, then at least dpAlign should become > another spin > >> off. > >>> By going away, I meant that everything in > there is > >> going to be spinned off.? Except modules that > are no > >> longer maintainable, if there are any in there. > >>> > >>> Rob > >> > >> dpAlign could become another spinoff, yes, if it's > used > >> (and works fine).? 
The problematic code dealt > with pSW, > >> alignment statistics, and staden io_lib support > (the latter > >> which is fairly bit rotted now): > >> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329 > >> > >> dpAlign has it's own bug: > >> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384 > >> > >> chris > >> > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bernd.jagla at gmail.com Thu Aug 20 04:46:52 2009 From: bernd.jagla at gmail.com (Bernd Jagla) Date: Thu, 20 Aug 2009 10:46:52 +0200 Subject: [Bioperl-l] SCF installation Message-ID: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> Hi, I am trying to install SCF (a prerequisite to samtools). I installed libread and the compilation seems to be working, only test is failing: zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL Checking if your kit is complete... 
Looks good
Writing Makefile for Bio::SCF

zoppel:Bio-SCF-1.01 bernd$ make
cp SCF.pm blib/lib/Bio/SCF.pm
cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
/opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
Please specify prototyping behavior for SCF.xs (see perlxs manual)
/usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c
Running Mkbootstrap for Bio::SCF ()
chmod 644 SCF.bs
rm -f blib/arch/auto/Bio/SCF/SCF.bundle
LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \
   -lread -lz \

chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle
cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs
chmod 644 blib/arch/auto/Bio/SCF/SCF.bs
Manifying blib/man3/Bio::SCF.3pm

zoppel:Bio-SCF-1.01 bernd$ make test
PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf)
t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 18/18 subtests

Test Summary Report
-------------------
t/scf.t (Wstat: 512 Tests: 0 Failed: 0)
  Non-zero exit status: 2
  Parse errors: Bad plan. You planned 18 tests but ran 0.
Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr 0.01 csys = 0.11 CPU)
Result: FAIL
Failed 1/1 test programs. 0/0 subtests failed.
make: *** [test_dynamic] Error 2

Any idea what might be going wrong?
Please note that some files in the directory are empty:

ls -ltr
-rw-r--r--   1 bernd  staff  167468 23 sep  1999 test.scf
-rw-r--r--   1 bernd  staff    1131 31 jan  2006 DISCLAIMER
-rw-r--r--   1 bernd  staff     532 17 mai  2006 README
-rw-r--r--   1 bernd  staff     525 17 mai  2006 INSTALL
-rw-r--r--   1 bernd  staff     396 17 mai  2006 Makefile.PL
-rw-r--r--   1 bernd  staff    9308 17 mai  2006 SCF.xs
-rw-r--r--   1 bernd  staff   12438 17 mai  2006 SCF.pm
drwxr-xr-x   3 bernd  staff     102 17 mai  2006 t
drwxr-xr-x   6 bernd  staff     204 17 mai  2006 eg
drwxr-xr-x   3 bernd  staff     102 17 mai  2006 SCF
-rw-r--r--   1 bernd  staff     290 17 mai  2006 META.yml
-rw-r--r--   1 bernd  staff     255 17 mai  2006 MANIFEST
drwxr-xr-x   4 bernd  staff     136 20 aoû 10:12 ..
-rw-r--r--   1 bernd  staff   27915 20 aoû 10:13 Makefile.old
-rw-r--r--   1 bernd  staff   27915 20 aoû 10:16 Makefile
-rw-r--r--   1 bernd  staff       0 20 aoû 10:17 pm_to_blib
drwxr-xr-x   8 bernd  staff     272 20 aoû 10:17 blib
-rw-r--r--   1 bernd  staff       0 20 aoû 10:17 SCF.bs
-rw-r--r--   1 bernd  staff   14580 20 aoû 10:18 SCF.o
-rw-r--r--   1 bernd  staff   15125 20 aoû 10:18 SCF.c
drwxr-xr-x  21 bernd  staff     714 20 aoû 10:18 .

Thanks,

Bernd

From cain.cshl at gmail.com Thu Aug 20 10:30:33 2009
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 20 Aug 2009 10:30:33 -0400
Subject: [Bioperl-l] SCF installation
In-Reply-To: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
Message-ID:

Hi Bernd,

Bio::SCF isn't technically part of BioPerl, but I have installed it before so I'll take a shot: do you have the Staden io-lib installed? It is a prereq for Bio::SCF. If you did install it, is it in a normal library path, and did you run ldconfig (if appropriate for your system) after installing it? io-lib can be obtained here:

http://staden.sourceforge.net/

If you do have all of those things in place, what version of io-lib are you using? I wonder if there is an incompatibility between Bio::SCF and your version.
The INSTALL doc for Bio::SCF indicates that you should have version 0.9,
but io-lib is now at 1.11.5. That jump to a whole number may have broken
an api call that Bio::SCF depends on.

Scott


On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote:

> Hi,
>
> I am trying to install SCF (a prerequisite to samtools).
>
> I installed libread and the compilation seems to be working, only
> test is failing:
>
> zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL
> Checking if your kit is complete...
> Looks good
> Writing Makefile for Bio::SCF
>
> zoppel:Bio-SCF-1.01 bernd$ make
> cp SCF.pm blib/lib/Bio/SCF.pm
> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
> /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap
> /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
> Please specify prototyping behavior for SCF.xs (see perlxs manual)
> /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include
> -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include
> -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\"
> "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c
> Running Mkbootstrap for Bio::SCF ()
> chmod 644 SCF.bs
> rm -f blib/arch/auto/Bio/SCF/SCF.bundle
> LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3
> /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup
> -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \
> -lread -lz \
>
> chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle
> cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs
> chmod 644 blib/arch/auto/Bio/SCF/SCF.bs
> Manifying blib/man3/Bio::SCF.3pm
>
> zoppel:Bio-SCF-1.01 bernd$ make test
> PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e"
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf)
> t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200)
> Failed 18/18 subtests
>
> Test Summary Report
> -------------------
> t/scf.t (Wstat: 512 Tests: 0 Failed: 0)
>   Non-zero exit status: 2
>   Parse errors: Bad plan. You planned 18 tests but ran 0.
> Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr 0.01 csys = 0.11 CPU)
> Result: FAIL
> Failed 1/1 test programs. 0/0 subtests failed.
> make: *** [test_dynamic] Error 2
>
> Any idea what might be going wrong?
>
> Please not that in the directory there are some file empty:
>
> ls -ltr
> -rw-r--r--   1 bernd  staff  167468 23 sep  1999 test.scf
> -rw-r--r--   1 bernd  staff    1131 31 jan  2006 DISCLAIMER
> -rw-r--r--   1 bernd  staff     532 17 mai  2006 README
> -rw-r--r--   1 bernd  staff     525 17 mai  2006 INSTALL
> -rw-r--r--   1 bernd  staff     396 17 mai  2006 Makefile.PL
> -rw-r--r--   1 bernd  staff    9308 17 mai  2006 SCF.xs
> -rw-r--r--   1 bernd  staff   12438 17 mai  2006 SCF.pm
> drwxr-xr-x   3 bernd  staff     102 17 mai  2006 t
> drwxr-xr-x   6 bernd  staff     204 17 mai  2006 eg
> drwxr-xr-x   3 bernd  staff     102 17 mai  2006 SCF
> -rw-r--r--   1 bernd  staff     290 17 mai  2006 META.yml
> -rw-r--r--   1 bernd  staff     255 17 mai  2006 MANIFEST
> drwxr-xr-x   4 bernd  staff     136 20 aoû 10:12 ..
> -rw-r--r--   1 bernd  staff   27915 20 aoû 10:13 Makefile.old
> -rw-r--r--   1 bernd  staff   27915 20 aoû 10:16 Makefile
> -rw-r--r--   1 bernd  staff       0 20 aoû 10:17 pm_to_blib
> drwxr-xr-x   8 bernd  staff     272 20 aoû 10:17 blib
> -rw-r--r--   1 bernd  staff       0 20 aoû 10:17 SCF.bs
> -rw-r--r--   1 bernd  staff   14580 20 aoû 10:18 SCF.o
> -rw-r--r--   1 bernd  staff   15125 20 aoû 10:18 SCF.c
> drwxr-xr-x  21 bernd  staff     714 20 aoû 10:18 .
>
> Thanks,
>
> Bernd
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-----------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From dan.bolser at gmail.com Thu Aug 20 11:00:41 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 20 Aug 2009 16:00:41 +0100
Subject: [Bioperl-l] Creating a MSA from a set of pairwise alignments with a common reference sequence?
Message-ID: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com>

Hi,

Quick version: How do I get a column of Bio::SimpleAlign using
ungapped 'reference' sequence coordinates?


Longer version:

I have a set of pairwise alignments that I would like to process into
a 'multiple sequence alignment' (MSA). All the alignments are short
sequence 'contigs' aligned to a 'reference' sequence, so one sequence
in all the pairwise alignments is constant (making the resulting MSA
unambiguous).

I came up with the following pseudo-code to create a MSA
(Bio::SimpleAlign) from the set of pairwise alignments...

initialise:
  Create an 'empty' Bio::SimpleAlign from the REFERENCE sequence.

for each pairwise alignment:
  Create a Bio::LocatableSeq from the given fragment of the
  REFERENCE sequence (using ungapped REFERENCE coordinates).

  for each gap in the REFERENCE sequence:
    Take the position of the gap (in ungapped REFERENCE
    coordinates) and look up the corresponding column of the MSA
    (in ungapped REFERENCE coordinates).

    for each sequence in the column:
      Check if there is a gap-character at this position.

    if any sequence has a non gap-character at this position:
      Stick a gap in the MSA just before this position.

  Create a Bio::LocatableSeq from the CONTIG sequence (using
  ungapped REFERENCE coordinates) and add it to the
  Bio::SimpleAlign.

done.


I would very much appreciate, 1) feedback on the correctness of the
above algorithm (it could be horribly wrong), and 2) advice on how to
get a column of the alignment using ungapped REFERENCE coordinates?


Sorry if this is a solved problem (where is it solved?). If not, and
if I can get it working, I'll try to write a generic function to merge
two MSAs when they have a reference sequence in common.


For your reference, the pairwise alignments come from the show-aligns
command in the MUMmer sequence alignment package, and have the
following format:

my.reference.fasta my.contigs.multi.fasta

============================================================
-- Alignments between REFERENCE and CONTIG00012

-- BEGIN alignment [ +1 29237 - 45714 | +1 1 - 16441 ]


29237      aataacctctttaag.taatatttttctctggtcccaacttgcgccaat
1          aataa.ctctttaagataatatttttctctggtcccgacttgggccaat
                ^         ^                    ^      ^

29286      ggaaaaaaatcacttattcgataa.ataataagataaatatattttcta
49         ggaaaaaaatcactatttcgataagataataagata.atatattttcaa
                         ^^        ^           ^          ^

29335      aagacccctacataaatatatggtcccattaatattataaattaataat
97         aagacccctatataaatatatggtctcattaatattataaattaataat
                     ^              ^

...


For further reference:

This thread:
http://bioperl.org/pipermail/bioperl-l/2009-July/030643.html

http://www.bioperl.org/wiki/Align_Refactor

http://www.bioperl.org/wiki/Alignment_object


All the best,
Dan.

From lincoln.stein at gmail.com Thu Aug 20 12:07:16 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 20 Aug 2009 12:07:16 -0400
Subject: [Bioperl-l] SCF installation
In-Reply-To:
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
Message-ID: <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>

It is all a bit confusing. On the download page for Staden, there is a
release 1.12, but the home page hasn't been updated and still reads 1.11.
If you download and install Staden 1.12, you'll get a library named
libstaden-read rather than libread; Bio::SCF hasn't been updated for the
name change, and so you will have to open up the Makefile.PL and change
"-lread" to "-lstaden-read" in order for it to compile.

This being said, your log indicates that Bio::SCF compiled and linked just
fine, but the test failed, so it may be more of a problem than just getting
the staden library installed.

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From j_martin at lbl.gov Thu Aug 20 12:41:16 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Thu, 20 Aug 2009 09:41:16 -0700
Subject: [Bioperl-l] SCF installation
In-Reply-To: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
Message-ID: <20090820164115.GA10681@eniac.jgi-psf.org>

Hello,

Bio::SCF isn't a pre-requisite of samtools or Bio::Samtools, and neither is
actually related to Bioperl. samtools has a pretty active mailing list at
sourceforge, you might try asking there.

http://sourceforge.net/mailarchive/forum.php?forum_name=samtools-help

I use samtools all the time w/o either of those modules.

Joel

From roy.chaudhuri at gmail.com Thu Aug 20 12:42:23 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Thu, 20 Aug 2009 17:42:23 +0100
Subject: [Bioperl-l] Creating a MSA from a set of pairwise alignments with a common reference sequence?
In-Reply-To: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com>
References: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com>
Message-ID: <4A8D7CEF.4080002@gmail.com>

Hi Dan,

I think you want the Bio::LocatableSeq method "column_from_residue_number".
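
For illustration only, here is roughly what such a lookup has to do, as a plain-Perl sketch rather than the BioPerl implementation. The helper name column_for_residue is made up for this example, and '.' and '-' are assumed to be the only gap characters. The row used is the first REFERENCE row of the show-aligns excerpt in Dan's message.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): map an ungapped residue
# number to a 1-based alignment column. $gapped is the gapped row and
# $start is the residue number of its first non-gap character.
# Assumption: gaps are written as '.' or '-'.
sub column_for_residue {
    my ($gapped, $start, $resnum) = @_;
    my $current = $start - 1;
    for my $col (1 .. length $gapped) {
        my $char = substr $gapped, $col - 1, 1;
        next if $char eq '.' or $char eq '-';    # gaps consume no residue
        $current++;
        return $col if $current == $resnum;
    }
    return;    # residue is not covered by this row
}

# First REFERENCE row from the show-aligns excerpt, starting at 29237.
my $ref = 'aataacctctttaag.taatatttttctctggtcccaacttgcgccaat';

# Residue 29252 is the first one after the gap, so it sits in column 17.
print column_for_residue($ref, 29237, 29252), "\n";    # prints 17
```

The same counting loop run in the other direction (column to residue number) is what the "look up the corresponding column" step of the pseudo-code would need.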
You might also try combining your pairwise alignments using the profile
alignment option in ClustalW.

Cheers.
Roy.

From lsbrath at gmail.com Thu Aug 20 16:31:20 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Thu, 20 Aug 2009 16:31:20 -0400
Subject: [Bioperl-l] genbank to fasta conversion
Message-ID: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>

Hello,

I have previously converted multiple genbank files to fasta. For some reason
I am having trouble with this simple script.
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

open (my $inFile, "C:/Documents and Settings/mydir/Desktop/TARGETING.gb");
open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/TARGET.fa");
my $in = Bio::SeqIO->new('-file' => "$inFile" ,
                         '-format' => 'GenBank');
my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' => 'Fasta');
print $out $_ while <$in>;

I keep getting the error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open GLOB(0x36a214): No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm:310
STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454
STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\genbank.pm:202
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351
STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377
-----------------------------------------------------------

I am probably missing something simple, but would appreciate any help.

M

From cjfields at illinois.edu Thu Aug 20 16:38:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 Aug 2009 15:38:03 -0500
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID: <7868B105-53AD-4C87-8B21-2E4D4A7781B5@illinois.edu>

You are passing filehandles in, not file names. Switch the '-file'
parameter to '-fh'.

chris

From rmb32 at cornell.edu Thu Aug 20 16:43:06 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 20 Aug 2009 13:43:06 -0700
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID: <4A8DB55A.6060605@cornell.edu>

The error is that you are opening a filehandle called $outfile, and then
you are stringifying it (resulting in a string containing "GLOB(...)"), and
telling Bio::SeqIO to write to a file named "GLOB(...)", which it can't open.
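
The stringification can be seen without BioPerl at all. The snippet below is only a sketch: it uses an in-memory file (an assumption made so it runs anywhere, in place of the real GenBank file) and prints what a lexical filehandle turns into when it is interpolated into a string.

```perl
use strict;
use warnings;

# In-memory stand-in for the GenBank file, so no file on disk is needed.
my $data = "LOCUS placeholder\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Interpolating a filehandle gives its reference address, not the name
# of the file it was opened on.
my $as_string = "$fh";
print "$as_string\n";    # something like GLOB(0x36a214); the address varies

close $fh;
```

A string of that shape is what ends up as the value of '-file', which matches the "Could not open GLOB(0x36a214)" line in the exception.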

You probably want to use the -fh arguments for your two uses of
Bio::SeqIO. Either that, or remove your open() calls and pass the
filenames to the SeqIO objects directly, like:

my $in = Bio::SeqIO->new
    ('-file'   => "C:/Documents and Settings/mydir/Desktop/TARGETING.gb",
     '-format' => 'GenBank',
    );
my $out = Bio::SeqIO->new
    ('-file'   => ">C:/Documents and Settings/mydir/Desktop/TARGET.fa",
     '-format' => 'fasta',
    );

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From sharpton at berkeley.edu Thu Aug 20 16:40:34 2009
From: sharpton at berkeley.edu (Thomas Sharpton)
Date: Thu, 20 Aug 2009 13:40:34 -0700
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID:

This is a problem I think I can solve, so I'm chiming in for once.

Looks to me like you're trying to pass a file handle to the -file
setting in your SeqIO object. One of the excellent things about using
SeqIO is that you don't need to worry about file handles; it's all
taken care of under the hood.

Try the following adaptation of your script:

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $inFile  = "C:/Documents and Settings/mydir/Desktop/TARGETING.gb";
my $outfile = "C:/Documents and Settings/mydir/Desktop/TARGET.fa";

#OPEN A SEQUENCE FILE OF INTEREST ($inFile) AND CREATE A SEQUENCE STREAM ($in)
my $in = Bio::SeqIO->new(-file => "$inFile" , '-format' => 'GenBank');

#OPEN AN OUTPUT FILE OF INTEREST ($outfile) AND CREATE AN OUTPUT SEQUENCE STREAM ($out)
#NOTICE HOW WE SET -file FOR OUTPUT WITH THE > SYMBOL HERE:
my $out = Bio::SeqIO->new(-file => ">$outfile" ,'-format' => 'Fasta');

#NOW LET'S DO THE CONVERSION AND DUMP THE OUTPUT
#INSTEAD OF DOING THIS
#print $out $_ while <$in>;
#TRY THIS
while(my $seq = $in->next_seq() ){
    $out->write_seq($seq);
}

The above is pretty much what you'll find here:

http://www.bioperl.org/wiki/HOWTO:SeqIO

which you should definitely look over to better understand what's
happening with the SeqIO object.

Good luck!
Tom

From ghai.rohit at gmail.com Fri Aug 21 07:34:49 2009
From: ghai.rohit at gmail.com (Rohit Ghai)
Date: Fri, 21 Aug 2009 13:34:49 +0200
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotide database
Message-ID: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com>

Hello all

I would like to download the wgs sequences of the unfinished genomes from
ncbi (genomes in progress) from
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

Here's an example accession

NZ_ACVD00000000

and here's the link to the accession at genbank

http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000

This record lists the accessions that belong to it in the following line
of the genbank output:

WGS         NZ_ACVD01000001-NZ_ACVD01000139

NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession numbers
specified by this line. Here's a link:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC]

The bioperl related question is...

Since these are unassembled genomes, there are several contigs for each
one, and they are all available in this record. Is it possible to download
a range without trying to recreate each accession number?

On the other hand, it is possible to download each one individually. This
would mean making the following

NZ_ACVD01000001
NZ_ACVD01000002
NZ_ACVD01000003
.
.
.
NZ_ACVD01000139

from NZ_ACVD01000001-NZ_ACVD01000139

I can recreate these numbers and download each one separately. However,
sometimes I get a timeout exception and the whole thing stops.

The code (copied shamelessly from the bioperl website, works great to get
single accessions):

my $id = "NZ_ACVD00000000";
my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -id      => $id,
                                       -rettype => 'gbwithparts');
$factory->get_Response(-file => 'fullcontig.gb');

I did try to catch the exceptions from get_Response, but it's not working
as expected. Maybe someone can point out what I'm doing wrong here. For
some reason, the code never seems to reach any print statement in the
catch construct...

$ele = "somecontig id";
try {
    print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n";
    $factory->get_Response(-file => "$genbank_file");
}
catch Bio::Root::Exception with {
    my $err = shift;
    if (! defined $err) {
        print "MAY HAVE DOWNLOADED $ele..\n";
    }
    else {
        print "PROBABLE TIMEOUT ERROR\n";
        print "$err\n";
    }
};

Or is it possible to somehow increase the timeout time for the
get_Response method?

Thanks in advance!

regards
Rohit

From bernd.jagla at gmail.com Fri Aug 21 05:30:27 2009
From: bernd.jagla at gmail.com (Bernd Jagla)
Date: Fri, 21 Aug 2009 11:30:27 +0200
Subject: [Bioperl-l] SCF installation
In-Reply-To: <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID:

Hi,

I have installed io_lib-1.9.0. This produces libread.a. I am working on a
Mac OSX 10.5.7. I just recompiled io-lib and didn't see any error message. I
don't really know how to test that it is working.

I am trying to install Bio-SCF-1.01.

It seems that the test.scf file cannot be read. Is there another way using
some other tools to see if that is working?

(Sorry for misrepresenting samtools. I was actually trying to install
Bio-Graphics, which was asking for Bio::SCF).

Thanks,

Bernd

From scott at scottcain.net Fri Aug 21 09:05:25 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 21 Aug 2009 09:05:25 -0400
Subject: [Bioperl-l] SCF installation
In-Reply-To:
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID:

Hi Bernd,

Just so you know, you don't need Bio::SCF for Bio::Graphics either, unless
you want to display ABI trace glyphs. It is a suggested ("recommends" in
Module::Build parlance) module.

Scott


On Aug 21, 2009, at 5:30 AM, Bernd Jagla wrote:

> Hi,
>
> I have installed io_lib-1.9.0. This produces libread.a. I am working on a
> Mac OSX 10.5.7. I just recompiled io-lib and didn't see any error message. I
> don't really know how to test that it is working.
>
> I am trying to install Bio-SCF-1.01.
>
> It seems that the test.scf file cannot be read. Is there another way using
> some other tools to see if that is working?
>
> (Sorry for misrepresenting samtools. I was actually trying to install
> Bio-Graphics, which was asking for Bio::SCF).
>
> Thanks,
>
> Bernd
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Thursday, August 20, 2009 6:07 PM
> To: scott at scottcain.net
> Cc: bioperl-l at lists.open-bio.org; Bernd Jagla
> Subject: Re: [Bioperl-l] SCF installation
>
> It is all a bit confusing. On the download page for Staden, there is a
> release 1.12, but the home page hasn't been updated and still reads
> 1.11.
> If you download and install Staden 1.12, you'll get a library named
> libstaden-read rather than libread; Bio::SCF hasn't been updated for the
> name change, and so you will have to open up the Makefile.PL and change
> "-lread" to "-lstaden-read" in order for it to compile.
>
> This being said, your log indicates that Bio::SCF compiled and linked just
> fine, but the test failed, so it may be more of a problem than just getting
> the staden library installed.
>
> Lincoln

-----------------------------------------------------------------------
Scott Cain, Ph. D.                         scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)        216-392-3087
Ontario Institute for Cancer Research

From maj at fortinbras.us Fri Aug 21 08:50:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 21 Aug 2009 08:50:08 -0400
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase
In-Reply-To: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com>
References: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com>
Message-ID: <71B4268E5B524F719D24088483568870@NewLife>

Hi Rohit-
Re: timeout, you could try
$factory->ua->timeout($number_greater_than_180_sec)
before issuing the request.
cheers MAJ
----- Original Message -----
From: "Rohit Ghai"
To:
Sent: Friday, August 21, 2009 7:34 AM
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase


> Hello all
>
> I would like to download the wgs sequences of the unfinished genomes from
> ncbi.
> (genomes in progress) from http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
>
> here's an example accession
>
> NZ_ACVD00000000
>
> and here's the link to the accession at genbank
>
> http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000
>
> This record contains the accessions that belong to this record in the
> following line in the genbank output
>
> WGS NZ_ACVD01000001-NZ_ACVD01000139
>
> The NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession
> numbers that are
>
> are specified by this range.
>
> here's a link
>
> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC]
>
>
> The bioperl related question is...
>
> Since these are unassembled genomes, there are several contigs for each one,
> and they all available in this record.
>
> Is it possible to download a range without trying to recreate each accession
> number?
>
> on the other hand, it is possible to download each individually , this would
> mean making the following
>
> NZ_ACVD01000001
> NZ_ACVD01000002
> NZ_ACVD01000003
> .
> .
> .
> NZ_ACVD01000139
>
> from NZ_ACVD01000001-NZ_ACVD01000139
>
>
> I can recreate these numbers and download each one separately. However,
> sometimes I get a timeout exception
> and the whole thing stops.
>
> the code ( copied shamelessly from the bioperl website, works great to get
> single accessions)
>
> my $id = "NZ_ACVD00000000";
> my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
>                                        -db      => 'nucleotide',
>                                        -id      => $id,
>                                        -rettype => 'gbwithparts');
>
> $factory->get_Response(-file => 'fullcontig.gb');
>
>
> I did try and catch the exceptions from the get_Response..but its not
> working as expected... maybe someone can point out what I'm doing wrong
> here. For some reason, the code never seems to go any print statement in the
> catch construct...
>
> $ele = "somecontig id";
>
> try {
>     print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n";
>     $factory->get_Response(-file => "$genbank_file");
>
> } catch Bio::Root::Exception with {
>     my $err = shift;
>     if (! defined $err) {
>         print "MAY HAVE DOWNLOADED $ele..\n";
>     } else {
>         print "PROBABLE TIMEOUT ERROR\n";
>         print "$err\n";
>     }
> };
>
>
> Or is it possible to somehow increase the timeout time for the get_Response
> method?
>
> thanks in advance!
>
>
> regards
>
> Rohit
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bernd.jagla at pasteur.fr Fri Aug 21 09:30:38 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Fri, 21 Aug 2009 15:30:38 +0200
Subject: [Bioperl-l] SCF installation
In-Reply-To:
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina><6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID: <0D219C72BC5F432BA5CDBBCFCE94AA02@zillumina>

Thanks,

I was confused by the error message of Bio::Graphics.
Now I tried make, make test and was able to install...

Thanks,

Let's forget about the rest then since I don't believe I will need that...

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Scott Cain
Sent: Friday, August 21, 2009 3:05 PM
To: Bernd Jagla
Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] SCF installation

Hi Bernd,

Just so you know, you don't need Bio::SCF for Bio::Graphics either,
unless you want to display ABI trace glyphs. It is a suggested
("recommends" in Module::Build parlance) module.

Scott

-----------------------------------------------------------------------
Scott Cain, Ph. D.                         scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)        216-392-3087
Ontario Institute for Cancer Research

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From ghai.rohit at gmail.com Fri Aug 21 09:40:02 2009
From: ghai.rohit at gmail.com (Rohit Ghai)
Date: Fri, 21 Aug 2009 15:40:02 +0200
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase
In-Reply-To: <71B4268E5B524F719D24088483568870@NewLife>
References: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com> <71B4268E5B524F719D24088483568870@NewLife>
Message-ID: <94c73820908210640h3b5854fbxe19c259c66cf9ee4@mail.gmail.com>

Thanks! I have made the change... no error yet.. so keeping my fingers
crossed

cheers
Rohit

On Fri, Aug 21, 2009 at 2:50 PM, Mark A.
Jensen wrote: > Hi Rohit- > Re: timeout, you could try > $factory->ua->timeout($number_greater_than_180_sec) > before issuing the request. > cheers MAJ > ----- Original Message ----- From: "Rohit Ghai" > To: > Sent: Friday, August 21, 2009 7:34 AM > Subject: [Bioperl-l] downloading multiple contigs from ncbi > nucleotidedatabase > > > Hello all >> >> I would like to download the wgs sequences of the unfinished genomes from >> ncbi. >> (genomes in progress) from http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi >> >> here's an example accession >> >> NZ_ACVD00000000 >> >> and here's the link to the accession at genbank >> >> http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000 >> >> This record contains the accessions that belong to this record in the >> following line in the genbank output >> >> WGS NZ_ACVD01000001-NZ_ACVD01000139 >> >> The NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession >> numbers that are >> >> are specified by this range. >> >> here's a link >> >> >> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC] >> >> >> The bioperl related question is... >> >> Since these are unassembled genomes, there are several contigs for each >> one, >> and they all available in this record. >> >> Is it possible to download a range without trying to recreate each >> accession >> number? >> >> on the other hand, it is possible to download each individually , this >> would >> mean making the following >> >> NZ_ACVD01000001 >> NZ_ACVD01000002 >> NZ_ACVD01000003 >> . >> . >> . >> NZ_ACVD01000139 >> >> from NZ_ACVD01000001-NZ_ACVD01000139 >> >> >> I can recreate these numbers and download each one separately. However, >> sometimes I get a timeout exception >> and the whole thing stops. 
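A minimal sketch of the "recreate the accession numbers and retry on failure" idea from the message above, in plain Perl. The helper names expand_wgs_range and with_retries are made up for illustration and are not part of BioPerl. The sketch assumes that both ends of the range share a prefix and a zero-padded numeric suffix of equal width, and that a failed download surfaces as a Perl exception (die), which may or may not be how get_Response reports a timeout.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): expand a WGS range such as
# "NZ_ACVD01000001-NZ_ACVD01000139" into the individual accessions.
sub expand_wgs_range {
    my ($range) = @_;
    my ( $first, $last ) = split /-/, $range, 2;
    die "not a range: '$range'\n" unless defined $last;

    # Split each end into a prefix and a trailing run of digits.
    my ( $prefix, $from ) = $first =~ /^(.*?)(\d+)$/
        or die "cannot parse '$first'\n";
    my ( $prefix2, $to ) = $last =~ /^(.*?)(\d+)$/
        or die "cannot parse '$last'\n";
    die "range ends do not match\n"
        unless $prefix eq $prefix2 && length($from) == length($to);

    my $width = length $from;    # width used to restore the zero padding
    return map { sprintf '%s%0*d', $prefix, $width, $_ }
        ( $from + 0 ) .. ( $to + 0 );
}

# Hypothetical helper: call $code up to $tries times, treating any die as a
# failed attempt. Returns the number of the attempt that succeeded, or 0.
sub with_retries {
    my ( $tries, $code ) = @_;
    for my $attempt ( 1 .. $tries ) {
        return $attempt if eval { $code->(); 1 };
        warn "attempt $attempt failed: $@";
    }
    return 0;
}

my @accessions = expand_wgs_range('NZ_ACVD01000001-NZ_ACVD01000139');
printf "%d accessions, %s to %s\n", scalar @accessions, @accessions[ 0, -1 ];
```

In a download loop over @accessions, each get_Response call from the message could be wrapped as with_retries( 3, sub { ... } ), after raising the timeout with $factory->ua->timeout(...) as suggested in this thread, so that one failed request does not stop the whole run.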
>> >> the code ( copied shamelessly from the bioperl website, works great to get >> single accessions) >> >> my $id = "NZ_ACVD00000000"; >> my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', >> -db => >> 'nucleotide', >> -id => >> $id, >> -rettype >> => 'gbwithparts'); >> >> $factory->get_Response(-file => 'fullcontig.gb'); >> >> >> I did try and catch the exceptions from the get_Response..but its not >> working as expected... maybe someone can point out what I'm doing wrong >> here. For some reason, the code never seems to go any print statement in >> the >> catch construct... >> >> $ele = "somecontig id"; >> >> try { >> print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n"; >> $factory->get_Response(-file => "$genbank_file"); >> >> } catch Bio::Root::Exception with { >> my $err = shift; >> if (! defined $err) { >> print "MAY HAVE DOWNLOADED $ele..\n"; >> } else { >> print "PROBABLE TIMEOUT ERROR\n"; >> print "$err\n"; >> } >> }; >> >> >> Or is it possible to somehow increase the timeout time for the >> get_Response >> method? >> >> thanks in advance! >> >> >> regards >> >> Rohit >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > From rmb32 at cornell.edu Fri Aug 21 15:39:31 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 Aug 2009 12:39:31 -0700 Subject: [Bioperl-l] added a perltidy profile file Message-ID: <4A8EF7F3.0@cornell.edu> This one is copied from the parrot project. I added it in maintenance/perltidy.conf. Have a look, tweak as you see fit. The idea with perltidy profile files is to use them to enforce coding style rules. So this perltidy profile file would be the place to codify the BioPerl coding standards, such as indentation, use of cuddled elses, etc. So here is one, let's customize it for our needs. 
The way I usually run perltidy is with -b to modify a file in-place, and
with the '-pro=' option to specify a profile file.

Example:
   perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From cjfields at illinois.edu Fri Aug 21 17:03:07 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 21 Aug 2009 16:03:07 -0500
Subject: [Bioperl-l] bioperl capability
In-Reply-To: <25037707.post@talk.nabble.com>
References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> <25037707.post@talk.nabble.com>
Message-ID:

On Aug 18, 2009, at 11:39 PM, deequan wrote:

>
> Howdy there,
>
> Yes, quite right. I apologize for the double posting. Moreover, I
> appreciate your assistance in trying to sort out what can and cannot be
> done with bioperl. To address the problem previously stated, I put
> together a remarkably misbehaving script that has the following parts:
>
> #Some parsing:
> $q_start = $hsp->query->start;
> $q_end = $hsp->query->end;
> $h_start = $hsp->hit->start;
> $h_end = $hsp->hit->end;
> $length = $hsp->query->seqlength();
> $id = $hit->accession;
>
> print OUT "$id\t";
> my $seq;
> if($h_start<$h_end){
>
> #the bit per your recommendation
> my $begin = $h_start-$q_start+1;
> my $cease = ($length - $q_end) + $h_end;
> my $strand = 1;
> my $factory = Bio::DB::GenBank->new(-format=> 'genbank',
> -seq_start =>$begin,
> -seq_stop =>$cease,
> -strand => $strand, #1 = plus, 2 = minus
> );
> $seq = $factory->get_Seq_by_acc($id);
> }else{#else assume backward, code not shown}
> [
> #and some stuff to retrieve the sequence
>
> my $len = $seq->length();
> my $string = $seq->subseq(1, $len);
> print OUT "length = $len\t";
> print OUT "seq = $string\n"; ]

Not sure what you are doing with the above sequence. The above

> In your previous reply, you said the code accessing the seq object
> created by get_Seq_by_acc would have to pass that obj (here $seq) to a
> seqIO for basic IO purposes.

# create an output seq stream somewhere
my $out = Bio::SeqIO->new(-file   => '>sequences.gb',
                          -format => 'genbank');
....
# take seq object ($seq), write to the stream
$out->write_seq($seq);

> Not seeing exactly how to go about that, I tried some other functions in
> combination that seemed as though they should work (length() and
> subseq()). Unfortunately, the program does not even run to that point,
> as the script throws an exception:
>
> ------------- EXCEPTION -------------
> MSG: acc CP000948 does not exist
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:182
> STACK toplevel test.pl:36
> -------------------------------------
>
>
> Oddly, the record corresponding to this accession number can be found
> here:
> http://www.ncbi.nlm.nih.gov/nuccore/169887498

That's probably something to do with NCBI, unfortunately; I'll have to
look into it. The best alternative is to use BLAST reports that include
the GI (or UID). That's the most reliable identifier (use it together
with get_Seq_by_id), but it's not on by default; you have to request its
inclusion. More recent versions of Bio::SearchIO::blast parse out the GI
from the descriptor if it's present.

> Perhaps you'd be willing to offer another hint. Thank you for your
> assistance thus far. And on behalf of all posters, thank you for sharing
> your knowledge. 'Preciate.
>
> David Q.

No problem.

chris

From dan.bolser at gmail.com Fri Aug 21 17:55:37 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 21 Aug 2009 22:55:37 +0100
Subject: [Bioperl-l] added a perltidy profile file
In-Reply-To: <4A8EF7F3.0@cornell.edu>
References: <4A8EF7F3.0@cornell.edu>
Message-ID: <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com>

Cheers Rob,

Whatever objections may arise from style x or style y, I think it's a
great idea to at least have one style or another recognized as being
'standard'. I know TMTOWTDI, but on a project like this, with so many
contributors and users, it's essential to at least have a
recommendation.

I'll try to use this on any contribs. As you pointed out [1], it's
probably best to provide two patches for any change involving a
formatting clean up: one to change the format to the standard and one to
commit the actual code changes.

All the best,
Dan.
[1] irc://irc.freenode.net/#bioperl


2009/8/21 Robert Buels :
> This one is copied from the parrot project. I added it in
> maintenance/perltidy.conf.
> Have a look, tweak as you see fit.
>
> The idea with perltidy profile files is to use them to enforce coding style
> rules. So this perltidy profile file would be the place to codify the
> BioPerl coding standards, such as indentation, use of cuddled elses, etc.
>
> So here is one, let's customize it for our needs. The way I usually run
> perltidy is with -b to modify a file in-place, and with the '-pro=' option
> to specify a profile file.
>
> Example:
>   perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm
>
> Rob
>
> --
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY 14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From maj at fortinbras.us Fri Aug 21 23:12:55 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 21 Aug 2009 23:12:55 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife>
References: <1F899AA92F94415186CB0B25306F1114@NewLife>
Message-ID: <86486D3736614E6A81AF9521B5BB796A@NewLife>

Thanks to all (six, seven including Rob and his perltidy) who
responded to this thread. (Lurkers, you are not volunteering
by responding, honest.) I'm preparing a wiki page (of course)
with the major points, some further comments, and an action
plan for your consideration. Watch this space.
cheers,
MAJ
----- Original Message -----
From: "Mark A.
Jensen" To: "BioPerl List" Cc: "Chris Fields" Sent: Friday, August 14, 2009 10:32 PM Subject: [Bioperl-l] on BP documentation > Hi All -- > > Off-list, an old colleague of mine had this insightful, if damning, > comment: > >>I guess that from my perspective, after doing this stuff for >>about 10 years, I personally would prefer to see a "summer of >>documentation" for the bio* languages (or at least bioperl, as that is >>the only one I ever look at). From my own experiences, and from those >>of many colleagues, the documentation for bioperl has gone from >>mediocre to quite poor in the last few years. I largely think the >>wikification of the docs are to blame for this. Even SeqIO is hard >>to figure out now--it took me an hour the other day to figure out that >>"desc" returns the full Fasta header, and I had to get that from the >>module code + trial-and-error, instead of the online docs. There is >>far too much inside baseball going on in the documentation scheme. > >>So I worry more about the constant adding of features at the expense >>of documenting what is already there. This is just my 2 cents, and it >>is disappointing to see a downward trend for bioperl in this regard. > > I would be really interested in all responses from the list users. I must > agree > that BP docs are rather a rat's nest and of varying quality, but taken in > toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount > of useful and sophisticated information available. I think there are > approaches we can take to reorganize and standardize the accession > of it to make it more useful and inviting. I disagree with my pal about the > wikification, but I wager that the power of the wiki could be leveraged > to greater advantage (right, Dan?). > > I think that what we all as developers love is to code, and detest is to > document. 
Since BP is all-volunteer, and volunteers tend to do what > they like -- the beauty of open source, btw -- documentation reorg > and cleanup probably must devolve to the Core. I am willing to lead > such an effort, which will take some time, and more time the fewer > volunteers there are. First let's hear some thoughts, and 'let it all hang > out', > as they said in my mom's era. > > cheers > Mark > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sat Aug 22 00:11:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 Aug 2009 23:11:42 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <86486D3736614E6A81AF9521B5BB796A@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <86486D3736614E6A81AF9521B5BB796A@NewLife> Message-ID: <594EBBA3-5043-4DDF-9157-65195747266D@illinois.edu> Mark, One suggestion that I agree with: we need to add API-specific module documentation to the site somehow (not just links to CPAN/PDOC). There are a few ways to do so; a quick way may be to install something like the MediaWiki SecureHTML extension and create a protected template (this would be for pdoc, cpan, or both). Another one is to write up a pod2wiki converter and create API-specific pages, then have a bot automate the pages. A POD extension also exists, but we would still need to embed code. I much prefer the extensions to anything else. chris On Aug 21, 2009, at 10:12 PM, Mark A. Jensen wrote: > Thanks to all (six, seven including Rob and his perltidy) who > responded to this thread. (Lurkers, you are not volunteering > by responding, honest.) I'm preparing a wiki page (of course) > with the major points, some further comments, and an action > plan for your consideration. Watch this space. > cheers, > MAJ > ----- Original Message ----- From: "Mark A.
Jensen" > > To: "BioPerl List" > Cc: "Chris Fields" > Sent: Friday, August 14, 2009 10:32 PM > Subject: [Bioperl-l] on BP documentation > > >> Hi All -- >> >> Off-list, an old colleague of mine had this insightful, if damning, >> comment: >> >>> I guess that from my perspective, after doing this stuff for >>> about 10 years, I personally would prefer to see a "summer of >>> documentation" for the bio* languages (or at least bioperl, as >>> that is >>> the only one I ever look at). From my own experiences, and from >>> those >>> of many colleagues, the documentation for bioperl has gone from >>> mediocre to quite poor in the last few years. I largely think the >>> wikification of the docs are to blame for this. Even SeqIO is hard >>> to figure out now--it took me an hour the other day to figure out >>> that >>> "desc" returns the full Fasta header, and I had to get that from the >>> module code + trial-and-error, instead of the online docs. There is >>> far too much inside baseball going on in the documentation scheme. >> >>> So I worry more about the constant adding of features at the expense >>> of documenting what is already there. This is just my 2 cents, >>> and it >>> is disappointing to see a downward trend for bioperl in this regard. >> >> I would be really interested in all responses from the list users. >> I must agree >> that BP docs are rather a rat's nest and of varying quality, but >> taken in >> toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount >> of useful and sophisticated information available. I think there are >> approaches we can take to reorganize and standardize the accession >> of it to make it more useful and inviting. I disagree with my pal >> about the >> wikification, but I wager that the power of the wiki could be >> leveraged >> to greater advantage (right, Dan?). >> >> I think that what we all as developers love is to code, and detest >> is to >> document. 
Since BP is all-volunteer, and volunteers tend to do what >> they like -- the beauty of open source, btw -- documentation reorg >> and cleanup probably must devolve to the Core. I am willing to lead >> such an effort, which will take some time, and more time the fewer >> volunteers there are. First let's hear some thoughts, and 'let it >> all hang out', >> as they said in my mom's era. >> >> cheers >> Mark >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.osimo at gmail.com Sat Aug 22 10:55:06 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Sat, 22 Aug 2009 16:55:06 +0200 Subject: [Bioperl-l] Getting genomic coordinates for a list of SNPs Message-ID: <2ac05d0f0908220755y59b029f2u82eede5b29836a1d@mail.gmail.com> Dear list, I'm searching for a script like this http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates to get the genomic position of a SNP, not a Gene. Does it exist? Thanks a lot Emanuele From cjfields at illinois.edu Sat Aug 22 16:17:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 22 Aug 2009 15:17:46 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> Message-ID: <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> Anand, You should always post emails to the bioperl-l mailing list, never to individual developers (you'll get an answer much faster). Keep responses on the list as well. Though I use bioperl-db some, I'm probably not the best person to ask. Does anyone know what's going on with this? Does this have to do with the Species/Taxon refactoring? chris Begin forwarded message: > From: "Anand C. 
Patel" > Date: August 22, 2009 2:57:42 PM CDT > To: cjfields at illinois.edu > Subject: problem with bioperl (where's the Mus?) > > Dr. Fields, > > I'm struggling with what seems to be a strange quirk in Bioperl +/- > Bioperl-db/BioSQL. > > I've successfully loaded in genbank sequences into a biosql database. > > When I try to write a genbank sequence back out, a curious thing > happens -- the Genus is missing from the SOURCE and ORGANISM areas. > > Despite reporting: > primary tag: source > tag: chromosome > value: 3 > > tag: db_xref > value: taxon:10090 > > tag: map > value: 3 74.5 cM > > tag: mol_type > value: mRNA > > tag: organism > value: Mus musculus > The sequence when printed out via SeqIO looks like this: > LOCUS NM_017474 2935 bp dna linear ROD > 13-AUG-2009 > DEFINITION Mus musculus chloride channel calcium activated 3 > (Clca3), mRNA. > ACCESSION NM_017474 XM_978159 > VERSION NM_017474.2 GI:255918210 > KEYWORDS . > SOURCE musculus > ORGANISM musculus > Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; > Bilateria; > Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; > Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; > Tetrapoda; > Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; > Glires; > Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. > Confession -- I have a final project due Monday wherein I boldly > elected to interface Bioperl, MySQL, Perl, and CGI. > (I'm an MD getting my MS in Bioinformatics.) > After many misadventures, I'm getting to the point where I could > actually complete the objectives, but this is bug is rather > problematic. > Thanks, > Anand > Anand C. Patel, MD > Assistant Professor of Pediatrics > Division of Allergy/Pulmonary Medicine > Department of Pediatrics > Washington University School of Medicine > 660 South Euclid Ave, Campus Box 8052 > St. 
Louis, MO 63110 > acpatel at wustl.edu > acpatel at gmail.com > acpatel at jhu.edu > From hlapp at gmx.net Sat Aug 22 17:36:42 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 17:36:42 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> Message-ID: <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> That's a pretty strange bug. Anand, which version of BioPerl and Bioperl-db are you running? Note that the genus *is* actually there in the lineage (and hence does get retrieved from the database). Apparently the Species object fails to pull it out correctly, though? Anand - I suspect there have been some warnings printed to the terminal - can you post these, and otherwise confirm that there haven't been any? -hilmar On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: > Anand, > > You should always post emails to the bioperl-l mailing list, never > to individual developers (you'll get an answer much faster). Keep > responses on the list as well. > > Though I use bioperl-db some, I'm probably not the best person to > ask. Does anyone know what's going on with this? Does this have to > do with the Species/Taxon refactoring? > > chris > > Begin forwarded message: > >> From: "Anand C. Patel" >> Date: August 22, 2009 2:57:42 PM CDT >> To: cjfields at illinois.edu >> Subject: problem with bioperl (where's the Mus?) >> >> Dr. Fields, >> >> I'm struggling with what seems to be a strange quirk in Bioperl +/- >> Bioperl-db/BioSQL. >> >> I've successfully loaded in genbank sequences into a biosql database. >> >> When I try to write a genbank sequence back out, a curious thing >> happens -- the Genus is missing from the SOURCE and ORGANISM areas. 
>> >> Despite reporting: >> primary tag: source >> tag: chromosome >> value: 3 >> >> tag: db_xref >> value: taxon:10090 >> >> tag: map >> value: 3 74.5 cM >> >> tag: mol_type >> value: mRNA >> >> tag: organism >> value: Mus musculus >> The sequence when printed out via SeqIO looks like this: >> LOCUS NM_017474 2935 bp dna linear ROD >> 13-AUG-2009 >> DEFINITION Mus musculus chloride channel calcium activated 3 >> (Clca3), mRNA. >> ACCESSION NM_017474 XM_978159 >> VERSION NM_017474.2 GI:255918210 >> KEYWORDS . >> SOURCE musculus >> ORGANISM musculus >> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >> Bilateria; >> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >> Tetrapoda; >> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >> Glires; >> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >> Confession -- I have a final project due Monday wherein I boldly >> elected to interface Bioperl, MySQL, Perl, and CGI. >> (I'm an MD getting my MS in Bioinformatics.) >> After many misadventures, I'm getting to the point where I could >> actually complete the objectives, but this is bug is rather >> problematic. >> Thanks, >> Anand >> Anand C. Patel, MD >> Assistant Professor of Pediatrics >> Division of Allergy/Pulmonary Medicine >> Department of Pediatrics >> Washington University School of Medicine >> 660 South Euclid Ave, Campus Box 8052 >> St. 
Louis, MO 63110 >> acpatel at wustl.edu >> acpatel at gmail.com >> acpatel at jhu.edu >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 22 17:42:32 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 17:42:32 -0400 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: Consistent coding style is in principle a good thing. It's also worth to keep in mind one of the old BioPerl principles - don't change working code purely to change style. In my interpretation of the rule, however, this has always applied to code writing style, and not code formatting style. I'm assuming the goal here is only to make the formatting consistent. -hilmar On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote: > Cheers Rob, > > Whatever objectons may arise from style x or style y, I think it's a > great idea to at least have one style or another recognized as being > 'standard'. I know TMTOWTDI, but on a project like this, with so many > contributors and users, it's essential to at least have a > recommendation. I'll try to use this on any contribs. > > As you pointed out [1], its probably best to provide two patches for > any change involving a formating clean up: one to change the fomat to > the standard and one to commit the actual code changes. > > > All the best, > Dan. > > [1] irc://irc.freenode.net/#bioperl > > > 2009/8/21 Robert Buels : >> This one is copied from the parrot project. I added it in >> maintenance/perltidy.conf. >> Have a look, tweak as you see fit. 
>> >> The idea with perltidy profile files is to use them to enforce >> coding style >> rules. So this perltidy profile file would be the place to codify >> the >> BioPerl coding standards, such as indentation, use of cuddled >> elses, etc. >> >> So here is one, let's customize it for our needs. The way I >> usually run >> perltidy is with -b to modify a file in-place, and with the '-pro=' >> option >> to specify a profile file. >> >> Example: >> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >> >> Rob >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 22 19:21:48 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 19:21:48 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > [...] > I think I know what's broken. Using load_seqdatabases.pl, I'd put a > set of sequences from genbank into a biosql db in mysql. 
> > I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl > script from biosql. Did you load the NCBI taxonomy first, or afterwards? > > When I searched for house (as in house mouse), I found that the name > of the type of taxon class was "genbank common name". > > When I searched for musculus, it does appear as a type of > "scientific name". It is the 'scientific name' class names that Bioperl-db will put onto the lineage array. > [...] > I'm not just getting warnings. I'm getting errors. Tons of them. > It's a wonder it's working at all. I'm not sure what you're referring to, but what you pasted into your email was neither errors nor warnings but a debugging log (and what it prints looks like it's working fine). You triggered that by setting -verbose to a value greater than 0. If you don't want debugging output, then you can just leave off that argument (no debugging output is the default). > > I started with the getentry.cgi script in the cgi-bin folder, and > stripped most of it away. I see - which reminds me that I need to look at that script; I'm afraid it hasn't been updated for a long time (that doesn't mean though that it can't work - the core API has been stable for years). > > Code: > #!/usr/bin/perl > > [...] > if( $@ || !defined $seq) { > print "Got fetch exception of...\n
$@\n
"; > exit(0); > } Wouldn't you want to put that right after the eval() clause? -hilmar > > >> >> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >> >>> Anand, >>> >>> You should always post emails to the bioperl-l mailing list, never >>> to individual developers (you'll get an answer much faster). Keep >>> responses on the list as well. >>> >>> Though I use bioperl-db some, I'm probably not the best person to >>> ask. Does anyone know what's going on with this? Does this have >>> to do with the Species/Taxon refactoring? >>> >>> chris >>> >>> Begin forwarded message: >>> >>>> From: "Anand C. Patel" >>>> Date: August 22, 2009 2:57:42 PM CDT >>>> To: cjfields at illinois.edu >>>> Subject: problem with bioperl (where's the Mus?) >>>> >>>> Dr. Fields, >>>> >>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>> +/- Bioperl-db/BioSQL. >>>> >>>> I've successfully loaded in genbank sequences into a biosql >>>> database. >>>> >>>> When I try to write a genbank sequence back out, a curious thing >>>> happens -- the Genus is missing from the SOURCE and ORGANISM areas. >>>> >>>> Despite reporting: >>>> primary tag: source >>>> tag: chromosome >>>> value: 3 >>>> >>>> tag: db_xref >>>> value: taxon:10090 >>>> >>>> tag: map >>>> value: 3 74.5 cM >>>> >>>> tag: mol_type >>>> value: mRNA >>>> >>>> tag: organism >>>> value: Mus musculus >>>> The sequence when printed out via SeqIO looks like this: >>>> LOCUS NM_017474 2935 bp dna linear >>>> ROD 13-AUG-2009 >>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>> (Clca3), mRNA. >>>> ACCESSION NM_017474 XM_978159 >>>> VERSION NM_017474.2 GI:255918210 >>>> KEYWORDS . 
>>>> SOURCE musculus >>>> ORGANISM musculus >>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>> Bilateria; >>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>> Tetrapoda; >>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>> Glires; >>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>> Confession -- I have a final project due Monday wherein I boldly >>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>> (I'm an MD getting my MS in Bioinformatics.) >>>> After many misadventures, I'm getting to the point where I could >>>> actually complete the objectives, but this is bug is rather >>>> problematic. >>>> Thanks, >>>> Anand >>>> Anand C. Patel, MD >>>> Assistant Professor of Pediatrics >>>> Division of Allergy/Pulmonary Medicine >>>> Department of Pediatrics >>>> Washington University School of Medicine >>>> 660 South Euclid Ave, Campus Box 8052 >>>> St. Louis, MO 63110 >>>> acpatel at wustl.edu >>>> acpatel at gmail.com >>>> acpatel at jhu.edu >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 23 10:38:48 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Aug 2009 10:38:48 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Message-ID: On Aug 22, 2009, at 9:13 PM, Anand C. Patel wrote: > Turns out that using the default namespace bioperl doesn't change > anything. No it shouldn't, so long as you are consistent about it. (And if you're not, all that should happen is that you don't find your sequences any more.) > > Common name -- still "genbank common name" in name_class in the > taxon_name table for "house mouse", which I think the module is > looking for as "common name". If you are loading the NCBI taxonomy first, this is coming from NCBI, not one of the scripts or BioPerl, and hence we have no control over it. Are you saying that there is no designated name of class 'common name' for Mus musculus in the NCBI taxonomy dump? Also, the common name being present or not should have no bearing on the lineage array, where the actual problem is, so I don't understand right now how this would be connected to the problem you are seeing. > > It's not behaving differently despite reloading the sequences. > > I've created a horrible munge that fixes it for cosmetic purposes: > my $species = $seq->species; > my $justspecies = $species->scientific_name(); > my $binspecies = $species->binomial(); > > my $gbstring2 = $gbstring; > > $gbstring2 =~ s/$binspecies/$justspecies/g; > $gbstring2 =~ s/$justspecies/$binspecies/g; I don't understand what you are trying to achieve here - it seems like you are making a substitution and then reverting it? Also, $species- >scientific_name() and $species->binomial() should be identical for Mus musculus - are you finding different values being returned? So in essence, I wouldn't expect your above code snippet to have any effect, for both of these reasons. 
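To make that point concrete, here is a minimal sketch of what the quoted pair of substitutions does to a GenBank-style string. It is not Anand's script: the helper name apply_munge and the sample text are made up for illustration, the patterns are wrapped in \Q...\E (which the quoted snippet does not do), and the second case, where scientific_name() yields only 'musculus' while binomial() yields 'Mus musculus', is purely an assumption about the misbehaving setup and is not confirmed anywhere in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the two substitutions in the same order
# as the quoted snippet. \Q...\E only guards against regex metacharacters.
sub apply_munge {
    my ($gbstring, $justspecies, $binspecies) = @_;
    (my $out = $gbstring) =~ s/\Q$binspecies\E/$justspecies/g;
    $out =~ s/\Q$justspecies\E/$binspecies/g;
    return $out;
}

# Made-up fragment resembling the output reported in this thread.
my $record = join "\n",
    'DEFINITION  Mus musculus chloride channel calcium activated 3 (Clca3), mRNA.',
    'SOURCE      musculus',
    '  ORGANISM  musculus',
    '';

# Case 1: both accessors return the same string, so nothing changes.
my $same = apply_munge($record, 'Mus musculus', 'Mus musculus');
print "identical names: ", ($same eq $record ? "unchanged" : "changed"), "\n";

# Case 2 (assumed): scientific_name() gives 'musculus', binomial() gives
# 'Mus musculus'. Every bare 'musculus' ends up as 'Mus musculus'.
my $fixed = apply_munge($record, 'musculus', 'Mus musculus');
print $fixed;
```

With identical names both substitutions are no-ops. If the two values really did differ as assumed here, the second substitution would put the genus back wherever the bare epithet occurs, which may be why the munge appeared to help.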
How do you find $gbstring2 to be different from $gbstring at the end of this block of code? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 23 10:42:58 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Aug 2009 10:42:58 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> Message-ID: <119BC08A-6D3A-4D03-B0D5-7619EDE682AE@gmx.net> On Aug 22, 2009, at 8:13 PM, Anand C. Patel wrote: > Do I need to load ontology before loading sequences? You don't. Especially if you load genbank sequences as they come. Loading ontologies that are used for sequence annotation is useful as it will get your features (or sequences) linked to fully populated (description, synonyms, relationships, etc) terms rather than skeleton term records created on the fly. However, in GenBank format ontology terms are part of the feature table, and require a post-processing (using, e.g., a SeqProcessor class) step to be identified and turned into Bio::Annotation::OntologyTerm objects. 
 -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jorismeys at gmail.com Sun Aug 23 11:08:47 2009 From: jorismeys at gmail.com (joris meys) Date: Sun, 23 Aug 2009 17:08:47 +0200 Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree Message-ID: Hi, I'm currently exploring the phylogenetic parts of Bio Perl, but I can't seem to find a quick solution to the following problem: Say you have a tree obtained by a certain method. From this tree, you want to have the evolutionary distances between species, defined as the sum of the branch lengths between any 2 species. There is as far as I know no function for doing that. But is there a possibility to get a list of some sort of "shortest paths" from one species to another, allowing that matrix to be calculated easily? From the phylip package, I get the following data if I run the neighbor or fitch program. From there I can easily get an algorithm to calculate the distances I need. But I also need to do that for maximum likelihood trees and the like. Is there a way to get this information in Bio Perl?
From   to     dist
node1  sp1    xxxxx
node2  sp3    xxxxxx
node1  node2  xxxxx
node1  sp2    xxxxx
Kind regards Joris From heikki.lehvaslaiho at gmail.com Mon Aug 24 01:59:22 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 24 Aug 2009 08:59:22 +0300 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: The de facto coding style standard for BioPerl has been emacs using cperl mode and the bioperl.list file. As long as this configuration does not change the conventions used, I see this as a great way to help format code from other editors. -Heikki 2009/8/23 Hilmar Lapp : > Consistent coding style is in principle a good thing.
> > It's also worth to keep in mind one of the old BioPerl principles - don't > change working code purely to change style. In my interpretation of the > rule, however, this has always applied to code writing style, and not code > formatting style. I'm assuming the goal here is only to make the formatting > consistent. > > -hilmar > > On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote: > >> Cheers Rob, >> >> Whatever objectons may arise from style x or style y, I think it's a >> great idea to at least have one style or another recognized as being >> 'standard'. I know TMTOWTDI, but on a project like this, with so many >> contributors and users, it's essential to at least have a >> recommendation. I'll try to use this on any contribs. >> >> As you pointed out [1], its probably best to provide two patches for >> any change involving a formating clean up: one to change the fomat to >> the standard and one to commit the actual code changes. >> >> >> All the best, >> Dan. >> >> [1] irc://irc.freenode.net/#bioperl >> >> >> 2009/8/21 Robert Buels : >>> >>> This one is copied from the parrot project. I added it in >>> maintenance/perltidy.conf. >>> Have a look, tweak as you see fit. >>> >>> The idea with perltidy profile files is to use them to enforce coding >>> style >>> rules. So this perltidy profile file would be the place to codify the >>> BioPerl coding standards, such as indentation, use of cuddled elses, etc. >>> >>> So here is one, let's customize it for our needs. The way I usually run >>> perltidy is with -b to modify a file in-place, and with the '-pro=' >>> option >>> to specify a profile file.
>>> >>> Example: >>> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >>> >>> Rob >>> >>> -- >>> Robert Buels >>> Bioinformatics Analyst, Sol Genomics Network >>> Boyce Thompson Institute for Plant Research >>> Tower Rd >>> Ithaca, NY 14853 >>> Tel: 503-889-8539 >>> rmb32 at cornell.edu >>> http://www.sgn.cornell.edu >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Building #2, Office #4216 Computational Bioscience Research Centre (CBRC) 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia From geoeco at rambler.ru Mon Aug 24 05:20:13 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Mon, 24 Aug 2009 13:20:13 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file Message-ID: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Dear all, I am trying to extract the species taxonomy from the ORGANISM line. In fact I only need the first line under the ORGANISM tag (i.e. genus + species). I thought that it would be possible to do this with the SeqBuilder object by stating $builder->add_wanted_slot('display_id','species'); The problem is, however, that I get an empty file as a result. What might be wrong with the script (see below)?
Thanks a lot in advance for any ideas,
-------------------------------------------
#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Bio::Seq::SeqBuilder;

my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
my $infile = shift or die $usage;
my $infileformat = 'Genbank' ;
my $outfile = shift or die $usage;
my $outfileformat = 'raw';
my $i = 0;

my $seq_in = Bio::SeqIO->new('-file' => "<$infile", '-format' => $infileformat);
my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", '-format' => $outfileformat);

my $builder = $seq_in->sequence_builder();
$builder->want_none();
$builder->add_wanted_slot('display_id','species');

while(my $seq = $seq_in->next_seq()) {
    $seq_out->write_seq($seq);
}
exit;
----------------------------------------------------
Anna
From maj at fortinbras.us Mon Aug 24 07:30:27 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 24 Aug 2009 07:30:27 -0400 Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree In-Reply-To: References: Message-ID:
Hi Joris, AFAIK, there is only one path between any two nodes in a typical phylogenetic tree, the one passing through the most recent common ancestor of the nodes. The distance() method in Bio::Tree::TreeFunctionsI will give you what I think you want:

use Bio::TreeIO;
use Bio::Tree::TreeFunctionsI;
$t = Bio::TreeIO->new(-file=>'t/data/urease.tre.nexus', -format=>'nexus')->next_tree;
$n1 = $t->find_node('Anidulans');
$n2 = $t->find_node('Ncrassa');
$dist = $t->distance(-nodes => [$n1, $n2] );
print $dist;

Use the Bio::TreeIO package to read in the tree in your favorite format; it will handle many. cheers, MAJ ----- Original Message ----- From: "joris meys" To: Sent: Sunday, August 23, 2009 11:08 AM Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree > Hi, > > I'm currently exploring the phylogenetic parts of Bio Perl, but I > can't seem to find a quick solution to following problem : > Say you have a tree obtained by a certain method.
From this tree, you > want to have the evolutionary distances between species, defined as > the sum of the branch lengths between any 2 species. There is as far > as I know no function for doing that. But is there a possibility to > get a list of some sort of "shortest paths" from one species to > another, allowing to easily calculate that matrix? > >>From the phylip package, I get following data if I run the neighbor or > fitch program. From there I can easily get an algorithm to calculate > the distances I need. But I also need to do that for maximum > likelihood trees and the like. Is there a way to get this information > in Bio Perl? >>From to dist > node1 sp1 xxxxx > node2 sp3 xxxxxx > node1 node2 xxxxx > node 1 sp2 xxxxx > > Kind regards > Joris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From dan.bolser at gmail.com Mon Aug 24 08:26:13 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 13:26:13 +0100 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: <2c8757af0908240526j1cb0a455x53f7f3dccaceda86@mail.gmail.com> 2009/8/24 Heikki Lehvaslaiho : > De facto coding style standard for BioPerl has been emacs using cperl > mode and bioperl.list file. As long as this configuration does not > change the conventions used, I see this as great way in helping to > format code from other editors. 'bioperl.list' file? I guess you made a typo and you mean bioperl.lisp http://www.bioperl.org/wiki/Emacs_template > 2009/8/23 Hilmar Lapp : >> Consistent coding style is in principle a good thing. >> >> It's also worth to keep in mind one of the old BioPerl principles - don't >> change working code purely to change style. 
In my interpretation of the >> rule, however, this has always applied to code writing style, and not code >> formatting style. I'm assuming the goal here is only to make the formatting >> consistent. I have changed coding style in the past. IIRC this was in the Quality.pm file. I made the changes because two different styles were being used to do (roughly) the same thing at different points in the script. The two styles were being used interchangeably (at random?). As a noob, the use of two different styles was very confusing, because I didn't know if the difference was significant or what the significance of the difference might be. I resolved the issue by writing a set of additional tests and then slowly harmonizing the coding style while confirming that the tests were still running OK. In this case I think it was reasonable to try to have a consistent style at least within the module. Or should I have left the style as it was? Cheers, Dan. From dan.bolser at gmail.com Mon Aug 24 08:50:46 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 13:50:46 +0100 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> Message-ID: <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> I just ran into the same problem described here. Here is my code to demonstrate what I expected: #!/usr/bin/perl -w use strict; use Bio::SimpleAlign; use Bio::LocatableSeq; use Bio::AlignIO; my $CLUDGE = 0; ## REF tacattaaagacccg ## SEQ1 taca.taaa...... 
## SEQ2 .....taaaga.ccg my $aln = Bio::SimpleAlign->new(); $aln->gap_char('.'); my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' ); my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' ); $aln->add_seq( $r ); $aln->add_seq( $s1 ); $aln->add_seq( $s2 ); if($CLUDGE){ foreach(($r, $s1, $s2)){ $_->seq( '.' x ($_->start - 1) . $_->seq ) } } ## Prepare an 'output stream' for the alignment: my $aliWriter = Bio::AlignIO-> new( -fh => \*STDOUT, -format => 'clustalw', ); warn "\nOUTPUT:\n"; $aliWriter->write_aln($aln); I was calling the "fill in the gaps yourself" step a CLUDGE because I had expected the alignment object to take care of this for me. Is there any reason that it couldn't do this 'CLUDGE' automatically? It seems strange that it insists on being passed locatable sequence objects, but then largely ignore the given location. Would it not be possible to have this happen when the sequences are written out from the alignment? I think it should still be possible to index the column number via the (gapless) sequence number... or did I get confused? There are two levels of confusion here (on my part), 1) the concepts behind the objects and 2) the implementation details. Thanks for any hints on how to understand or potentially how to fix these problems. Cheers, Dan. 2009/7/22 Mark A. Jensen : > Hi Paolo, > I think I see what you want to do, however, it doesn't quite work > this way. 
I'm supposing you want to specify something like > > s1/3-6 attc > s2/7-10 gaag > > and obtain output like > > s1 --attc---- > s2 ------gaag > > But (and this is why LocatableSeqs are "locatable"), the alignment described > by the former data is always going to be > > s1 attc > s2 gaag > > so that I can query the alignment *column* number 1 and obtain > the residue coordinates of the original sequences in that column: > > $loc = $aln->get_seq_by_pos(1)->location_from_column(1); # 3 > > or vice-versa > > $col = $aln->column_from_residue_number( 's1', 3); # 1 > > As far as I know, you have to fill in the gaps yourself; a good > exercise, since you already have all the information you need, in having set > up the start and end coordinates (which are really > the column coordinates in this model). > If this wasn't what you had in mind, I apologize. > cheers, Mark > > > ----- Original Message ----- From: "Paolo Pavan" > To: > Sent: Thursday, July 16, 2009 6:17 AM > Subject: [Bioperl-l] Bio::SimpleAlign constructor? > > >> Hi, >> I have a brief question: I would like to know if there is a method to >> obtain a valid formatted and flush Bio::SimpleAlign object (i.e. >> properly filled with gaps on the right and on the left side of each >> sequence) given a bounch of Bio::LocatableSeq objects in which I have >> specified the -start and -end properties. >> Can anyone help me? 
Thank you very much, >> >> Paolo >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ghai.rohit at gmail.com Mon Aug 24 08:53:03 2009 From: ghai.rohit at gmail.com (Rohit Ghai) Date: Mon, 24 Aug 2009 14:53:03 +0200 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Message-ID: <94c73820908240553m72540519pd86bf78e29041462@mail.gmail.com> hi I think you forgot to add the "seq" in the builder.. thats why the file is empty. Also, the species name, though being parsed, is nowhere in the output. Here's a version using fasta output that you can probably customize further. This also takes the full name of the organism and adds to the description line in the output. use strict; use Bio::SeqIO; use Bio::Seq::SeqBuilder; my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; my $infile = shift or die $usage; my $infileformat = 'Genbank' ; my $outfile = shift or die $usage; my $outfileformat = 'fasta'; my $i = 0; my $seq_in = Bio::SeqIO->new('-file' => "<$infile", '-format' => $infileformat); my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", '-format' => $outfileformat); my $builder = $seq_in->sequence_builder(); $builder->want_none(); $builder->add_wanted_slot('display_id','species','seq','description'); while(my $seq = $seq_in->next_seq()) { my $desc = $seq->description(); my $species_string = $seq->species()->binomial('FULL'); $desc = $desc . 
" [$species_string]"; $seq->description($desc); $seq_out->write_seq($seq); } exit; On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova wrote: > > Dear all, > > I am trying to extract species taxonomy from ORGANISM line. In fact I only > need a first line under ORGANISM tag (e.i. genus + species). I though that > it would be possible to do with the SeqBuilder object by stating > > $builder->add_wanted_slot('display_id','species'); > > the problem is, however, that I've got an empty file as a result. > What might be wrong with the script (see below)? > Thanks a lot in advance for any ideas, > > ------------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'raw'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > $builder->add_wanted_slot('display_id','species'); > > while(my $seq = $seq_in->next_seq()) { > $seq_out->write_seq($seq); > } > > exit; > > ---------------------------------------------------- > > Anna > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Aug 24 08:55:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 07:55:56 -0500 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Message-ID: <6B4871D9-5DB0-4762-A613-3561B40CE099@illinois.edu> Anna, It's stored in the 
Bio::Species object. I have to say, though, I think you're using a stick of dynamite for a scalpel here; if you only need ORGANISM parse it out directly (it's much faster). Or am I missing something? chris On Aug 24, 2009, at 4:20 AM, Anna Kostikova wrote: > Dear all, > > I am trying to extract species taxonomy from ORGANISM line. In fact > I only need a first line under ORGANISM tag (e.i. genus + species). > I though that it would be possible to do with the SeqBuilder object > by stating > > $builder->add_wanted_slot('display_id','species'); > > the problem is, however, that I've got an empty file as a result. > What might be wrong with the script (see below)? > Thanks a lot in advance for any ideas, > > ------------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'raw'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > $builder->add_wanted_slot('display_id','species'); > > while(my $seq = $seq_in->next_seq()) { > $seq_out->write_seq($seq); > } > > exit; > > ---------------------------------------------------- > > Anna > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 08:56:02 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 07:56:02 -0500 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: 
<1E5347D2-A60F-49CB-8F3B-C5E06342417E@illinois.edu> Heikki, perltidy has become the most common way to standardize perl coding style (in a non-text-editor-dependent way). A number of projects have started using it as a means for checking and cleaning up modules prior to release. I think Perl Best Practices reinforced that. chris On Aug 24, 2009, at 12:59 AM, Heikki Lehvaslaiho wrote: > De facto coding style standard for BioPerl has been emacs using cperl > mode and bioperl.list file. As long as this configuration does not > change the conventions used, I see this as great way in helping to > format code from other editors. > > > -Heikki > > 2009/8/23 Hilmar Lapp : >> Consistent coding style is in principle a good thing. >> >> It's also worth to keep in mind one of the old BioPerl principles - >> don't >> change working code purely to change style. In my interpretation of >> the >> rule, however, this has always applied to code writing style, and >> not code >> formatting style. I'm assuming the goal here is only to make the >> formatting >> consistent. >> >> -hilmar >> >> On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote: >> >>> Cheers Rob, >>> >>> Whatever objectons may arise from style x or style y, I think it's a >>> great idea to at least have one style or another recognized as being >>> 'standard'. I know TMTOWTDI, but on a project like this, with so >>> many >>> contributors and users, it's essential to at least have a >>> recommendation. I'll try to use this on any contribs. >>> >>> As you pointed out [1], its probably best to provide two patches for >>> any change involving a formating clean up: one to change the fomat >>> to >>> the standard and one to commit the actual code changes. >>> >>> >>> All the best, >>> Dan. >>> >>> [1] irc://irc.freenode.net/#bioperl >>> >>> >>> 2009/8/21 Robert Buels : >>>> >>>> This one is copied from the parrot project. I added it in >>>> maintenance/perltidy.conf. >>>> Have a look, tweak as you see fit. 
>>>> >>>> The idea with perltidy profile files is to use them to enforce >>>> coding >>>> style >>>> rules. So this perltidy profile file would be the place to >>>> codify the >>>> BioPerl coding standards, such as indentation, use of cuddled >>>> elses, etc. >>>> >>>> So here is one, let's customize it for our needs. The way I >>>> usually run >>>> perltidy is with -b to modify a file in-place, and with the '-pro=' >>>> option >>>> to specify a profile file. >>>> >>>> Example: >>>> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >>>> >>>> Rob >>>> >>>> -- >>>> Robert Buels >>>> Bioinformatics Analyst, Sol Genomics Network >>>> Boyce Thompson Institute for Plant Research >>>> Tower Rd >>>> Ithaca, NY 14853 >>>> Tel: 503-889-8539 >>>> rmb32 at cornell.edu >>>> http://www.sgn.cornell.edu >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > Building #2, Office #4216 > Computational Bioscience Research Centre (CBRC) > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > 
http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 09:36:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 08:36:32 -0500 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> Message-ID: Dan, all, Bio::SimpleAlign doesn't align anything for you. It makes no assumptions about the data being added, beyond possibly checking for the seqs to be flush prior to analyses. Here's the reason why: The object doesn't 'know' the seqs map across from one to the other as below: > ... > ## REF tacattaaagacccg > ## SEQ1 taca.taaa...... > ## SEQ2 .....taaaga.ccg > > my $aln = Bio::SimpleAlign->new(); > > $aln->gap_char('.'); > > my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); > my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, - > seq=>'taca.taaa' ); > my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, - > seq=>'taaaga.ccg' ); > > $aln->add_seq( $r ); > $aln->add_seq( $s1 ); > $aln->add_seq( $s2 ); Above, you are making the assumption that SimpleAlign 'knows' where to match the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does NOT indicate that (the LocatableSeq docs, and their usage, should indicate that). Think about HSP alignments in a BLAST report; the start/end/strand coordinates are where the sequence in the alignment maps to the original query or hit sequence. They don't indicate where the hit maps to the query (the alignment itself does that in a column-wise fashion). I'm not sure, maybe it needs to be more explicit in the documentation, but SimpleAlign does not align the sequences for you (and it shouldn't be expected to). There are much better (faster, more accurate) ways to do that. 
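A minimal sketch of doing the padding by hand in plain Perl, outside SimpleAlign. It assumes the alignment column where each row begins is already known from elsewhere; the helper name pad_rows and the offsets are hypothetical, and the offsets are deliberately not taken from LocatableSeq start values:

```perl
use strict;
use warnings;

# Pad gapped strings on both sides so that every row has the same width.
# Each row is [ id, offset, gapped_string ], where offset is the 0-based
# alignment column at which the row begins (an assumption supplied by
# the caller, not LocatableSeq->start).
sub pad_rows {
    my ($gap, @rows) = @_;

    # The alignment width is the right-most column reached by any row.
    my $width = 0;
    for my $row (@rows) {
        my $end = $row->[1] + length $row->[2];
        $width = $end if $end > $width;
    }

    # Return [ id, padded_string ] pairs in the original order.
    return map {
        my ($id, $offset, $str) = @$_;
        my $right = $width - $offset - length $str;
        [ $id, ($gap x $offset) . $str . ($gap x $right) ];
    } @rows;
}

my @flush = pad_rows('.',
    [ r  => 0, 'tacattaaagacccg' ],
    [ s1 => 0, 'taca.taaa' ],
    [ s2 => 5, 'taaaga.ccg' ],
);
printf "%-3s %s\n", @$_ for @flush;
```

The padded strings could then be used as the -seq values of the LocatableSeq objects before add_seq(), so that the rows are flush when written out.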
> if($CLUDGE){ > foreach(($r, $s1, $s2)){ > $_->seq( '.' x ($_->start - 1) . $_->seq ) > } > } > > ## Prepare an 'output stream' for the alignment: > my $aliWriter = Bio::AlignIO-> > new( -fh => \*STDOUT, > -format => 'clustalw', > ); > > warn "\nOUTPUT:\n"; > $aliWriter->write_aln($aln); ... > I was calling the "fill in the gaps yourself" step a CLUDGE because I > had expected the alignment object to take care of this for me. Is > there any reason that it couldn't do this 'CLUDGE' automatically? It > seems strange that it insists on being passed locatable sequence > objects, but then largely ignore the given location. > > Would it not be possible to have this happen when the sequences are > written out from the alignment? I think it should still be possible to > index the column number via the (gapless) sequence number... or did I > get confused? There are two levels of confusion here (on my part), 1) > the concepts behind the objects and 2) the implementation details. As mentioned above, SimpleAlign makes no assumptions about how LocatableSeqs map to one another. WYSIWYG. There is nothing precluding you from writing up code to do that, though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl alignment implementation (there are, believe it or not, pure perl implementations of Smith-Waterman and Needleman-Wunsch). > Thanks for any hints on how to understand or potentially how to fix > these problems. > > Cheers, > Dan. Not that SimpleAlign and LocatableSeqs don't have their share of problems. However, I don't think you can expect this behavior to change with the refactors. chris From hlapp at gmx.net Mon Aug 24 09:44:43 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 09:44:43 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To: <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net> Message-ID: On Aug 23, 2009, at 1:25 PM, Anand C. Patel wrote: > The other piece of potentially useful information is below -- output > from > SELECT * FROM `biosql`.`taxon_name` WHERE `taxon_id` = 138; > (taxon_id 138 maps to ncbi_taxon_id 10090) > > taxon_id name name_class > 138 LK3 transgenic mice includes > 138 Mus muscaris misnomer > 138 Mus musculus scientific name > 138 Mus sp. 129SV includes > 138 house mouse genbank common name > 138 mice C57BL/6xCBA/CaJ hybrid misspelling > 138 mouse common name > 138 nude mice includes > 138 transgenic mice includes > > The source from the genbank entry NM_017474 is: > SOURCE Mus musculus (house mouse) > > Which is why I think the issue is that the name_class is "genbank > common name" rather than common name. Note that apparently NCBI has decided that the common name is 'mouse', not 'house mouse'. Why what they report in the genbank record is different from what they decided to be the common name is beyond me. Note also that the common name in parentheses is optional. If it's missing the record is still in valid format. > What does strike me as odd though is that not even "mouse" shows up > -- common_name is empty. Indeed, that's odd. Can you file this as a bug report and assign to the bioperl-db queue? 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon Aug 24 09:50:17 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 09:50:17 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> Message-ID: <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> On Aug 23, 2009, at 1:17 PM, Anand C. Patel wrote: > [...] > Code snippet: > my $species = $seq->species; > print "common name = ",$species->common_name, "\n"; > print "scientific name = ",$species->scientific_name, "\n"; > print "species = ",$species->species, "\n"; > print "genus = ",$species->genus, "\n"; > print "sub_species = ",$species->sub_species, "\n"; > print "binomial = ",$species->binomial, "\n"; > print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; > > Output: > common name = > scientific name = musculus > species = musculus > genus = Mus > sub_species = > binomial = Mus musculus > ncbi_taxid = 10090 This points to a problem in Bio::Species::scientific_name(), given that binomial() is correct. Could you file this as a bug report? > The common name is missing, despite having loaded it from NCBI > taxonomy using the provided script. > It is ONLY present as this "genbank common name". > [...] > I could go through and replace all of the instances of "genbank > common name" with "common name" and see if this fixes it. I think we need to first discuss how we want to treat the 'common name' versus 'genbank common name' classes in BioPerl. 
So the question for everyone: do we need to have both available (in which case we need to add an accessor in Bio::Species), or only 'common name', or should 'genbank common name' override 'common name' if both are present and have different values? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Mon Aug 24 10:18:20 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 24 Aug 2009 15:18:20 +0100 Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS In-Reply-To: References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com> <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com> <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu> <320fb6e00907270451i3d40b4ffq607360cfcb6f6282@mail.gmail.com> Message-ID: <320fb6e00908240718q194afe78j4a05b31aeb33e313@mail.gmail.com> On Mon, Jul 27, 2009 at 2:06 PM, Chris Fields wrote: > > I added this (and the others) to our ticket tracking this. Looks like > solexa conversion either way is borked, which is very likely an issue > with conversion. Hi Chris, I've been digging into the current SVN code for BioPerl's FASTQ support - I realised you are doing the Solexa to PHRED mapping twice when parsing "fastq-solexa" files. Using "qual" output (which shows the PHRED scores in plain text) makes it very clear something is wrong: $ cat solexa_faked.fastq @slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN +slxa_0001_1_0001_01 hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<; That is Solexa scores from 40 (h) down to -5 (;), which should map onto PHRED scores from 40 down to 1 (according to our prior discussions).
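For reference, a minimal sketch of that mapping in plain Perl, assuming the usual relation Q_phred = 10 * log10(10^(Q_solexa/10) + 1) and the ASCII offset of 64 used by "fastq-solexa". The function names are illustrative only and are not taken from fastq.pm:

```perl
use strict;
use warnings;

# Convert one Solexa quality score to a PHRED score.
# The result is returned as a floating point number; it is not rounded here.
sub solexa_to_phred {
    my ($q_solexa) = @_;
    # Perl's log() is the natural logarithm, so divide by log(10).
    return 10 * log(10 ** ($q_solexa / 10) + 1) / log(10);
}

# Decode an ASCII-encoded Solexa quality string (offset 64) into
# a list of floating point PHRED scores.
sub decode_solexa {
    my ($qual) = @_;
    return map { solexa_to_phred(ord($_) - 64) } split //, $qual;
}

# 'h' = Solexa 40, 'J' = 10, 'I' = 9, ';' = -5
my @phred = decode_solexa('hJI;');
print join(' ', map { sprintf '%.0f', $_ } @phred), "\n";
```

The conversion should be applied exactly once per score, and rounding is left to the point of output.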
$ ./bioperl_solexa2qual.pl < solexa_faked.fastq >slxa_0001_1_0001_01 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 10 10 9 8 7 6 6 5 5 5 5 4 4 4 4 For reference, $ python biopython_solexa2qual.py < solexa_faked.fastq >slxa_0001_1_0001_01 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 10 9 8 7 6 5 5 4 4 3 3 2 2 1 1 I can "fix" this in fastq.pm by commenting out one of the log mappings, for example see the patch I've just uploaded to Bug 2857: http://bugzilla.open-bio.org/show_bug.cgi?id=2857 That brings me to another problem, consider the following (with the double conversion fixed): $ ./bioperl_solexa2solexa.pl < solexa_faked.fastq @slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN +slxa_0001_1_0001_01 hgfedcba`_^]\[ZYXWVUTSRQPONMLKJJHGFEDDBB@@>><< If you compare that to the original, you'll notice a loss of detail in the poor quality reads. e.g. Solexa scores 9 (I) and 10 (J) have both been mapped onto 10 (J). I believe this happens because BioPerl is converting the Solexa scores to PHRED scores on loading (which is fine - EMBOSS does this too), but you are also storing them as integers! In order to preserve these details, I think you'll have to hold the converted PHRED scores as floating point numbers (which I think is what EMBOSS does). This has the downside of taking more memory, and may also complicate file output (you may need to round things). Regards, Peter (@Biopython) From acpatel at gmail.com Sat Aug 22 18:44:20 2009 From: acpatel at gmail.com (Anand C. Patel) Date: Sat, 22 Aug 2009 17:44:20 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> Message-ID: <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> On Aug 22, 2009, at 4:36 PM, Hilmar Lapp wrote: > That's a pretty strange bug. Anand, which version of BioPerl and > Bioperl-db are you running? BioPerl is: https://launchpad.net/ubuntu/karmic/+source/bioperl/1.6.0-2ubuntu1 (1.6.0 loaded via apt-get into ubuntu karmic alpha 4) BioPerl-db is version 1.006 (1.6.0) loaded via CPAN. BioSQL is 1.0.1 I think I know what's broken. Using load_seqdatabases.pl, I'd put a set of sequences from genbank into a biosql db in mysql. I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl script from biosql. When I searched for house (as in house mouse), I found that the name of the type of taxon class was "genbank common name". When I searched for musculus, it does appear as a type of "scientific name". > Note that the genus *is* actually there in the lineage (and hence > does get retrieved from the database). Apparently the Species object > fails to pull it out correctly, though? > > Anand - I suspect there have been some warnings printed to the > terminal - can you post these, and otherwise confirm that there > haven't been any? > > -hilmar I'm not just getting warnings. I'm getting errors. Tons of them. It's a wonder it's working at all. I started with the getentry.cgi script in the cgi-bin folder, and stripped most of it away. 
Code:
#!/usr/bin/perl
use DBI;
use CGI::Carp qw( fatalsToBrowser );
use CGI qw/:standard/;
use Bio::DB::BioDB;
use Bio::Seq::RichSeq;
use Bio::SeqIO;
use IO::String;

my $q = new CGI; # create new CGI object
print $q->header; # create the HTTP header
my $value = "NM_017474";
my $host = "localhost";
my $dbname = "biosql";
my $driver = "mysql";
my $dbuser = "webuser";
my $dbpass = "wrjFfjjW9y243xvF";
my $biodbname = "genbank";
my $seq;
eval {
    my $db = Bio::DB::BioDB->new(-database => "biosql",
                                 -host => $host,
                                 -dbname => $dbname,
                                 -driver => $driver,
                                 -user => $dbuser,
                                 -pass => $dbpass,
                                 -verbose => 10,
                                 );
    my $seqadaptor = $db->get_object_adaptor('Bio::SeqI');
    $seq = Bio::Seq::RichSeq->new( -accession_number => $value,
                                   -namespace => $biodbname );
    $seq = $seqadaptor->find_by_unique_key($seq);
};
my $seqfh = IO::String->new($gbstring);
my $ioseq = Bio::SeqIO->new(-fh => $seqfh, -format => 'genbank');
$ioseq->write_seq($seq);
if( $@ || !defined $seq) {
    print "Got fetch exception of...\n
$@\n
"; exit(0); } print "BioSQL display of ". $seq->display_id ."\n"; print "\n"; print "
\n
".$gbstring."\n
\n
\n"; Errors (some but not all): test1.cgi: attempting to load adaptor class for Bio::SeqI test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor test1.cgi: attempting to load adaptor class for BioNamespace test1.cgi: \tattempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ? test1.cgi: BioNamespaceAdaptor: binding UK column 1 to "genbank" (namespace) test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor test1.cgi: preparing UK select statement: SELECT bioentry.bioentry_id, bioentry.name, bioentry.identifier, bioentry.accession, bioentry.description, bioentry.version, bioentry.division, bioentry.biodatabase_id, bioentry.taxon_id FROM bioentry WHERE biodatabase_id = ? AND accession = ? 
test1.cgi: SeqAdaptor: binding UK column 1 to "1" (bionamespace) test1.cgi: SeqAdaptor: binding UK column 2 to "NM_017474" (accession_number) test1.cgi: attempting to load adaptor class for Bio::PrimarySeq test1.cgi: \tattempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: preparing PK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE biodatabase_id = ? test1.cgi: BioNamespaceAdaptor: binding PK column to "1" test1.cgi: attempting to load adaptor class for Bio::Species test1.cgi: \tattempting to load module Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: preparing PK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND taxon_name.name_class = 'scientific name' AND taxon.taxon_id = ? test1.cgi: SpeciesAdaptor: binding PK column to "138" test1.cgi: prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value >= node.left_value AND taxon.left_value <= node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value test1.cgi: preparing SELECT COMMON_NAME: SELECT taxon_name.name FROM taxon_name WHERE taxon_name.taxon_id = ? 
AND taxon_name.name_class = 'common_name' test1.cgi: attempting to load adaptor class for Bio::Tree::Tree test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeAdaptor test1.cgi: attempting to load adaptor class for Bio::Root::Root test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootAdaptor test1.cgi: attempting to load adaptor class for Bio::Root::RootI test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootAdaptor test1.cgi: attempting to load adaptor class for Bio::Tree::TreeI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeAdaptor test1.cgi: attempting to load adaptor class for Bio::Tree::TreeFunctionsI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor test1.cgi: no adaptor found for class Bio::Tree::Tree test1.cgi: attempting to load adaptor class for Bio::DB::Taxonomy::list test1.cgi: \tattempting to load module Bio::DB::BioSQL::listAdaptor test1.cgi: attempting to load adaptor class for Bio::DB::Taxonomy test1.cgi: \tattempting to load module Bio::DB::BioSQL::TaxonomyAdaptor test1.cgi: no adaptor found for class Bio::DB::Taxonomy::list test1.cgi: attempting to load adaptor class for Biosequence test1.cgi: \tattempting to load module Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BiosequenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: preparing UK select statement: SELECT biosequence.bioentry_id, biosequence.version, biosequence.length, biosequence.alphabet, NULL, NULL, biosequence.bioentry_id FROM biosequence WHERE bioentry_id = ? 
test1.cgi: BiosequenceAdaptor: binding UK column 1 to "1" (primary_seq) test1.cgi: attempting to load adaptor class for Bio::AnnotationCollectionI test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor test1.cgi: attempting to load adaptor class for Bio::Annotation::TypeManager test1.cgi: \tattempting to load module Bio::DB::BioSQL::TypeManagerAdaptor test1.cgi: no adaptor found for class Bio::Annotation::TypeManager test1.cgi: attempting to load adaptor class for Bio::Annotation::Reference test1.cgi: \tattempting to load module Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.reference_id, t2.authors, t2.title, t2.location, t2.crc, bioentry_reference.start_pos, bioentry_reference.end_pos, bioentry_reference.rank, t2.dbxref_id FROM bioentry t1, reference t2, bioentry_reference WHERE t1.bioentry_id = bioentry_reference.bioentry_id AND t2.reference_id = bioentry_reference.reference_id AND t1.bioentry_id = ? 
test1.cgi: ReferenceAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: attempting to load adaptor class for Bio::Annotation::DBLink test1.cgi: \tattempting to load module Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: preparing PK select statement: SELECT dbxref.dbxref_id, dbxref.dbname, dbxref.accession, dbxref.version, NULL FROM dbxref WHERE dbxref_id = ? test1.cgi: DBLinkAdaptor: binding PK column to "1" test1.cgi: DBLinkAdaptor: binding PK column to "2" test1.cgi: DBLinkAdaptor: binding PK column to "3" test1.cgi: DBLinkAdaptor: binding PK column to "4" test1.cgi: DBLinkAdaptor: binding PK column to "5" test1.cgi: DBLinkAdaptor: binding PK column to "6" test1.cgi: DBLinkAdaptor: binding PK column to "7" test1.cgi: DBLinkAdaptor: binding PK column to "8" test1.cgi: DBLinkAdaptor: binding PK column to "9" test1.cgi: DBLinkAdaptor: binding PK column to "10" test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, bioentry_dbxref.rank FROM bioentry t1, dbxref t2, bioentry_dbxref WHERE t1.bioentry_id = bioentry_dbxref.bioentry_id AND t2.dbxref_id = bioentry_dbxref.dbxref_id AND t1.bioentry_id = ? 
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: attempting to load adaptor class for Bio::Annotation::SimpleValue test1.cgi: \tattempting to load module Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: attempting to load adaptor class for Bio::Ontology::Ontology test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::OntologyAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::OntologyAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::OntologyAdaptor test1.cgi: preparing UK select statement: SELECT ontology.ontology_id, ontology.name, ontology.definition FROM ontology WHERE name = ? test1.cgi: OntologyAdaptor: binding UK column 1 to "Annotation Tags" (name) test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::TermAdaptorDriver as driver peer for Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.name, bioentry_qualifier_value.value, bioentry_qualifier_value.rank, t2.ontology_id FROM bioentry t1, term t2, bioentry_qualifier_value WHERE t1.bioentry_id = bioentry_qualifier_value.bioentry_id AND t2.term_id = bioentry_qualifier_value.term_id AND (t1.bioentry_id = ? AND t2.ontology_id = ?) 
test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: attempting to load adaptor class for Bio::Annotation::OntologyTerm test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyTermAdaptor test1.cgi: attempting to load adaptor class for Bio::AnnotationI test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationAdaptor test1.cgi: attempting to load adaptor class for Bio::Ontology::TermI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TermIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TermAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::TermAdaptorDriver as driver peer for Bio::DB::BioSQL::TermAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.identifier, t2.name, t2.definition, t2.is_obsolete, bioentry_qualifier_value.rank, t2.ontology_id FROM bioentry t1, term t2, bioentry_qualifier_value WHERE t1.bioentry_id = bioentry_qualifier_value.bioentry_id AND t2.term_id = bioentry_qualifier_value.term_id AND (t1.bioentry_id = ? AND t2.ontology_id != ?) 
test1.cgi: TermAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: attempting to load adaptor class for Bio::Annotation::Comment test1.cgi: \tattempting to load module Bio::DB::BioSQL::CommentAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::CommentAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::CommentAdaptor test1.cgi: preparing query: SELECT t1.comment_id, t1.comment_text, t1.rank, t1.bioentry_id FROM comment t1 WHERE t1.bioentry_id = ? test1.cgi: Query FIND Bio::Annotation::Comment BY Bio::Seq::RichSeq: binding column 1 to "1" test1.cgi: attempting to load adaptor class for Bio::SeqFeatureI test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: preparing query: SELECT t1.seqfeature_id, t1.display_name, t1.rank, t1.bioentry_id, t1.type_term_id, t1.source_term_id FROM seqfeature t1 WHERE t1.bioentry_id = ? ORDER BY t1.rank test1.cgi: Query FIND FEATURE BY SEQ: binding column 1 to "1" test1.cgi: preparing PK select statement: SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, NULL, term.ontology_id FROM term WHERE term_id = ? 
test1.cgi: TermAdaptor: binding PK column to "245" test1.cgi: attempting to load adaptor class for Bio::Ontology::OntologyI test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyAdaptor test1.cgi: preparing PK select statement: SELECT ontology.ontology_id, ontology.name, ontology.definition FROM ontology WHERE ontology_id = ? test1.cgi: OntologyAdaptor: binding PK column to "32" test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, term_dbxref.rank FROM term t1, dbxref t2, term_dbxref WHERE t1.term_id = term_dbxref.term_id AND t2.dbxref_id = term_dbxref.dbxref_id AND t1.term_id = ? test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "245" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: preparing: SELECT synonym FROM term_synonym WHERE term_id = ? test1.cgi: SELECT SYNONYMS: executing with values (245) (FK to Bio::Ontology::Term) test1.cgi: TermAdaptor: binding PK column to "246" test1.cgi: OntologyAdaptor: binding PK column to "33" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "246" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (246) (FK to Bio::Ontology::Term) test1.cgi: attempting to load adaptor class for Bio::LocationI test1.cgi: \tattempting to load module Bio::DB::BioSQL::LocationIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::LocationAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::LocationAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::LocationAdaptor test1.cgi: preparing query: SELECT t1.location_id, t1.start_pos, t1.end_pos, t1.strand, t1.rank, t1.seqfeature_id, t1.dbxref_id FROM location t1 WHERE 
t1.seqfeature_id = ? test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "1" test1.cgi: attempting to load adaptor class for Bio::DB::Persistent::PersistentObjectFactory test1.cgi: \tattempting to load module Bio::DB::BioSQL::PersistentObjectFactoryAdaptor test1.cgi: attempting to load adaptor class for Bio::Factory::ObjectFactoryI test1.cgi: \tattempting to load module Bio::DB::BioSQL::ObjectFactoryIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::ObjectFactoryAdaptor test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, seqfeature_dbxref.rank FROM seqfeature t1, dbxref t2, seqfeature_dbxref WHERE t1.seqfeature_id = seqfeature_dbxref.seqfeature_id AND t2.dbxref_id = seqfeature_dbxref.dbxref_id AND t1.seqfeature_id = ? test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic) test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.name, seqfeature_qualifier_value.value, seqfeature_qualifier_value.rank, t2.ontology_id FROM seqfeature t1, term t2, seqfeature_qualifier_value WHERE t1.seqfeature_id = seqfeature_qualifier_value.seqfeature_id AND t2.term_id = seqfeature_qualifier_value.term_id AND (t1.seqfeature_id = ? AND t2.ontology_id = ?) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.identifier, t2.name, t2.definition, t2.is_obsolete, seqfeature_qualifier_value.rank, t2.ontology_id FROM seqfeature t1, term t2, seqfeature_qualifier_value WHERE t1.seqfeature_id = seqfeature_qualifier_value.seqfeature_id AND t2.term_id = seqfeature_qualifier_value.term_id AND (t1.seqfeature_id = ? AND t2.ontology_id != ?) 
test1.cgi: TermAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: preparing query: SELECT t1.comment_id, t1.comment_text, t1.rank, t1.bioentry_id FROM comment t1 WHERE 1 = 1 test1.cgi: Query FIND Bio::Annotation::Comment BY Bio::SeqFeature::Generic: binding column 1 to "1" test1.cgi: TermAdaptor: binding PK column to "260" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "260" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (260) (FK to Bio::Ontology::Term) test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "2" test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: TermAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: TermAdaptor: binding PK column to "250" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "250" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (250) (FK to Bio::Ontology::Term) test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "3" test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "3" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "3" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: TermAdaptor: binding ASSOC column 1 to "3" 
(FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: TermAdaptor: binding PK column to "264" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "264" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (264) (FK to Bio::Ontology::Term) test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "4" test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: TermAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: preparing SELECT statement: SELECT seq FROM biosequence WHERE bioentry_id = ? > > On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: > >> Anand, >> >> You should always post emails to the bioperl-l mailing list, never >> to individual developers (you'll get an answer much faster). Keep >> responses on the list as well. >> >> Though I use bioperl-db some, I'm probably not the best person to >> ask. Does anyone know what's going on with this? Does this have >> to do with the Species/Taxon refactoring? >> >> chris >> >> Begin forwarded message: >> >>> From: "Anand C. Patel" >>> Date: August 22, 2009 2:57:42 PM CDT >>> To: cjfields at illinois.edu >>> Subject: problem with bioperl (where's the Mus?) >>> >>> Dr. Fields, >>> >>> I'm struggling with what seems to be a strange quirk in Bioperl >>> +/- Bioperl-db/BioSQL. >>> >>> I've successfully loaded in genbank sequences into a biosql >>> database. 
>>> >>> When I try to write a genbank sequence back out, a curious thing >>> happens -- the Genus is missing from the SOURCE and ORGANISM areas. >>> >>> Despite reporting: >>> primary tag: source >>> tag: chromosome >>> value: 3 >>> >>> tag: db_xref >>> value: taxon:10090 >>> >>> tag: map >>> value: 3 74.5 cM >>> >>> tag: mol_type >>> value: mRNA >>> >>> tag: organism >>> value: Mus musculus >>> The sequence when printed out via SeqIO looks like this: >>> LOCUS NM_017474 2935 bp dna linear >>> ROD 13-AUG-2009 >>> DEFINITION Mus musculus chloride channel calcium activated 3 >>> (Clca3), mRNA. >>> ACCESSION NM_017474 XM_978159 >>> VERSION NM_017474.2 GI:255918210 >>> KEYWORDS . >>> SOURCE musculus >>> ORGANISM musculus >>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>> Bilateria; >>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>> Tetrapoda; >>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>> Glires; >>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>> Confession -- I have a final project due Monday wherein I boldly >>> elected to interface Bioperl, MySQL, Perl, and CGI. >>> (I'm an MD getting my MS in Bioinformatics.) >>> After many misadventures, I'm getting to the point where I could >>> actually complete the objectives, but this is bug is rather >>> problematic. >>> Thanks, >>> Anand >>> Anand C. Patel, MD >>> Assistant Professor of Pediatrics >>> Division of Allergy/Pulmonary Medicine >>> Department of Pediatrics >>> Washington University School of Medicine >>> 660 South Euclid Ave, Campus Box 8052 >>> St. 
Louis, MO 63110 >>> acpatel at wustl.edu >>> acpatel at gmail.com >>> acpatel at jhu.edu >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From acpatel at gmail.com Sat Aug 22 20:04:35 2009 From: acpatel at gmail.com (Anand C. Patel) Date: Sat, 22 Aug 2009 19:04:35 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote: > > On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > >> [...] >> I think I know what's broken. Using load_seqdatabases.pl, I'd put >> a set of sequences from genbank into a biosql db in mysql. >> >> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl >> script from biosql. > > Did you load the NCBI taxonomy first, or afterwards? First -- before the sequences. In fact, I'm in the midst of reloading the taxonomy into a clean new database. I used namespace "genbank" instead of namespace "bioperl". Could that be the problem? >> >> When I searched for house (as in house mouse), I found that the >> name of the type of taxon class was "genbank common name". >> >> When I searched for musculus, it does appear as a type of >> "scientific name". > > It is the 'scientific name' class names that Bioperl-db will onto > the lineage array. > >> [...] >> I'm not just getting warnings. I'm getting errors. Tons of them. >> It's a wonder it's working at all. 
> > I'm not sure what you're referring to, but what you pasted into your > email were neither errors nor warnings but a debugging log (and what > it prints looks like it's working fine). You triggered that by > setting -verbose to a value greater than 0. If you don't want > debugging output, then you can just leave off that argument (no > debugging output is the default). I did not know that! They were flagged "error", so I thought those might be the problem. >> >> I started with the getentry.cgi script in the cgi-bin folder, and >> stripped most of it away. > > I see - which reminds me that I need to look at that script; I'm > afraid it hasn't been updated for a long time (that doesn't mean > though that it can't work - the core API has been stable for years). > It works -- I just think I confused the system by not sticking with the default namespace? Thanks, Anand >> >> Code: >> #!/usr/bin/perl >> >> [...] >> if( $@ || !defined $seq) { >> print "Got fetch exception of...\n
$@\n
"; >> exit(0); >> } > > Wouldn't you want to put that right after the eval() clause? > > -hilmar > >> >> >>> >>> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >>> >>>> Anand, >>>> >>>> You should always post emails to the bioperl-l mailing list, >>>> never to individual developers (you'll get an answer much >>>> faster). Keep responses on the list as well. >>>> >>>> Though I use bioperl-db some, I'm probably not the best person to >>>> ask. Does anyone know what's going on with this? Does this have >>>> to do with the Species/Taxon refactoring? >>>> >>>> chris >>>> >>>> Begin forwarded message: >>>> >>>>> From: "Anand C. Patel" >>>>> Date: August 22, 2009 2:57:42 PM CDT >>>>> To: cjfields at illinois.edu >>>>> Subject: problem with bioperl (where's the Mus?) >>>>> >>>>> Dr. Fields, >>>>> >>>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>>> +/- Bioperl-db/BioSQL. >>>>> >>>>> I've successfully loaded in genbank sequences into a biosql >>>>> database. >>>>> >>>>> When I try to write a genbank sequence back out, a curious thing >>>>> happens -- the Genus is missing from the SOURCE and ORGANISM >>>>> areas. >>>>> >>>>> Despite reporting: >>>>> primary tag: source >>>>> tag: chromosome >>>>> value: 3 >>>>> >>>>> tag: db_xref >>>>> value: taxon:10090 >>>>> >>>>> tag: map >>>>> value: 3 74.5 cM >>>>> >>>>> tag: mol_type >>>>> value: mRNA >>>>> >>>>> tag: organism >>>>> value: Mus musculus >>>>> The sequence when printed out via SeqIO looks like this: >>>>> LOCUS NM_017474 2935 bp dna linear >>>>> ROD 13-AUG-2009 >>>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>>> (Clca3), mRNA. >>>>> ACCESSION NM_017474 XM_978159 >>>>> VERSION NM_017474.2 GI:255918210 >>>>> KEYWORDS . 
>>>>> SOURCE musculus >>>>> ORGANISM musculus >>>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>>> Bilateria; >>>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>>> Tetrapoda; >>>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>>> Glires; >>>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>>> Confession -- I have a final project due Monday wherein I boldly >>>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>>> (I'm an MD getting my MS in Bioinformatics.) >>>>> After many misadventures, I'm getting to the point where I could >>>>> actually complete the objectives, but this is bug is rather >>>>> problematic. >>>>> Thanks, >>>>> Anand >>>>> Anand C. Patel, MD >>>>> Assistant Professor of Pediatrics >>>>> Division of Allergy/Pulmonary Medicine >>>>> Department of Pediatrics >>>>> Washington University School of Medicine >>>>> 660 South Euclid Ave, Campus Box 8052 >>>>> St. Louis, MO 63110 >>>>> acpatel at wustl.edu >>>>> acpatel at gmail.com >>>>> acpatel at jhu.edu >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From acpatel at gmail.com Sat Aug 22 20:13:37 2009 From: acpatel at gmail.com (Anand C. Patel) Date: Sat, 22 Aug 2009 19:13:37 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com>

Do I need to load ontology before loading sequences?

(I promise I've been reading the documentation for days, and could not find a yea or nay on this)

Thanks,
Anand

On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote: > > On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > >> [...] >> I think I know what's broken. Using load_seqdatabases.pl, I'd put >> a set of sequences from genbank into a biosql db in mysql. >> >> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl >> script from biosql. > > Did you load the NCBI taxonomy first, or afterwards? > >> >> When I searched for house (as in house mouse), I found that the >> name of the type of taxon class was "genbank common name". >> >> When I searched for musculus, it does appear as a type of >> "scientific name". > > It is the 'scientific name' class names that Bioperl-db will onto > the lineage array. > >> [...] >> I'm not just getting warnings. I'm getting errors. Tons of them. >> It's a wonder it's working at all. > > I'm not sure what you're referring to, but what you pasted into your > email were neither errors nor warnings but a debugging log (and what > it prints looks like it's working fine). You triggered that by > setting -verbose to a value greater than 0. If you don't want > debugging output, then you can just leave off that argument (no > debugging output is the default). > >> >> I started with the getentry.cgi script in the cgi-bin folder, and >> stripped most of it away. > > I see - which reminds me that I need to look at that script; I'm > afraid it hasn't been updated for a long time (that doesn't mean > though that it can't work - the core API has been stable for years). 
> >> >> Code: >> #!/usr/bin/perl >> >> [...] >> if( $@ || !defined $seq) { >> print "Got fetch exception of...\n
$@\n
"; >> exit(0); >> } > > Wouldn't you want to put that right after the eval() clause? > > -hilmar > >> >> >>> >>> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >>> >>>> Anand, >>>> >>>> You should always post emails to the bioperl-l mailing list, >>>> never to individual developers (you'll get an answer much >>>> faster). Keep responses on the list as well. >>>> >>>> Though I use bioperl-db some, I'm probably not the best person to >>>> ask. Does anyone know what's going on with this? Does this have >>>> to do with the Species/Taxon refactoring? >>>> >>>> chris >>>> >>>> Begin forwarded message: >>>> >>>>> From: "Anand C. Patel" >>>>> Date: August 22, 2009 2:57:42 PM CDT >>>>> To: cjfields at illinois.edu >>>>> Subject: problem with bioperl (where's the Mus?) >>>>> >>>>> Dr. Fields, >>>>> >>>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>>> +/- Bioperl-db/BioSQL. >>>>> >>>>> I've successfully loaded in genbank sequences into a biosql >>>>> database. >>>>> >>>>> When I try to write a genbank sequence back out, a curious thing >>>>> happens -- the Genus is missing from the SOURCE and ORGANISM >>>>> areas. >>>>> >>>>> Despite reporting: >>>>> primary tag: source >>>>> tag: chromosome >>>>> value: 3 >>>>> >>>>> tag: db_xref >>>>> value: taxon:10090 >>>>> >>>>> tag: map >>>>> value: 3 74.5 cM >>>>> >>>>> tag: mol_type >>>>> value: mRNA >>>>> >>>>> tag: organism >>>>> value: Mus musculus >>>>> The sequence when printed out via SeqIO looks like this: >>>>> LOCUS NM_017474 2935 bp dna linear >>>>> ROD 13-AUG-2009 >>>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>>> (Clca3), mRNA. >>>>> ACCESSION NM_017474 XM_978159 >>>>> VERSION NM_017474.2 GI:255918210 >>>>> KEYWORDS . 
>>>>> SOURCE musculus >>>>> ORGANISM musculus >>>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>>> Bilateria; >>>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>>> Tetrapoda; >>>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>>> Glires; >>>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>>> Confession -- I have a final project due Monday wherein I boldly >>>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>>> (I'm an MD getting my MS in Bioinformatics.) >>>>> After many misadventures, I'm getting to the point where I could >>>>> actually complete the objectives, but this is bug is rather >>>>> problematic. >>>>> Thanks, >>>>> Anand >>>>> Anand C. Patel, MD >>>>> Assistant Professor of Pediatrics >>>>> Division of Allergy/Pulmonary Medicine >>>>> Department of Pediatrics >>>>> Washington University School of Medicine >>>>> 660 South Euclid Ave, Campus Box 8052 >>>>> St. Louis, MO 63110 >>>>> acpatel at wustl.edu >>>>> acpatel at gmail.com >>>>> acpatel at jhu.edu >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From acpatel at usa.net Sat Aug 22 21:13:14 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sat, 22 Aug 2009 20:13:14 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net>

Turns out that using the default namespace bioperl doesn't change anything.

Common name -- still "genbank common name" in name_class in the taxon_name table for "house mouse", which I think the module is looking for as "common name".

It's not behaving differently despite reloading the sequences.

I've created a horrible munge that fixes it for cosmetic purposes:

my $species = $seq->species;
my $justspecies = $species->scientific_name();
my $binspecies = $species->binomial();
my $gbstring2 = $gbstring;
$gbstring2 =~ s/$binspecies/$justspecies/g;
$gbstring2 =~ s/$justspecies/$binspecies/g;

But this does not strike me as a long term solution.

Thanks,
Anand

On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote: > > On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > >> [...] >> I think I know what's broken. Using load_seqdatabases.pl, I'd put >> a set of sequences from genbank into a biosql db in mysql. >> >> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl >> script from biosql. > > Did you load the NCBI taxonomy first, or afterwards? > >> >> When I searched for house (as in house mouse), I found that the >> name of the type of taxon class was "genbank common name". >> >> When I searched for musculus, it does appear as a type of >> "scientific name". > > It is the 'scientific name' class names that Bioperl-db will onto > the lineage array. > >> [...] >> I'm not just getting warnings. I'm getting errors. Tons of them. >> It's a wonder it's working at all. > > I'm not sure what you're referring to, but what you pasted into your > email were neither errors nor warnings but a debugging log (and what > it prints looks like it's working fine). 
You triggered that by > setting -verbose to a value greater than 0. If you don't want > debugging output, then you can just leave off that argument (no > debugging output is the default). > >> >> I started with the getentry.cgi script in the cgi-bin folder, and >> stripped most of it away. > > I see - which reminds me that I need to look at that script; I'm > afraid it hasn't been updated for a long time (that doesn't mean > though that it can't work - the core API has been stable for years). > >> >> Code: >> #!/usr/bin/perl >> >> [...] >> if( $@ || !defined $seq) { >> print "Got fetch exception of...\n
$@\n
"; >> exit(0); >> } > > Wouldn't you want to put that right after the eval() clause? > > -hilmar > >> >> >>> >>> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >>> >>>> Anand, >>>> >>>> You should always post emails to the bioperl-l mailing list, >>>> never to individual developers (you'll get an answer much >>>> faster). Keep responses on the list as well. >>>> >>>> Though I use bioperl-db some, I'm probably not the best person to >>>> ask. Does anyone know what's going on with this? Does this have >>>> to do with the Species/Taxon refactoring? >>>> >>>> chris >>>> >>>> Begin forwarded message: >>>> >>>>> From: "Anand C. Patel" >>>>> Date: August 22, 2009 2:57:42 PM CDT >>>>> To: cjfields at illinois.edu >>>>> Subject: problem with bioperl (where's the Mus?) >>>>> >>>>> Dr. Fields, >>>>> >>>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>>> +/- Bioperl-db/BioSQL. >>>>> >>>>> I've successfully loaded in genbank sequences into a biosql >>>>> database. >>>>> >>>>> When I try to write a genbank sequence back out, a curious thing >>>>> happens -- the Genus is missing from the SOURCE and ORGANISM >>>>> areas. >>>>> >>>>> Despite reporting: >>>>> primary tag: source >>>>> tag: chromosome >>>>> value: 3 >>>>> >>>>> tag: db_xref >>>>> value: taxon:10090 >>>>> >>>>> tag: map >>>>> value: 3 74.5 cM >>>>> >>>>> tag: mol_type >>>>> value: mRNA >>>>> >>>>> tag: organism >>>>> value: Mus musculus >>>>> The sequence when printed out via SeqIO looks like this: >>>>> LOCUS NM_017474 2935 bp dna linear >>>>> ROD 13-AUG-2009 >>>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>>> (Clca3), mRNA. >>>>> ACCESSION NM_017474 XM_978159 >>>>> VERSION NM_017474.2 GI:255918210 >>>>> KEYWORDS . 
>>>>> SOURCE musculus >>>>> ORGANISM musculus >>>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>>> Bilateria; >>>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>>> Tetrapoda; >>>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>>> Glires; >>>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>>> Confession -- I have a final project due Monday wherein I boldly >>>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>>> (I'm an MD getting my MS in Bioinformatics.) >>>>> After many misadventures, I'm getting to the point where I could >>>>> actually complete the objectives, but this bug is rather >>>>> problematic. >>>>> Thanks, >>>>> Anand >>>>> Anand C. Patel, MD >>>>> Assistant Professor of Pediatrics >>>>> Division of Allergy/Pulmonary Medicine >>>>> Department of Pediatrics >>>>> Washington University School of Medicine >>>>> 660 South Euclid Ave, Campus Box 8052 >>>>> St. Louis, MO 63110 >>>>> acpatel at wustl.edu >>>>> acpatel at gmail.com >>>>> acpatel at jhu.edu >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From jkb at sanger.ac.uk Mon Aug 24 05:02:34 2009 From: jkb at sanger.ac.uk (James Bonfield) Date: Mon, 24 Aug 2009 10:02:34 +0100 Subject: [Bioperl-l] SCF installation Message-ID: <20090824090234.GB821@sanger.ac.uk> Lincoln Stein wrote: > It is all a bit confusing.
On the download page for Staden, there is a > release 1.12, but the home page hasn't been updated and still reads > 1.11. If you download and install Staden 1.12, you'll get a library > named libstaden-read rather than libread; Bio::SCF hasn't been updated > for the name change, and so you will have to open up the Makefile.PL > and change "-lread" to "-lstaden-read" in order for it to compile. This post was pointed out to me by one of the Debian maintainers. I'm mailing the list directly but am not a subscriber, so please keep me listed in any replies. The Staden Package home page recently underwent a revamp to use the RSS feeds, automatically updating it. Unfortunately, within a couple of weeks of doing that sourceforge managed to break the file release RSS and so the site has stopped updating. The News section is still working though, so I ought to add a news post about io_lib-1.12.1 and it'll at least appear somewhere on the home page. Regarding the library name change, this was requested by Debian and also already implemented by Fedora. I agree with it too as libread.so is a truly appalling name, so the new name is here to stay. There shouldn't be a great number of differences compared to the 1.11.x release set though, with the only incompatibility I can immediately think of being the change from int to size_t in the Array structs. James PS. There have been very few changes to SCF over the years so it's likely all working just fine. Most recent io_lib changes have been SRF support, and a few associated tweaks to ZTR necessitated by SRF. -- James Bonfield (jkb at sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova | Plurima gyrabant gymbolitare vabo; A Staden Package developer: | Et Borogovorum mimzebant undique formae, https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From acpatel at usa.net Sun Aug 23 13:17:08 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sun, 23 Aug 2009 12:17:08 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Message-ID: <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> On Aug 23, 2009, at 9:38 AM, Hilmar Lapp wrote: >> Common name -- still "genbank common name" in name_class in the >> taxon_name table for "house mouse", which I think the module is >> looking for as "common name". > > If you are loading the NCBI taxonomy first, this is coming from > NCBI, not one of the scripts or BioPerl, and hence we have no > control over it. Are you saying that there is no designated name of > class 'common name' for Mus musculus in the NCBI taxonomy dump? > > Also, the common name being present or not should have no bearing on > the lineage array, where the actual problem is, so I don't > understand right now how this would be connected to the problem you > are seeing. > >> >> It's not behaving differently despite reloading the sequences. 
>> >> I've created a horrible munge that fixes it for cosmetic purposes: >> my $species = $seq->species; >> my $justspecies = $species->scientific_name(); >> my $binspecies = $species->binomial(); >> >> my $gbstring2 = $gbstring; >> >> $gbstring2 =~ s/$binspecies/$justspecies/g; >> $gbstring2 =~ s/$justspecies/$binspecies/g; > > I don't understand what you are trying to achieve here - it seems > like you are making a substitution and then reverting it? Also, > $species->scientific_name() and $species->binomial() should be > identical for Mus musculus - are you finding different values being > returned? > > So in essence, I wouldn't expect your above code snippet to have any > effect, for both of these reasons. How do you find $gbstring2 to be > different from $gbstring at the end of this block of code? > > -hilmar I should have been clearer. Code snippet: my $species = $seq->species; print "common name = ",$species->common_name, "\n"; print "scientific name = ",$species->scientific_name, "\n"; print "species = ",$species->species, "\n"; print "genus = ",$species->genus, "\n"; print "sub_species = ",$species->sub_species, "\n"; print "binomial = ",$species->binomial, "\n"; print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; Output: common name = scientific name = musculus species = musculus genus = Mus sub_species = binomial = Mus musculus ncbi_taxid = 10090 The common name is missing, despite having loaded it from NCBI taxonomy using the provided script. It is ONLY present as this "genbank common name". So, what I get in $gbstring is: LOCUS NM_017474 2935 bp dna linear ROD 13- AUG-2009 DEFINITION Mus musculus chloride channel calcium activated 3 (Clca3), mRNA. ACCESSION NM_017474 XM_978159 VERSION NM_017474.2 GI:255918210 KEYWORDS . 
SOURCE musculus ORGANISM musculus Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. What I get in $gbstring2 is: LOCUS NM_017474 2935 bp dna linear ROD 13- AUG-2009 DEFINITION Mus musculus chloride channel calcium activated 3 (Clca3), mRNA. ACCESSION NM_017474 XM_978159 VERSION NM_017474.2 GI:255918210 KEYWORDS . SOURCE Mus musculus ORGANISM Mus musculus Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. Not perfect -- common name is still missing, but better. I could go through and replace all of the instances of "genbank common name" with "common name" and see if this fixes it. Any other thoughts? Thanks, Anand > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > From acpatel at usa.net Sun Aug 23 13:25:16 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sun, 23 Aug 2009 12:25:16 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Message-ID: <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net> The other piece of potentially useful information is below -- output from SELECT * FROM `biosql`.`taxon_name` WHERE `taxon_id` = 138; (taxon_id 138 maps to ncbi_taxon_id 10090) taxon_id name name_class 138 LK3 transgenic mice includes 138 Mus muscaris misnomer 138 Mus musculus scientific name 138 Mus sp. 129SV includes 138 house mouse genbank common name 138 mice C57BL/6xCBA/CaJ hybrid misspelling 138 mouse common name 138 nude mice includes 138 transgenic mice includes The source from the genbank entry NM_017474 is: SOURCE Mus musculus (house mouse) Which is why I think the issue is that the name_class is "genbank common name" rather than common name. What does strike me as odd though is that not even "mouse" shows up -- common_name is empty. Thanks again, Anand From maj at fortinbras.us Mon Aug 24 10:37:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 24 Aug 2009 10:37:45 -0400 Subject: [Bioperl-l] The Documentation Project Message-ID: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife> Hi All, I'm starting this journey of 1000 mi (1620 km) with the following step: http://www.bioperl.org/wiki/The_Documentation_Project Please visit and comment. 
Thanks, Mark From hlapp at gmx.net Mon Aug 24 10:47:34 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 10:47:34 -0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Message-ID: <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> Hi Anna, sequence formats all have some varying amount of information that must be present or otherwise the syntax is invalid. If what you need is a two-column table of display_id and species name, then I would simply write that, and not squeeze it into a standard sequence format. (Unless you actually do want the sequence too, in which case you need to add it as a wanted slot; even in that case though, writing a three-column table might serve you better.) -hilmar On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote: > > Dear all, > > I am trying to extract species taxonomy from ORGANISM line. In fact > I only need a first line under ORGANISM tag (i.e. genus + species). > I thought that it would be possible to do with the SeqBuilder object > by stating > > $builder->add_wanted_slot('display_id','species'); > > the problem is, however, that I've got an empty file as a result. > What might be wrong with the script (see below)?
> Thanks a lot in advance for any ideas, > > ------------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'raw'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > $builder->add_wanted_slot('display_id','species'); > > while(my $seq = $seq_in->next_seq()) { > $seq_out->write_seq($seq); > } > > exit; > > ---------------------------------------------------- > > Anna > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Mon Aug 24 12:50:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 11:50:05 -0500 Subject: [Bioperl-l] The Documentation Project In-Reply-To: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife> References: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife> Message-ID: Mark, We should probably keep some of this discussion on the list, primarily as I've been running into conflicts with responses on the wiki page. It's more amenable to discussion. For anyone out there interested, you should speak up now, this is the best opportunity to do so (we're considering lack of input assent). I want to make a few key points on behalf of the devs. It's impossible to consistently maintain two active copies of any documentation (wiki vs docs in the distribution).
I have tried keeping up with this, helping with the 1.5.2 release, and full-on with the 1.6.0 release, and it's an extreme headache. From the maintenance point-of-view, this is what I would do: 1) Where possible always link to the official POD (either pdoc or CPAN) from the distribution. Make the API documentation link very prominent (I moved it to the docs section in the sidebar). Protect wiki module pages (in line with the 'one official copy' rule), allow writable discussion pages for additional, wiki-specific documentation (which can be added to the official docs as needed). 2) ...or, have a search bar specifically for the module documentation that links directly to the proper API/PDOC/CPAN page. Not sure how feasible that is, particularly since we plan on splitting things up a bit. 3) POD-ify any relevant documentation we intend on including in the wiki that also comes with the distribution (similar to Moose::Manual). I do not want to repeatedly edit a plain text INSTALL/BUGS/DEPENDENCIES file to correspond with the wikified version for every release (nor vice versa). Long term: (this is my own personal style, YMMV) move all POD to the end of the file. Add 'Status' tags to any method docs indicating implementation status (virtual, stable, unstable, public, private, etc). Move method POD to its own section within the main documentation. Implement a coding style (as mentioned recently on list using perltidy, but also using proper method names). HOWTO's are also subject to API changes, but we haven't run into many issues with those yet, and they're wiki-specific. chris On Aug 24, 2009, at 9:37 AM, Mark A. Jensen wrote: > Hi All, > I'm starting this journey of 1000 mi (1620 km) with the following > step: > http://www.bioperl.org/wiki/The_Documentation_Project > Please visit and comment.
> Thanks, > Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 13:37:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 12:37:39 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92CADD.10901@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> Message-ID: On Aug 24, 2009, at 12:16 PM, Sendu Bala wrote: > Hilmar Lapp wrote: >>> >>> ... >> This points to a problem in Bio::Species::scientific_name(), given >> that binomial() is correct. Could you file this as a bug report? > > What code creates the Bio::Species object here? I suspect this code > isn't aware of changes in Bio::Species since BioPerl 1.5.2. I think it's bioperl-db-related. You've previously pointed out the incongruity bioperl-db has with Bio::Species in a bug report (I indicated that in a separate post to this thread). >>> The common name is missing, despite having loaded it from NCBI >>> taxonomy using the provided script. >>> It is ONLY present as this "genbank common name". >>> [...] >>> I could go through and replace all of the instances of "genbank >>> common name" with "common name" and see if this fixes it. >> I think we need to first discuss how we want to treat the 'common >> name' versus 'genbank common name' classes in BioPerl. 
>> So question for everyone: do we need to have both available (in >> which case we need to add an accessor in Bio::Species), or only >> 'common name', or should 'genbank common name' override 'common >> name' if both are present and have different values. > > Bio::Species (via Bio::Taxon) has the common_names() method, for > which common_name() is an alias that in scalar context returns the > first of possibly many common names, one of which may be the genbank > common name. > > See: > http://www.bioperl.org/wiki/Core_1.5.2_new_features#Implementation_changes Yes, but that method stored names in an array and removes the context, presumed or not. If there are two or more, which names correspond to common_name, which to genbank_common_name (and which should we prefer)? chris From bix at sendu.me.uk Mon Aug 24 13:16:13 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 18:16:13 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> Message-ID: <4A92CADD.10901@sendu.me.uk> Hilmar Lapp wrote: > > On Aug 23, 2009, at 1:17 PM, Anand C. Patel wrote: > >> [...] 
>> Code snippet: >> my $species = $seq->species; >> print "common name = ",$species->common_name, "\n"; >> print "scientific name = ",$species->scientific_name, "\n"; >> print "species = ",$species->species, "\n"; >> print "genus = ",$species->genus, "\n"; >> print "sub_species = ",$species->sub_species, "\n"; >> print "binomial = ",$species->binomial, "\n"; >> print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; >> >> Output: >> common name = >> scientific name = musculus >> species = musculus >> genus = Mus >> sub_species = >> binomial = Mus musculus >> ncbi_taxid = 10090 > > This points to a problem in Bio::Species::scientific_name(), given that > binomial() is correct. Could you file this as a bug report? What code creates the Bio::Species object here? I suspect this code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >> The common name is missing, despite having loaded it from NCBI >> taxonomy using the provided script. >> It is ONLY present as this "genbank common name". >> [...] >> I could go through and replace all of the instances of "genbank common >> name" with "common name" and see if this fixes it. > I think we need to first discuss how we want to treat the 'common name' > versus 'genbank common name' classes in BioPerl. > > So question for everyone: do we need to have both available (in which > case we need to add an accessor in Bio::Species), or only 'common name', > or should 'genbank common name' override 'common name' if both are > present and have different values. Bio::Species (via Bio::Taxon) has the common_names() method, for which common_name() is an alias that in scalar context returns the first of possibly many common names, one of which may be the genbank common name. 
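A minimal sketch of the context-sensitive accessor pattern described above, for readers unfamiliar with it. The package Sketch::Taxon and its constructor are hypothetical stand-ins written for illustration; this is not the Bio::Taxon or Bio::Species source, and it only assumes the behaviour stated in this message (all names in list context, the first name in scalar context).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class for illustration only; NOT Bio::Taxon.
package Sketch::Taxon;

sub new {
    my ($class, @names) = @_;
    return bless { common_names => [@names] }, $class;
}

# List context: every stored common name.
# Scalar context: only the first stored name.
sub common_names {
    my $self  = shift;
    my @names = @{ $self->{common_names} };
    return wantarray ? @names : $names[0];
}

# Alias: the caller's context is passed through to common_names().
sub common_name {
    my $self = shift;
    return $self->common_names(@_);
}

package main;

my $taxon = Sketch::Taxon->new('mouse', 'house mouse');

my @all   = $taxon->common_names;   # list context
my $first = $taxon->common_name;    # scalar context

print "all:   @all\n";
print "first: $first\n";
```

In this sketch the stored list keeps no record of which name_class ('common name' or 'genbank common name') each entry came from, so a caller in scalar context simply gets whichever name was stored first.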
See: http://www.bioperl.org/wiki/Core_1.5.2_new_features#Implementation_changes From hlapp at gmx.net Mon Aug 24 13:54:13 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 13:54:13 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92CADD.10901@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> Message-ID: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >> This points to a problem in Bio::Species::scientific_name(), given >> that binomial() is correct. Could you file this as a bug report? > > What code creates the Bio::Species object here? I suspect this code > isn't aware of changes in Bio::Species since BioPerl 1.5.2. I see. Any pointer to what would tell me what I need to change or is everything in the Bio::Species POD? BTW what the Bioperl-db code does is instantiate the blank object and then populate it through its accessors (mostly the classification() array). If what it has been doing in the past is now considered incorrect, at least it doesn't raise any warning that would alert one to that ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From robert.bradbury at gmail.com Mon Aug 24 14:38:08 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 24 Aug 2009 14:38:08 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> Message-ID: As a really "off-the-wall" suggestion, you might see if somehow the "name" being pulled is the SwissProt name rather than the species name. I run into this when I'm fetching FASTA sequences from SwissProt in that the sequence identifier names are non-standard for some of the early "standard" species, e.g. "HUMAN", # Homo sapiens "MOUSE", # Mus musculus "RAT", # Rattus norvegicus "BOVIN", # Bos taurus "HORSE", # Equus caballus "PIG", # Sus scrofa "RABIT", # Oryctolagus cuniculus "SHEEP", # Ovis aries "YEAST", # Saccharomyces cerevisiae (Baker's yeast) etc. Eventually they largely adopted the 3+2 letter species derived name, but the early "standard" names are anomalies. You might run a test on a newly sequenced species (Gorilla, Opossum, Armadillo, Dog, etc.) to see if you get a "standard" species name. Robert Bradbury From dan.bolser at gmail.com Mon Aug 24 15:13:26 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 20:13:26 +0100 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> Message-ID: <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> Thanks for these clarifications Chris. 
Basically I'm looking for an object that will easily let me edit a multiple sequence alignment, including: adding sequences (with given alignments), opening gaps, extracting columns (with linked sequences), transferring features, etc. etc. For example, I may want to analyse a set of short reads aligned against the human genome. Somehow it felt natural to represent the position of the aligned read as a Bio::LocatableSeq (with the alignment details being captured by a sequence string (including gaps) representing the read and the reference sequence - basically because that is what the aligner gives me). Now, you're saying Bio::LocatableSeq is not suitable for that purpose, which is fine. But the question is, how should I be doing this? Adding megabases of gaps to thousands of short reads feels wrong... is there a 'correct' way to do this currently in BioPerl? I think the source of my confusion was that SimpleAlign takes Bio::LocatableSeq as input, and I thought that was 'the way' to represent sequences in the MSA. I'll keep hacking at what I need to get done and I'll post the code. I'm just wondering how much 'alignment editing' could be usefully done by a suitable object within BP? Thanks again for your help, Dan. 2009/8/24 Chris Fields : > Dan, all, > > Bio::SimpleAlign doesn't align anything for you. It makes no assumptions > about the data being added, beyond possibly checking for the seqs to be > flush prior to analyses. > > Here's the reason why: > > The object doesn't 'know' the seqs map across from one to the other as > below: > >> ... >> ## REF tacattaaagacccg >> ## SEQ1 taca.taaa...... 
>> ## SEQ2 .....taaaga.ccg >> >> my $aln = Bio::SimpleAlign->new(); >> >> $aln->gap_char('.'); >> >> my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); >> my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' >> ); >> my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' >> ); >> >> $aln->add_seq( $r ); >> $aln->add_seq( $s1 ); >> $aln->add_seq( $s2 ); > > Above, you are making the assumption that SimpleAlign 'knows' where to match > the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does > NOT indicate that (the LocatableSeq docs, and their usage, should indicate > that). > > Think about HSP alignments in a BLAST report; the start/end/strand > coordinates are where the sequence in the alignment maps to the original > query or hit sequence. They don't indicate where the hit maps to the query > (the alignment itself does that in a column-wise fashion). > > I'm not sure, maybe it needs to be more explicit in the documentation, but > SimpleAlign does not align the sequences for you (and it shouldn't be > expected to). There are much better (faster, more accurate) ways to do > that. > >> if($CLUDGE){ >> foreach(($r, $s1, $s2)){ >> $_->seq( '.' x ($_->start - 1) . $_->seq ) >> } >> } >> >> ## Prepare an 'output stream' for the alignment: >> my $aliWriter = Bio::AlignIO-> >> new( -fh => \*STDOUT, >> -format => 'clustalw', >> ); >> >> warn "\nOUTPUT:\n"; >> $aliWriter->write_aln($aln); > > ... > >> I was calling the "fill in the gaps yourself" step a CLUDGE because I >> had expected the alignment object to take care of this for me. Is >> there any reason that it couldn't do this 'CLUDGE' automatically? It >> seems strange that it insists on being passed locatable sequence >> objects, but then largely ignores the given location. >> >> Would it not be possible to have this happen when the sequences are >> written out from the alignment?
I think it should still be possible to >> index the column number via the (gapless) sequence number... or did I >> get confused? There are two levels of confusion here (on my part), 1) >> the concepts behind the objects and 2) the implementation details. > > Mentioned above (no assumptions on how locatableseqs map to one another). > WYSIWYG. There is nothing precluding you from writing up code to do that, > though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for > post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl > alignment implementation (there are, believe it or not, pure perl > implementations of Smith-Waterman and Needleman-Wunsch. > >> Thanks for any hints on how to understand or potentially how to fix >> these problems. >> >> Cheers, >> Dan. > > > Not that SimpleAlign and LocatableSeqs don't have their share of problems. > However, I don't think you can expect this behavior to change with the > refactors. > > chris > From bix at sendu.me.uk Mon Aug 24 15:12:05 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 20:12:05 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> Message-ID: <4A92E605.5090706@sendu.me.uk> Hilmar Lapp wrote: > > On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: > >>> This points to a problem in Bio::Species::scientific_name(), given >>> that binomial() is correct. Could you file this as a bug report? >> >> What code creates the Bio::Species object here? 
I suspect this code >> isn't aware of changes in Bio::Species since BioPerl 1.5.2. > > I see. Any pointer to what would tell me what I need to change or is > everything in the Bio::Species POD? ... I won't guarantee the perfection of the POD ;) > BTW what the Bioperl-db code does is instantiate the blank object and > then populate it through its accessors (mostly the classification() > array). If what it has been doing in the past is now considered > incorrect, at least it doesn't raise any warning that would alert one to > that ... Yuh... If you point out the code that creates the Bio::Species I can look into it for you and suggest what needs changing and why it doesn't work (or if it's a bug in Bio::Species). I can't remember things clearly right now, though classification() I guess was supposed to be backwards compatible. From cjfields at illinois.edu Mon Aug 24 15:52:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 14:52:56 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92E605.5090706@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> Message-ID: <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: > Hilmar Lapp wrote: >> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>> This points to a problem in Bio::Species::scientific_name(), >>>> given that binomial() is correct. Could you file this as a bug >>>> report? >>> >>> What code creates the Bio::Species object here? 
I suspect this >>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >> I see. Any pointer to what would tell me what I need to change or >> is everything in the Bio::Species POD? > > ... I won't guarantee the perfection of the POD ;) > > >> BTW what the Bioperl-db code does is instantiate the blank object >> and then populate it through its accessors (mostly the >> classification() array). If what it has been doing in the past is >> now considered incorrect, at least it doesn't raise any warning >> that would alert one to that ... > > Yuh... If you point out the code that creates the Bio::Species I can > look into it for you and suggest what needs changing and why it > doesn't work (or if it's a bug in Bio::Species). I can't remember > things clearly right now, though classification() I guess was > supposed to be backwards compatible. Sendu, I think it's related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 Bio::DB::BioSQL::SpeciesAdaptor and Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in question i think. chris From bix at sendu.me.uk Mon Aug 24 16:01:29 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 21:01:29 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> Message-ID: <4A92F199.2030900@sendu.me.uk> Chris Fields wrote: > > On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: > >> Hilmar Lapp wrote: >>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>> This points to a problem in Bio::Species::scientific_name(), given >>>>> that binomial() is correct. Could you file this as a bug report? >>>> >>>> What code creates the Bio::Species object here? I suspect this code >>>> isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>> I see. Any pointer to what would tell me what I need to change or is >>> everything in the Bio::Species POD? >> >> ... I won't guarantee the perfection of the POD ;) >> >> >>> BTW what the Bioperl-db code does is instantiate the blank object and >>> then populate it through its accessors (mostly the classification() >>> array). If what it has been doing in the past is now considered >>> incorrect, at least it doesn't raise any warning that would alert one >>> to that ... >> >> Yuh... If you point out the code that creates the Bio::Species I can >> look into it for you and suggest what needs changing and why it >> doesn't work (or if it's a bug in Bio::Species). I can't remember >> things clearly right now, though classification() I guess was supposed >> to be backwards compatible. 
> > Sendu, I think it's related to this: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 > > Bio::DB::BioSQL::SpeciesAdaptor and > Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in > question i think. Ah, yes, well there you go then. So it is a classification() issue. Judging by what I said in that bug, looks like the db code needs to be changed to put the full scientific name in the first element it passes to classification. From cjfields at illinois.edu Mon Aug 24 16:27:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 15:27:23 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92F199.2030900@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> Message-ID: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: >>> Hilmar Lapp wrote: >>>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>>> This points to a problem in Bio::Species::scientific_name(), >>>>>> given that binomial() is correct. Could you file this as a bug >>>>>> report? >>>>> >>>>> What code creates the Bio::Species object here? I suspect this >>>>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>>> I see. Any pointer to what would tell me what I need to change or >>>> is everything in the Bio::Species POD? >>> >>> ... 
I won't guarantee the perfection of the POD ;) >>> >>> >>>> BTW what the Bioperl-db code does is instantiate the blank object >>>> and then populate it through its accessors (mostly the >>>> classification() array). If what it has been doing in the past is >>>> now considered incorrect, at least it doesn't raise any warning >>>> that would alert one to that ... >>> >>> Yuh... If you point out the code that creates the Bio::Species I >>> can look into it for you and suggest what needs changing and why >>> it doesn't work (or if it's a bug in Bio::Species). I can't >>> remember things clearly right now, though classification() I guess >>> was supposed to be backwards compatible. >> Sendu, I think it's related to this: >> http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 >> Bio::DB::BioSQL::SpeciesAdaptor and >> Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in >> question i think. > > Ah, yes, well there you go then. So it is a classification() issue. > Judging by what I said in that bug, looks like the db code needs to > be changed to put the full scientific name in the first element it > passes to classification. Yup. I believe the only blocking issue with implementing it was potential backwards-compat problems with databases loaded using old behavior and then being updated post-1.5.2 (new behavior). I would think this only affects sequence data loaded w/o taxonomy preloaded, but I'm not sure. I suggest, if you can fix it, go ahead make the necessary change. We can then post a big warning to BioSQL and here about the problem, something along the lines of 'bioperl-db in svn may be backwards incompatible with species information loaded in previous versions; it may eat your first born' or similar. It's an absolutely necessary fix, and may effectively kill a bunch of other db/species-related bugs. 
chris From Kevin.M.Brown at asu.edu Mon Aug 24 17:48:35 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 24 Aug 2009 14:48:35 -0700 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com><990CEF10B1AD4BD5BE9977FD62DB3437@NewLife><2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B4062D2655@EX02.asurite.ad.asu.edu> You can use Bio::SimpleAlign for those tasks, but you, the programmer, have to remember that you didn't front pad the sequence and so can't utilize certain functions blindly. I've used SimpleAlign with LocatableSeq objects and wrote a few custom methods that did things like creating slices from the simplealign for each locatableseq. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Bolser Sent: Monday, August 24, 2009 12:13 PM To: Chris Fields Cc: bioperl-l at lists.open-bio.org; Mark A. Jensen; Paolo Pavan Subject: Re: [Bioperl-l] Bio::SimpleAlign constructor? Thanks for these clarifications Chris. Basically I'm looking for an object that will easily let me edit a multiple sequence alignment, including: adding sequences (with given alignments), opening gaps, extracting columns (with linked sequences), transferring features, etc. etc. For example, I may want to analyse a set of short reads aligned against the human genome. Somehow it felt natural to represent the position of the aligned read as a Bio::LocatableSeq (with the alignment details being captured by a sequence string (including gaps) representing the read and the reference sequence - basically because that is what the aligner gives me). Now, you're saying Bio::LocatableSeq is not suitable for that purpose, which is fine. 
But the question is, how should I be doing this? Adding megabases of gaps to thousands of short reads feels wrong... is there a 'correct' way to do this currently in BioPerl? I think the source of my confusion was that SimpleAlign takes Bio::LocatableSeq as input, and I thought that was 'the way' to represent sequences in the MSA. I'll keep hacking at what I need to get done and I'll post the code. I'm just wondering how much 'alignment editing' could be usefully done by a suitable object within BP? Thanks again for your help, Dan. 2009/8/24 Chris Fields : > Dan, all, > > Bio::SimpleAlign doesn't align anything for you. It makes no assumptions > about the data being added, beyond possibly checking for the seqs to be > flush prior to analyses. > > Here's the reason why: > > The object doesn't 'know' the seqs map across from one to the other as > below: > >> ... >> ## REF tacattaaagacccg >> ## SEQ1 taca.taaa...... >> ## SEQ2 .....taaaga.ccg >> >> my $aln = Bio::SimpleAlign->new(); >> >> $aln->gap_char('.'); >> >> my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); >> my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' >> ); >> my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' >> ); >> >> $aln->add_seq( $r ); >> $aln->add_seq( $s1 ); >> $aln->add_seq( $s2 ); > > Above, you are making the assumption that SimpleAlign 'knows' where to match > the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does > NOT indicate that (the LocatableSeq docs, and their usage, should indicate > that). > > Think about HSP alignments in a BLAST report; the start/end/strand > coordinates are where the sequence in the alignment maps to the original > query or hit sequence. They don't indicate where the hit maps to the query > (the alignment itself does that in a column-wise fashion). 
> > I'm not sure, maybe it needs to be more explicit in the documentation, but > SimpleAlign does not align the sequences for you (and it shouldn't be > expected to). There are much better (faster, more accurate) ways to do > that. > >> if($CLUDGE){ >> foreach(($r, $s1, $s2)){ >> $_->seq( '.' x ($_->start - 1) . $_->seq ) >> } >> } >> >> ## Prepare an 'output stream' for the alignment: >> my $aliWriter = Bio::AlignIO-> >> new( -fh => \*STDOUT, >> -format => 'clustalw', >> ); >> >> warn "\nOUTPUT:\n"; >> $aliWriter->write_aln($aln); > > ... > >> I was calling the "fill in the gaps yourself" step a CLUDGE because I >> had expected the alignment object to take care of this for me. Is >> there any reason that it couldn't do this 'CLUDGE' automatically? It >> seems strange that it insists on being passed locatable sequence >> objects, but then largely ignore the given location. >> >> Would it not be possible to have this happen when the sequences are >> written out from the alignment? I think it should still be possible to >> index the column number via the (gapless) sequence number... or did I >> get confused? There are two levels of confusion here (on my part), 1) >> the concepts behind the objects and 2) the implementation details. > > Mentioned above (no assumptions on how locatableseqs map to one another). > WYSIWYG. There is nothing precluding you from writing up code to do that, > though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for > post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl > alignment implementation (there are, believe it or not, pure perl > implementations of Smith-Waterman and Needleman-Wunsch. > >> Thanks for any hints on how to understand or potentially how to fix >> these problems. >> >> Cheers, >> Dan. > > > Not that SimpleAlign and LocatableSeqs don't have their share of problems. > However, I don't think you can expect this behavior to change with the > refactors. 
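The "fill in the gaps yourself" step discussed above can be sketched in plain Perl, outside of SimpleAlign. This is only an illustration under stated assumptions: the helper names pad_to_start and make_flush are hypothetical (they are not BioPerl methods), and $start is taken to mean the alignment column of the first residue, which, as noted above, is not what LocatableSeq::start() records.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): left-pad a gapped sequence
# string so that its first character lands in alignment column $start
# (1-based). ASSUMPTION: $start is an alignment column, not a coordinate
# on the original ungapped sequence.
sub pad_to_start {
    my ($seq, $start, $gap_char) = @_;
    $gap_char = '.' unless defined $gap_char;
    die "start must be >= 1\n" if $start < 1;
    return ($gap_char x ($start - 1)) . $seq;
}

# Hypothetical helper: right-pad every row to the length of the longest
# row, so that all rows are flush before they are written out.
sub make_flush {
    my ($gap_char, @rows) = @_;
    my $max = 0;
    for my $row (@rows) {
        $max = length($row) if length($row) > $max;
    }
    return map { $_ . ($gap_char x ($max - length($_))) } @rows;
}

# The three strings from the example in this thread.
my @rows = make_flush(
    '.',
    pad_to_start('tacattaaagacccg', 1, '.'),
    pad_to_start('taca.taaa',       1, '.'),
    pad_to_start('taaaga.ccg',      6, '.'),
);
print "$_\n" for @rows;
```

With the example strings this prints the three rows laid out as in the REF/SEQ1/SEQ2 comment block above. The padded strings could then be handed to Bio::LocatableSeq and SimpleAlign as usual; keeping the padding in a separate step matches the point that SimpleAlign makes no assumptions about how the sequences map to one another.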
> > chris > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Mon Aug 24 20:12:18 2009 From: hartzell at alerce.com (George Hartzell) Date: Mon, 24 Aug 2009 17:12:18 -0700 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl Message-ID: <19091.11362.190209.844074@already.dhcp.gene.com> There's a warning at Ensembl about the perl api code depending on an old version of bioperl (1.2.3) http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html Does anyone have current information about that dependency? My quick-n-dirty tests suggest that one can't build an app that uses both new Bioperl and the ensembl api without ensembl picking up the newer bioperl libraries (or your app getting the older ones). It's not clear what parts of the ensembl world depend on the older BioPerl. Anyone have any recipes to make it work? Any info on a possible modernization of the ensembl code? Thanks, g. From cjfields at illinois.edu Mon Aug 24 22:29:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 21:29:38 -0500 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <19091.11362.190209.844074@already.dhcp.gene.com> References: <19091.11362.190209.844074@already.dhcp.gene.com> Message-ID: <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> On Aug 24, 2009, at 7:12 PM, George Hartzell wrote: > > There's a warning at Ensembl about the perl api code depending on an > old version of bioperl (1.2.3) > > http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html > > Does anyone have current information about that dependency? > > My quick-n-dirty tests suggest that one can't build an app that uses > both new Bioperl and the ensembl api without ensembl picking up the > newer bioperl libraries (or your app getting the older ones). It's > not clear what parts of the ensembl world depend on the older BioPerl. 
I've asked this question several times of the ensembl folk w/o an adequate response. My general feeling is even they may not really know for sure (though I recall ewan saying something about feature/ annotation changes around then, and maybe something about the blastreporter). Saying that, the ensembl perl API worked for me using bioperl-live (and bioperl 1.6) as of a couple months ago. You might eventually run into some issues; if so report them back here and to the ensembl list. > Anyone have any recipes to make it work? > > Any info on a possible modernization of the ensembl code? That is completely up to the ensembl folks. bioperl 1.2.3 is full enough of bugs, and I don't plan on backporting any changes to that branch (seems kind of silly, as that branch is now about six yrs old). > Thanks, > > g. np! -chris From hlapp at gmx.net Mon Aug 24 23:17:29 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 23:17:29 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> Message-ID: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > > On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > >> [...] >> Ah, yes, well there you go then. So it is a classification() issue. 
>> Judging by what I said in that bug, looks like the db code needs to >> be changed to put the full scientific name in the first element it >> passes to classification. > > > Yup. I believe the only blocking issue with implementing it was > potential backwards-compat problems with databases loaded using old > behavior and then being updated post-1.5.2 (new behavior). The code change is for retrieving data, right? So I'm not sure how it would break backwards compatibility, unless one has taxon entries created before the change (i.e., about 3 years ago?) and through loading sequences rather than through loading the NCBI taxonomy. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue Aug 25 00:10:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 23:10:15 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> Message-ID: On Aug 24, 2009, at 10:17 PM, Hilmar Lapp wrote: > > On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > >> >> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: >> >>> [...] >>> Ah, yes, well there you go then. So it is a classification() >>> issue. 
Judging by what I said in that bug, looks like the db code >>> needs to be changed to put the full scientific name in the first >>> element it passes to classification. >> >> >> Yup. I believe the only blocking issue with implementing it was >> potential backwards-compat problems with databases loaded using old >> behavior and then being updated post-1.5.2 (new behavior). > > The code change is for retrieving data, right? So I'm not sure how > it would break backwards compatibility, unless one has taxon entries > created before the change (i.e., about 3 years ago?) and through > loading sequences rather than through loading the NCBI taxonomy. > > -hilmar Right, that's what I thought as well, but I just wasn't clear on that. So, basically we're saying, as long as the code change is on the retrieving side, everything's okay? Then I'm pretty sure I know how to fix it, at least partly. I can probably squeeze that in unless Sendu's working on it. Sendu? chris From cjfields at illinois.edu Tue Aug 25 00:28:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 23:28:26 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> Message-ID: <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> On Aug 24, 2009, at 10:17 PM, Hilmar Lapp wrote: > > On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > >> >> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: >> >>> [...] >>> Ah, yes, well there you go then. So it is a classification() >>> issue. Judging by what I said in that bug, looks like the db code >>> needs to be changed to put the full scientific name in the first >>> element it passes to classification. >> >> >> Yup. I believe the only blocking issue with implementing it was >> potential backwards-compat problems with databases loaded using old >> behavior and then being updated post-1.5.2 (new behavior). > > The code change is for retrieving data, right? So I'm not sure how > it would break backwards compatibility, unless one has taxon entries > created before the change (i.e., about 3 years ago?) and through > loading sequences rather than through loading the NCBI taxonomy. > > -hilmar Okay, if possible I would like you or Sendu to review that last commit I made to bioperl-db. It includes Sendu's patch; I commented out sections that were modifying the genus/species when loaded in, but there are a few TODO's I noted as well (everything is in populate_from_row()). 
02species.t is now failing but I think it's based on the same old behavior; I'll look into it. chris From geoeco at rambler.ru Tue Aug 25 03:01:24 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:01:24 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <94c73820908240553m72540519pd86bf78e29041462@mail.gmail.com> Message-ID: <1074529971.1251183684.50392744.40754@mcgi70.rambler.ru> Hi Rohit, Thanks a lot for your comments, it actually worked well, but in fact i only want to extract species names as I want to have it in a separate file together with a fasta file with sequences. So, thanks a lot again! Anna * Rohit Ghai [Mon, 24 Aug 2009 14:53:03 +0200]: > hi > > I think you forgot to add the "seq" in the builder.. thats why the file > is > empty. > Also, the species name, though being parsed, is nowhere in the output. > Here's a version > using fasta output that you can probably customize further. This also > takes > the full > name of the organism and adds to the description line in the output. > > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'fasta'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species','seq','description'); > > while(my $seq = $seq_in->next_seq()) { > > my $desc = $seq->description(); > my $species_string = $seq->species()->binomial('FULL'); > $desc = $desc . 
" [$species_string]"; > $seq->description($desc); > $seq_out->write_seq($seq); > } > > exit; > > > On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova > wrote: > > > > > Dear all, > > > > I am trying to extract species taxonomy from ORGANISM line. In fact I > only > > need a first line under ORGANISM tag (e.i. genus + species). I though > that > > it would be possible to do with the SeqBuilder object by stating > > > > $builder->add_wanted_slot('display_id','species'); > > > > the problem is, however, that I've got an empty file as a result. > > What might be wrong with the script (see below)? > > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From geoeco at rambler.ru Tue Aug 25 03:03:56 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:03:56 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> 
<6B4871D9-5DB0-4762-A613-3561B40CE099@illinois.edu> Message-ID: <734135890.1251183836.48962856.71827@mcgi59.rambler.ru> hello Chris, Well, my final aim is to get 2 files: first one is a fasta file with all the sequences, and the seconds one is simply a list of species names extracted from the same Genbank file. So that's why I though it would be a good thing to put all together into one script with bioperl objects. Is there a better way to do it? Thanks, Anna * Chris Fields [Mon, 24 Aug 2009 07:55:56 -0500]: > Anna, > > It's stored in the Bio::Species object. I have to say, though, I > think you're using a stick of dynamite for a scalpel here; if you only > need ORGANISM parse it out directly (it's much faster). Or am I > missing something? > > chris > > On Aug 24, 2009, at 4:20 AM, Anna Kostikova wrote: > > > Dear all, > > > > I am trying to extract species taxonomy from ORGANISM line. In fact > > I only need a first line under ORGANISM tag (e.i. genus + species). > > I though that it would be possible to do with the SeqBuilder object > > by stating > > > > $builder->add_wanted_slot('display_id','species'); > > > > the problem is, however, that I've got an empty file as a result. > > What might be wrong with the script (see below)? 
> > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From geoeco at rambler.ru Tue Aug 25 03:09:43 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:09:43 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> Message-ID: <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> hello Hilmar, Thanks for your comments. Actually, my final aim is to get 2 files: first one is a fasta file with all the sequences, and the seconds one is simply a list of species names extracted from the same Genbank file. So that's why I though it would be a good thing to put all together into one script with bioperl objects. Is there a better way to do it? 
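The suggestion made earlier in this thread, to parse ORGANISM out directly, could look roughly like the sketch below in plain Perl. It is an illustration only: the helper name organisms_matching and the two sample records (DEMO0001, DEMO0002) are made up, and the regular expressions assume the usual GenBank flat-file layout (LOCUS and DEFINITION starting in column 1, ORGANISM indented, records terminated by //). A DEFINITION filter is included to show that selecting by gene does not add much code.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl call): return [locus, organism] pairs
# for every record whose DEFINITION matches $pattern. It works on plain
# GenBank flat-file text and ASSUMES the standard column layout.
sub organisms_matching {
    my ($genbank_text, $pattern) = @_;
    my @hits;
    for my $record (split m{^//\s*$}m, $genbank_text) {
        my ($locus)      = $record =~ /^LOCUS\s+(\S+)/m           or next;
        my ($definition) = $record =~ /^DEFINITION\s+(.*?)^\S/ms  or next;
        my ($organism)   = $record =~ /^\s+ORGANISM\s+(.+?)\s*$/m or next;
        $definition =~ s/\s+/ /g;    # join wrapped DEFINITION lines
        push @hits, [$locus, $organism] if $definition =~ $pattern;
    }
    return @hits;
}

# Made-up sample records, for illustration only.
my $sample = <<'END';
LOCUS       DEMO0001                 160 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Hypothetical plant 5.8S ribosomal RNA gene, partial
            sequence.
ACCESSION   DEMO0001
SOURCE      Arabidopsis thaliana
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae.
ORIGIN
        1 acgt
//
LOCUS       DEMO0002                 120 bp    DNA     linear   ROD 01-JAN-2000
DEFINITION  Hypothetical 18S ribosomal RNA gene.
ACCESSION   DEMO0002
SOURCE      Mus musculus
  ORGANISM  Mus musculus
            Eukaryota; Metazoa.
ORIGIN
        1 acgt
//
END

# One tab-separated line per matching record: locus, then organism.
printf "%s\t%s\n", @$_ for organisms_matching($sample, qr/5\.8S ribosomal RNA/);
```

Only the first sample record passes the filter, so a single locus/organism line is printed. For real files with unusual layouts, going through Bio::SeqIO remains the more robust route; the sketch trades that robustness for speed and brevity.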
The reason why I don't want simple parsing for species names is that I also want to be able to check which gene has been sequenced:

  while (my $inseq = $seq_in->next_seq) {
      if ($inseq->desc =~ m/5\.8S ribosomal RNA/) {
          $seq_out->write_seq($inseq);
      }
  }

and only if it is 5.8S rRNA do I want to extract the species name and the sequence. I thought that with direct parsing the code would be much longer. Am I wrong?

I am a newbie in both BioPerl and bioinformatics, so all comments would be appreciated :)

Anna

* Hilmar Lapp [Mon, 24 Aug 2009 10:47:34 -0400]:
> Hi Anna,
>
> sequence formats all have some varying amount of information that must
> be present or otherwise the syntax is invalid. If what you need is a
> two-column table of display_id and species name, then I would simply
> write that, and not squeeze it into a standard sequence format.
> (Unless you actually do want the sequence too, in which case you need
> to add it as a wanted slot; even in that case though, writing a three-
> column table might serve you better.)
>
> -hilmar
>
> On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote:
>
> >
> > Dear all,
> >
> > I am trying to extract species taxonomy from ORGANISM line. In fact
> > I only need a first line under ORGANISM tag (e.i. genus + species).
> > I though that it would be possible to do with the SeqBuilder object
> > by stating
> >
> > $builder->add_wanted_slot('display_id','species');
> >
> > the problem is, however, that I've got an empty file as a result.
> > What might be wrong with the script (see below)?
> > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Tue Aug 25 07:34:18 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 Aug 2009 07:34:18 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> Message-ID: <4A8C2A89-C212-4969-8B01-3DA7D7DE7862@gmx.net> On Aug 25, 2009, at 12:28 AM, Chris Fields wrote: > Okay, if possible I would like you or Sendu to review that last > commit I made to bioperl-db. Will do. > [...] > 02species.t is now failing but I think it's based on the same old > behavior; I'll look into it. I would expect that if the classification array is now different, so the test will need changing to expect the "new" behavior. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Aug 25 07:52:11 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 Aug 2009 07:52:11 -0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> Message-ID: <3B23691B-B165-4CC3-889E-04DE45AB1627@gmx.net> Hi Anna: On Aug 25, 2009, at 3:09 AM, Anna Kostikova wrote: > Actually, my final aim is to get 2 files: first one is a fasta file > with all the sequences, and the seconds one is simply a list of > species names Then I'd change your script to write two files: one with the sequences in FASTA format (you can use Bio::SeqIO for that), and the second one in the format you need it (one species name per line?). (Right now you are writing one file in Genbank format, which is quite unlike the above, right?) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From whs at ebi.ac.uk Tue Aug 25 07:04:23 2009 From: whs at ebi.ac.uk (William Spooner) Date: Tue, 25 Aug 2009 12:04:23 +0100 Subject: [Bioperl-l] Modern BioPerl vs. 
Ensembl In-Reply-To: <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> Message-ID: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> On 25 Aug 2009, at 03:29, Chris Fields wrote: > On Aug 24, 2009, at 7:12 PM, George Hartzell wrote: > >> >> There's a warning at Ensembl about the perl api code depending on an >> old version of bioperl (1.2.3) >> >> http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html >> >> Does anyone have current information about that dependency? >> >> My quick-n-dirty tests suggest that one can't build an app that uses >> both new Bioperl and the ensembl api without ensembl picking up the >> newer bioperl libraries (or your app getting the older ones). It's >> not clear what parts of the ensembl world depend on the older >> BioPerl. > > I've asked this question several times of the ensembl folk w/o an > adequate response. My general feeling is even they may not really > know for sure (though I recall ewan saying something about feature/ > annotation changes around then, and maybe something about the > blastreporter). > > Saying that, the ensembl perl API worked for me using bioperl-live > (and bioperl 1.6) as of a couple months ago. You might eventually > run into some issues; if so report them back here and to the ensembl > list. I'm not sure of the full list of dependencies, but my feeling is that most are related to the Ensembl application/web code; the blast interface in particular. I can support Chris's findings that the API works (AFAIK) with bioperl-live, but this is obviously untested. > >> Anyone have any recipes to make it work? >> >> Any info on a possible modernization of the ensembl code? > > That is completely up to the ensembl folks. bioperl 1.2.3 is full > enough of bugs, and I don't plan on backporting any changes to that > branch (seems kind of silly, as that branch is now about six yrs old). 
It would be nice if someone at Ensembl could compile a list of BioPerl dependencies. At least that would give a feel for the scope of the problem... Will From ak at ebi.ac.uk Tue Aug 25 09:43:19 2009 From: ak at ebi.ac.uk (Andreas =?iso-8859-1?B?S+Ro5HJp?=) Date: Tue, 25 Aug 2009 14:43:19 +0100 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> Message-ID: <20090825134319.GE12422@qux.windows.ebi.ac.uk> [cut] > > It would be nice if someone at Ensembl could compile a list of > BioPerl dependencies. At least that would give a feel for the scope > of the problem... > > Will Hi Will, and list, These are the BioPerl modules that the Ensembl Core API "use" or otherwise directly call (scanned our current HEAD code): Bio::Annotation::DBLink in Bio::EnsEMBL::DBEntry Bio::Tools::CodonTable in Bio::EnsEMBL::Utils::TranscriptAlleles in Bio::EnsEMBL::PredictionTranscript in Bio::EnsEMBL::Transcript.pm Bio::LocatableSeq in Bio::EnsEMBL::DnaDnaAlignFeature Bio::PrimarySeqI in Bio::EnsEMBL::Slice Bio::Root::IO in Bio::EnsEMBL::Utils::Converter Bio::Root::Root in Bio::EnsEMBL::Utils::EasyArgv Bio::Seq in Bio::EnsEMBL::Utils::PolyA in Bio::EnsEMBL::Intron in Bio::EnsEMBL::Exon in Bio::EnsEMBL::Transcript in Bio::EnsEMBL::Translation in Bio::EnsEMBL::Utils::TranscriptAlleles Bio::SeqFeature::FeaturePair in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair Bio::SeqFeature::Generic in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair Bio::SeqFeatureI in Bio::EnsEMBL::SeqFeatureI Bio::SimpleAlign in Bio::EnsEMBL::DnaDnaAlignFeature Bio::Species in Bio::EnsEMBL::DBSQL::MetaContainer I have not looked at the other Ensembl APIs (Variation, FuncGen, Compara, Web, Pipeline, etc.), and I might possibly have missed references to some BioPerl modules. 
I have also not indicated the relative importance of any of these modules (clearly Bio::Seq is central, but I don't know how widely the code that accesses Bio::SeqFeature::Generic is used) or investigated if any of the references to BioPerl modules occur in deprecated code. As far as I know, there are currently no plans to get rid of these dependencies. Or there might be, only they are not very far up the priority list right now. I would be happy to look at conservative patches, but cannot promise snappy response times. Regards, Andreas -- Andreas Kähäri, Ensembl Software Developer -{ }- European Bioinformatics Institute (EMBL-EBI) -{ }- Wellcome Trust Genome Campus, Hinxton -{ }- Cambridge CB10 1SD, United Kingdom -{ }- From cjfields at illinois.edu Tue Aug 25 10:07:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 Aug 2009 09:07:52 -0500 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <20090825134319.GE12422@qux.windows.ebi.ac.uk> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> <20090825134319.GE12422@qux.windows.ebi.ac.uk> Message-ID: <9D26C8FA-6D74-42C2-A2BD-4EFF529DA05A@illinois.edu> Andreas, Thanks for the response, been waiting for something a bit more official for a while now. We can definitely help you patch these as needed when problems arise, just let us know, or file a bug report listing issues. Scanning through, there will be a couple of future trouble spots: 1) We are very likely deprecating Bio::Species in favor of Bio::Taxon (that may be relatively easy to map, as Bio::Species now delegates to Bio::Taxon and similar anyway). 2) We will be refactoring Bio::SimpleAlign/LocatableSeq. There are too many corner cases where assumptions are made. We'll try to stick with the current API, but there may be a few delegating methods.
More significantly, we're also planning a significant restructuring of bioperl prior to 1.7, basically splitting it into several (more easily maintainable) parts. The exact nature of these is still a bit fuzzy (we have to sort out dependencies) but we do plan on making a bundle package to assemble a complete old-style 'monolithic' bioperl, just a bit more customizable. It's very likely the versioning scheme will stay the same for the core (root) set of modules, but the others may end up having their own versioning for monitoring dependencies. chris On Aug 25, 2009, at 8:43 AM, Andreas Kähäri wrote: > [cut] >> >> It would be nice if someone at Ensembl could compile a list of >> BioPerl dependencies. At least that would give a feel for the scope >> of the problem... >> >> Will > > Hi Will, and list, > > These are the BioPerl modules that the Ensembl Core API "use" or > otherwise directly call (scanned our current HEAD code): > > Bio::Annotation::DBLink > in Bio::EnsEMBL::DBEntry > > Bio::Tools::CodonTable > in Bio::EnsEMBL::Utils::TranscriptAlleles > in Bio::EnsEMBL::PredictionTranscript > in Bio::EnsEMBL::Transcript.pm > > Bio::LocatableSeq > in Bio::EnsEMBL::DnaDnaAlignFeature > > Bio::PrimarySeqI > in Bio::EnsEMBL::Slice > > Bio::Root::IO > in Bio::EnsEMBL::Utils::Converter > > Bio::Root::Root > in Bio::EnsEMBL::Utils::EasyArgv > > Bio::Seq > in Bio::EnsEMBL::Utils::PolyA > in Bio::EnsEMBL::Intron > in Bio::EnsEMBL::Exon > in Bio::EnsEMBL::Transcript > in Bio::EnsEMBL::Translation > in Bio::EnsEMBL::Utils::TranscriptAlleles > > Bio::SeqFeature::FeaturePair > in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair > > Bio::SeqFeature::Generic > in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair > > Bio::SeqFeatureI > in Bio::EnsEMBL::SeqFeatureI > > Bio::SimpleAlign > in Bio::EnsEMBL::DnaDnaAlignFeature > > Bio::Species > in Bio::EnsEMBL::DBSQL::MetaContainer > > > I have not looked at the other Ensembl APIs (Variation, FuncGen, > Compara, Web, Pipeline, etc.), 
and I might possibly have missed > references to some BioPerl modules. I have also not indicated > the relative importance of any of these modules (clearly Bio::Seq > is central, but I don't know how widely the code that accesses > Bio::SeqFeature::Generic is used) or investigated if any of the > references to BioPerl modules occur in deprecated code. > > As far as I know, there are currently no plans to get rid of these > dependencies. Or there might be, only they are not very far up the > priority list right now. I would be happy to look at conservative > patches, but can not promise snappy response times. > > > Regards, > Andreas > > -- > Andreas Kähäri, Ensembl Software Developer -{ }- > European Bioinformatics Institute (EMBL-EBI) -{ }- > Wellcome Trust Genome Campus, Hinxton -{ }- > Cambridge CB10 1SD, United Kingdom -{ }- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From acpatel at usa.net Mon Aug 24 23:54:01 2009 From: acpatel at usa.net (Anand C. Patel) Date: Mon, 24 Aug 2009 22:54:01 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> Message-ID: <9BA4272D-E7A1-4530-B8D8-B6156823BFDB@usa.net> I preloaded the NCBI taxonomy into the biosql database using the provided script before adding the sequences from genbank format text file (downloaded directly from genbank) using the script provided by bioperl-db, which would be what created the Bio::Species objects (I'd assume) from the text files, prior to inserting them into the database. Hope this helps, Anand On Aug 24, 2009, at 3:27 PM, Chris Fields wrote: > > On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: >>>> Hilmar Lapp wrote: >>>>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>>>> This points to a problem in Bio::Species::scientific_name(), >>>>>>> given that binomial() is correct. Could you file this as a bug >>>>>>> report? >>>>>> >>>>>> What code creates the Bio::Species object here? I suspect this >>>>>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>>>> I see. Any pointer to what would tell me what I need to change >>>>> or is everything in the Bio::Species POD? >>>> >>>> ... I won't guarantee the perfection of the POD ;) >>>> >>>> >>>>> BTW what the Bioperl-db code does is instantiate the blank >>>>> object and then populate it through its accessors (mostly the >>>>> classification() array). 
If what it has been doing in the past >>>>> is now considered incorrect, at least it doesn't raise any >>>>> warning that would alert one to that ... >>>> >>>> Yuh... If you point out the code that creates the Bio::Species I >>>> can look into it for you and suggest what needs changing and why >>>> it doesn't work (or if it's a bug in Bio::Species). I can't >>>> remember things clearly right now, though classification() I >>>> guess was supposed to be backwards compatible. >>> Sendu, I think it's related to this: >>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 >>> Bio::DB::BioSQL::SpeciesAdaptor and >>> Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules >>> in question i think. >> >> Ah, yes, well there you go then. So it is a classification() issue. >> Judging by what I said in that bug, looks like the db code needs to >> be changed to put the full scientific name in the first element it >> passes to classification. > > > Yup. I believe the only blocking issue with implementing it was > potential backwards-compat problems with databases loaded using old > behavior and then being updated post-1.5.2 (new behavior). I would > think this only affects sequence data loaded w/o taxonomy preloaded, > but I'm not sure. > > I suggest, if you can fix it, go ahead make the necessary change. > We can then post a big warning to BioSQL and here about the problem, > something along the lines of 'bioperl-db in svn may be backwards > incompatible with species information loaded in previous versions; > it may eat your first born' or similar. It's an absolutely > necessary fix, and may effectively kill a bunch of other db/species- > related bugs. > > chris > From dan.bolser at gmail.com Tue Aug 25 11:16:14 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 25 Aug 2009 16:16:14 +0100 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? 
Message-ID: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Hi, Can some one set $wgEnableMWSuggest on the BioPerl wiki please? http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest I generally find this a great feature to have on any MW install. Can we also create a page (usually "BioPerl:Configuration" (or '$wgSiteName:Configuration')) to report details of the specific MW configuration settings used on the wiki? This is also a good place for people to request configuration changes to tweak the way the wiki works. Cheers, Dan. From jason at bioperl.org Tue Aug 25 13:17:44 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 25 Aug 2009 10:17:44 -0700 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: Can you send sysadmin request mail to the helpdesk - support at open-bio.org so mauricio or someone can have it in the queue. [aside] I've had to stop doing OBF sysadmin work so we are definitely looking for someone to help with the ALL VOLUNTEER team of now just Mauricio and Chris Dagdigian who do mediawiki and sysadmin support. We've reached a bit of crunch where there are lots of things to tweak and customize for the various flavors of MW installs that the projects want but we don't have enough dedicated admins to really support this. Most of us have gotten into these projects to support our own bioinformatics programming not sysadmin tasks so there is a bit of gap here. Some of us (me) were not trained as sysadmin but jumped in and figured out how to help and do it - and learned valuable life skills... =) We're discussing plans to upgrade the machines in the future which would improve performance and reliability we hope and also use this opportunity to streamline the MW installs to be a more easily maintained wikifarm. 
[/aside] -jason On Aug 25, 2009, at 8:16 AM, Dan Bolser wrote: > Hi, > > Can some one set $wgEnableMWSuggest on the BioPerl wiki please? > > http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest > > > I generally find this a great feature to have on any MW install. Can > we also create a page (usually "BioPerl:Configuration" (or > '$wgSiteName:Configuration')) to report details of the specific MW > configuration settings used on the wiki? This is also a good place for > people to request configuration changes to tweak the way the wiki > works. > > > Cheers, > Dan. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From awitney at sgul.ac.uk Tue Aug 25 09:45:59 2009 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 25 Aug 2009 14:45:59 +0100 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> Message-ID: <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk> > > It would be nice if someone at Ensembl could compile a list of > BioPerl dependencies. At least that would give a feel for the scope > of the problem... I just downloaded * ensembl * ensembl-compara * ensembl-variation * 
ensembl-functgenomics from their website and did a regex on the files for /^use (Bio::.+);/ which reveals (filtering out Bio::EnsEMBL::*): Bio::AlignIO Bio::Annotation::DBLink Bio::Das::ProServer::SourceAdaptor Bio::Das::ProServer::SourceAdaptor::Transport::generic Bio::Index::Fastq Bio::LocatableSeq Bio::Location::Simple Bio::MAGE::Experiment::Experiment Bio::MAGE::XMLUtils Bio::Perl Bio::PrimarySeq Bio::PrimarySeqI Bio::Root::Root Bio::Root::RootI Bio::Search::HSP::EnsemblHSP Bio::Seq Bio::SeqFeature::FeaturePair Bio::SeqFeature::Generic Bio::SeqFeatureI Bio::SeqIO Bio::SimpleAlign Bio::Species Bio::Tools::CodonTable Bio::Tools::Run::Phylo::PAML::Codeml Bio::TreeIO does that help? (I have the list broken down by which module/script contains which if that helps also) cheers adam From hartzell at alerce.com Tue Aug 25 16:22:20 2009 From: hartzell at alerce.com (George Hartzell) Date: Tue, 25 Aug 2009 13:22:20 -0700 Subject: [Bioperl-l] code review on LocatableSeq performance fix. Message-ID: <19092.18428.494334.482303@already.dhcp.gene.com> [For better or worse] I use pairs of locatable seq's to represent alignments between cDNAs (spliced mRNA) and genomic sequence. I end up using column_from_residue_number a lot to map features back and forth between the coordinate system. My sequences tend to be fairly long, and the current implementation of column_from_residue_number (which splits the sequences into arrays of individual characters) performs very badly on them. I've included below a small variation on a patch that I've been using for a while (when I pulled it up to the current bioperl-live I changed a couple of regexps to use $GAP_SYMBOLS and $RESIDUE_SYMBOLS). It passes the t/Seq/LocatableSeq.t tests and Works For Me (tm). Instead of creating whopping big arrays and then looping over them it breaks the sequence down into runs of residues/gaps and strides across them. 
It also unwinds the strandedness test and avoids the cute trick of using an anonymous sub (which saves a couple of lines in the source file but adds *significant* overhead every time around the loop). All hail Devel::NYTProf. Chris et al.'s comments about the mysteries and vagaries of Bio::LocatableSeq make me leery of just committing it. Anyone want to comment on it? g. Index: Bio/LocatableSeq.pm =================================================================== --- Bio/LocatableSeq.pm (revision 16001) +++ Bio/LocatableSeq.pm (working copy) @@ -423,27 +423,47 @@ unless $resnumber =~ /^\d+$/ and $resnumber > 0; if ($resnumber >= $self->start() and $resnumber <= $self->end()) { - my @residues = split //, $self->seq; - my $count = $self->start(); - my $i; - my ($start,$end,$inc,$test); - my $strand = $self->strand || 0; - # the following bit of "magic" allows the main loop logic to be the - # same regardless of the strand of the sequence - ($start,$end,$inc,$test)= ($strand == -1)? - (scalar(@residues-1),0,-1,sub{$i >= $end}) : - (0,scalar(@residues-1),1,sub{$i <= $end}); + my @chunks; + my $column_incr; + my $current_column; + my $current_residue = $self->start - 1; + my $seq = $self->seq; + my $strand = $self->strand || 0; - for ($i=$start; $test->(); $i+= $inc) { - if ($residues[$i] ne '.' and $residues[$i] ne '-') { - $count == $resnumber and last; - $count++; - } - } - # $i now holds the index of the column. 
- # The actual column number is this index + 1 + if ($strand == -1) { +# @chunks = reverse $seq =~ m/[^\.\-]+|[\.\-]+/go; + @chunks = reverse $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; + $column_incr = -1; + $current_column = (CORE::length $seq) + 1; + } + else { +# @chunks = $seq =~ m/[^\.\-]+|[\.\-]+/go; + @chunks = $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; + $column_incr = 1; + $current_column = 0; + } - return $i+1; + while (my $chunk = shift @chunks) { +# if ($chunk =~ m|^[\.\-]|o) { + if ($chunk =~ m|^[$GAP_SYMBOLS]|o) { + $current_column += $column_incr * CORE::length($chunk); + } + else { + if ($current_residue + CORE::length($chunk) < $resnumber) { + $current_column += $column_incr * CORE::length($chunk); + $current_residue += CORE::length($chunk); + } + else { + if ($strand == -1) { + $current_column -= $resnumber - $current_residue; + } + else { + $current_column += $resnumber - $current_residue; + } + return $current_column; + } + } + } } $self->throw("Could not find residue number $resnumber"); From hartzell at alerce.com Tue Aug 25 17:07:43 2009 From: hartzell at alerce.com (George Hartzell) Date: Tue, 25 Aug 2009 14:07:43 -0700 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk> Message-ID: <19092.21151.457226.192791@already.dhcp.gene.com> Adam Witney writes: > > > > It would be nice if someone at Ensembl could compile a list of > > BioPerl dependencies. At least that would give a feel for the scope > > of the problem... 
> > I just downloaded > > * ensembl > * ensembl-compara > * ensembl-variation > * ensembl-functgenomics > > from their website and did a regex on the files for > > /^use (Bio::.+);/ > > which reveals (filtering out Bio::EnsEMBL::*): > > Bio::AlignIO > Bio::Annotation::DBLink > Bio::Das::ProServer::SourceAdaptor > Bio::Das::ProServer::SourceAdaptor::Transport::generic > Bio::Index::Fastq > Bio::LocatableSeq > Bio::Location::Simple > Bio::MAGE::Experiment::Experiment > Bio::MAGE::XMLUtils > Bio::Perl > Bio::PrimarySeq > Bio::PrimarySeqI > Bio::Root::Root > Bio::Root::RootI > Bio::Search::HSP::EnsemblHSP > Bio::Seq > Bio::SeqFeature::FeaturePair > Bio::SeqFeature::Generic > Bio::SeqFeatureI > Bio::SeqIO > Bio::SimpleAlign > Bio::Species > Bio::Tools::CodonTable > Bio::Tools::Run::Phylo::PAML::Codeml > Bio::TreeIO > > does that help? (I have the list broken down by which module/script > contains which if that helps also) What would be most useful to me would be to understand where they *need* to use release 1.2.3. Is there something magical about their use of e.g. Bio::Seq? It's worth noting that your technique won't pick up various modules that are loaded on demand by e.g. Bio::SearchIO. g. From maj at fortinbras.us Wed Aug 26 07:39:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 26 Aug 2009 07:39:40 -0400 Subject: [Bioperl-l] code review on LocatableSeq performance fix. In-Reply-To: <19092.18428.494334.482303@already.dhcp.gene.com> References: <19092.18428.494334.482303@already.dhcp.gene.com> Message-ID: <55514878273F4E3F8D9E438FD2F3AB7D@NewLife> I think it's great. column_from_residue_number doesn't have any secret side effects, and the patch preserves nice integer in, nice integer out, and input and output both are 1-origin indices as far as I can tell. 
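
[Editorial sketch] The run-based idea in the patch under review can be tried outside BioPerl. What follows is a minimal standalone illustration, not the committed code. The helper name column_from_residue is hypothetical; it takes the aligned string, start and strand as plain arguments instead of reading them from a Bio::LocatableSeq object, and it assumes '.' and '-' are the only gap symbols (the patch itself relies on $GAP_SYMBOLS and $RESIDUE_SYMBOLS).

```perl
use strict;
use warnings;

# Hypothetical standalone helper, not part of BioPerl.
# Maps a residue number to its 1-based alignment column by walking
# runs of residues and gaps instead of individual characters.
# Assumption: '.' and '-' are the only gap symbols.
sub column_from_residue {
    my ($seq, $start, $strand, $resnumber) = @_;

    die "Residue number $resnumber is before start $start\n"
        if $resnumber < $start;

    # Split the aligned string into alternating residue runs and gap runs.
    my @chunks = $seq =~ /[^.\-]+|[.\-]+/g;

    # On the minus strand residues are numbered from the right-hand end,
    # so walk the runs in reverse and count columns downwards.
    my ($step, $column) = (1, 0);
    if ($strand == -1) {
        @chunks = reverse @chunks;
        ($step, $column) = (-1, length($seq) + 1);
    }

    my $residue = $start - 1;    # number of the last residue passed so far
    for my $chunk (@chunks) {
        my $len = length $chunk;
        if ($chunk =~ /^[.\-]/) {
            $column += $step * $len;        # gaps consume columns only
        }
        elsif ($residue + $len < $resnumber) {
            $column  += $step * $len;       # target lies beyond this run
            $residue += $len;
        }
        else {
            # target lies inside this run
            return $column + $step * ($resnumber - $residue);
        }
    }
    die "Could not find residue number $resnumber\n";
}

print column_from_residue('AC--GT', 1,  1, 3), "\n";   # G, plus strand: column 5
print column_from_residue('AC--GT', 1, -1, 3), "\n";   # C, minus strand: column 2
```

The amount of work depends on the number of residue/gap runs rather than on the sequence length, which is where the speed-up on long cDNA-to-genomic alignments with few, large gaps would be expected to come from.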
I say go for it- MAJ ----- Original Message ----- From: "George Hartzell" To: "bioperl-l List" Sent: Tuesday, August 25, 2009 4:22 PM Subject: [Bioperl-l] code review on LocatableSeq performance fix. > > [For better or worse] I use pairs of locatable seq's to represent > alignments between cDNAs (spliced mRNA) and genomic sequence. > > I end up using column_from_residue_number a lot to map features back > and forth between the coordinate system. > > My sequences tend to be fairly long, and the current implementation of > column_from_residue_number (which splits the sequences into arrays of > individual characters) performs very badly on them. > > I've included below a small variation on a patch that I've been using > for a while (when I pulled it up to the current bioperl-live I changed > a couple of regexps to use $GAP_SYMBOLS and $RESIDUE_SYMBOLS). It > passes the t/Seq/LocatableSeq.t tests and Works For Me (tm). > > Instead of creating whopping big arrays and then looping over them it > breaks the sequence down into runs of residues/gaps and strides across > them. It also unwinds the strandedness test and avoids the cute trick > of using an anonymous sub (which saves a couple of lines in the source > file but adds *signficant* overhead every time around the loop). > > All hail Devel::NYTProf. > > Chris et al.'s comments about the mysteries and vagaries of > Bio::LocatableSeq makes me leary of just committing it. > > Anyone want to comment on it? > > g. 
> > Index: Bio/LocatableSeq.pm > =================================================================== > --- Bio/LocatableSeq.pm (revision 16001) > +++ Bio/LocatableSeq.pm (working copy) > @@ -423,27 +423,47 @@ > unless $resnumber =~ /^\d+$/ and $resnumber > 0; > > if ($resnumber >= $self->start() and $resnumber <= $self->end()) { > - my @residues = split //, $self->seq; > - my $count = $self->start(); > - my $i; > - my ($start,$end,$inc,$test); > - my $strand = $self->strand || 0; > - # the following bit of "magic" allows the main loop logic to be the > - # same regardless of the strand of the sequence > - ($start,$end,$inc,$test)= ($strand == -1)? > - (scalar(@residues-1),0,-1,sub{$i >= $end}) : > - (0,scalar(@residues-1),1,sub{$i <= $end}); > + my @chunks; > + my $column_incr; > + my $current_column; > + my $current_residue = $self->start - 1; > + my $seq = $self->seq; > + my $strand = $self->strand || 0; > > - for ($i=$start; $test->(); $i+= $inc) { > - if ($residues[$i] ne '.' and $residues[$i] ne '-') { > - $count == $resnumber and last; > - $count++; > - } > - } > - # $i now holds the index of the column. 
> - # The actual column number is this index + 1 > + if ($strand == -1) { > +# @chunks = reverse $seq =~ m/[^\.\-]+|[\.\-]+/go; > + @chunks = reverse $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; > + $column_incr = -1; > + $current_column = (CORE::length $seq) + 1; > + } > + else { > +# @chunks = $seq =~ m/[^\.\-]+|[\.\-]+/go; > + @chunks = $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; > + $column_incr = 1; > + $current_column = 0; > + } > > - return $i+1; > + while (my $chunk = shift @chunks) { > +# if ($chunk =~ m|^[\.\-]|o) { > + if ($chunk =~ m|^[$GAP_SYMBOLS]|o) { > + $current_column += $column_incr * CORE::length($chunk); > + } > + else { > + if ($current_residue + CORE::length($chunk) < $resnumber) { > + $current_column += $column_incr * CORE::length($chunk); > + $current_residue += CORE::length($chunk); > + } > + else { > + if ($strand == -1) { > + $current_column -= $resnumber - $current_residue; > + } > + else { > + $current_column += $resnumber - $current_residue; > + } > + return $current_column; > + } > + } > + } > } > > $self->throw("Could not find residue number $resnumber"); > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From tuco at pasteur.fr Wed Aug 26 10:59:24 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 26 Aug 2009 16:59:24 +0200 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis Message-ID: <4A954DCC.4050200@pasteur.fr> Hi, I am playing with Bio::Restriction::* objects and find it very useful. Especially I am filtering output for blunt and cohesive enzymes. However, there's an exception thrown when I use 'cutters' method from B::R::Analysis : ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Bad end parameter (34). 
End must be less than the total length of sequence (total=7) STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 STACK: Bio::PrimarySeq::subseq /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388 STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891 STACK: Bio::Restriction::Analysis::_cuts /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788 STACK: Bio::Restriction::Analysis::cut /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366 STACK: Bio::Restriction::Analysis::cutters /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681 STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion lib/Bio/Restriction/Analysis/blunt.pm:86 STACK: Bio::Restriction::Analysis::blunt::cut_in_frames lib/Bio/Restriction/Analysis/blunt.pm:65 STACK: ./check_phase.pl:213 ----------------------------------------------------------- The problem with this enzyme is that the cut site lies beyond the enzyme's recognition site (from Rebase withrefm.907): <1>BceSI <2> <3>SSAAGCG(27/27) <4> <5>Bacillus cereus <6>ATCC 10987 <7> <8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol. Lett., vol. 202, pp. 189-193. Xu, S.-Y., Unpublished observations. For this enzyme, here are the values stored in the B::R::Enzyme object ($e): $e->site => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN $e->cut => 34 $e->string => SSAAGCG $e->seq->seq => SSAAGCG So my question is, wouldn't it be fair to set B::PrimarySeq::seq to the value of $e->site when such enzymes are seen in the source file? NOTE from B::R::Analysis::_enzyme_sites (commented): # The following should not be an exception, both Type I and Type III # enzymes cut outside of their recognition sequences #if ($site < 0 || $site > length($enz->string)) { # $self->throw("This is (probably) not your fault.\nGot a cut site of $site and a # sequence of ".$enz->string); # } And this is exactly the problem I'm facing! 
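
[Editorial sketch] The mismatch between the values listed above can be reproduced in plain Perl, without BioPerl. This is only an illustration under stated assumptions: split_at_cut is a hypothetical helper, not a BioPerl method, and the padded string is built here as the recognition sequence plus 27 Ns, taken from the "(27/27)" in the REBASE record. It shows that a cut position of 34 cannot be applied to the 7-base recognition string, while it can be applied to a site string padded out to the cut position.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a site string at a
# 1-based cut position, refusing positions past the end of the string.
sub split_at_cut {
    my ($site, $cut) = @_;
    my $len = length $site;
    die "cut position $cut exceeds site length $len\n"
        if $cut < 0 || $cut > $len;
    return (substr($site, 0, $cut), substr($site, $cut));
}

my $recognition = 'SSAAGCG';                  # 7 bases, as in $e->string
my $padded      = $recognition . ('N' x 27);  # assumed padding, from "(27/27)"
my $cut         = 34;                         # as in $e->cut

# The recognition string alone is too short for the cut position.
my $ok = eval { split_at_cut($recognition, $cut); 1 };
print "recognition string: ", ($ok ? "ok\n" : "failed: $@");

# The padded string is long enough.
my ($before, $after) = split_at_cut($padded, $cut);
printf "padded string: before=%d bases, after=%d bases\n",
    length $before, length $after;
```

Whether padding the stored sequence is the right fix inside Bio::Restriction is a separate question for the module maintainers; the sketch only makes the length mismatch concrete.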
In _enzyme_sites the code is trying to subseq our sequence to get the before and after seq as: $beforeseq=$enz->seq->subseq(1, $site); $afterseq=$enz->seq->subseq($site+1, $enz->seq->length); and this throws an error because the cut site (pos 34) is far beyond the end of the enzyme's known recognition site SSAAGCG (length=7). Has anybody a clue on how to fix/patch it? Thanks for any reply Regards Emmanuel -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From cjfields at illinois.edu Wed Aug 26 11:20:59 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 Aug 2009 10:20:59 -0500 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis In-Reply-To: <4A954DCC.4050200@pasteur.fr> References: <4A954DCC.4050200@pasteur.fr> Message-ID: <07222470-41ED-4E17-9383-65A7D02CE9E1@illinois.edu> What version of Bioperl are you using? Mark Jensen did some refactoring of this code after the 1.6.0 release that should appear in 1.6.1; I'll be working on the first alpha for that release starting Friday. chris On Aug 26, 2009, at 9:59 AM, Emmanuel Quevillon wrote: > Hi, > > I am playing with Bio::Restriction::* objects and find it very useful. > Especially I am filtering output for blunt and cohesive enzymes. > However, there's an exception thrown when I use 'cutters' method > from B::R::Analysis : > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bad end parameter (34). 
End must be less than the total length > of sequence (total=7) > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::PrimarySeq::subseq > /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388 > STACK: Bio::Restriction::Analysis::_enzyme_sites > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891 > STACK: Bio::Restriction::Analysis::_cuts > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788 > STACK: Bio::Restriction::Analysis::cut > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366 > STACK: Bio::Restriction::Analysis::cutters > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681 > STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion > lib/Bio/Restriction/Analysis/blunt.pm:86 > STACK: Bio::Restriction::Analysis::blunt::cut_in_frames > lib/Bio/Restriction/Analysis/blunt.pm:65 > STACK: ./check_phase.pl:213 > ----------------------------------------------------------- > > The problem with this enzyme is that the cut site is over the enzyme > recognition site (from Rebase withrefm.907): > > <1>BceSI > <2> > <3>SSAAGCG(27/27) > <4> > <5>Bacillus cereus > <6>ATCC 10987 > <7> > <8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol. > Lett., vol. 202, pp. 189-193. > Xu, S.-Y., Unpublished observations. > > > For this enzyme, here are the values stored into B::R::Enzyme object > ($e): > > $e->site => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN > $e->cut => 34 > $e->string => SSAAGCG > $e->seq->seq => SSAAGCG > > > So my question is, wouldn't be faire to set B::PrimarySeq::seq with > value of $e->site when such enzyme are seen in the source file. 
> > NOTE from B::R::Analysis::_enzymes_sites (commented): > > # The following should not be an exception, both Type I and Type > III > # enzymes cut outside of their recognition sequences > #if ($site < 0 || $site > length($enz->string)) { > # $self->throw("This is (probably) not your fault.\nGot a cut > site of $site and a # sequence of ".$enz->string); > # } > > And this is exactly the problem I'm facing! > In _enzymes_sites the code is trying to subseq our sequence to get > before and after seq as : > > $beforeseq=$enz->seq->subseq(1, $site); > $afterseq=$enz->seq->subseq($site+1, $enz->seq->length); > > and this throws an error as the cutting site is far over (pos 34) > the enzyme know recognition site SSAAGCG (length=7). > > Has anybody a clue on how to fix/patch it? > > Thanks for any reply > > Regards > > Emmanuel > > -- > ------------------------- > Emmanuel Quevillon > Biological Software and Databases Group > Institut Pasteur > +33 1 44 38 95 98 > tuco at_ pasteur dot fr > ------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robert.bradbury at gmail.com Wed Aug 26 11:38:44 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Wed, 26 Aug 2009 11:38:44 -0400 Subject: [Bioperl-l] Generalized reciprocal blast Message-ID: I would like to know whether or not anyone has attempted to create a "generalized" reciprocal blast component for BioPerl? One sees papers all the time where they discuss running reciprocal blasts to compare a new species to an old "standard" species or a set of species or running an all-to-all set of comparisons to match up all of the "known" proteins from species and determine which are outliers (and therefore "novel"). 
There are also accumulating merged sets in NCBI HomoloGene (which seems to be some strict subset (perhaps a dozen) of "well sequenced" genomes) and Ensembl (which seems to be working with a much larger set of 40-50 genomes, some of which may be somewhat incomplete and are certainly poorly "explored"). I have, I believe, seen code "fragments" from various authors, perhaps some on the BioPerl list, which perform some major subset of a typical "reciprocal blast". Now what I am looking for is a relatively generalizable some-to-some reciprocal blast utility. I want to be able to specify the genes (or gene family), e.g. some of the ~150 known DNA repair genes. It would be helpful to also specify how "tolerant" the blast "true reciprocal" criteria are. There are some genes where there is a very strict 1-to-1 relationship across many genomes. But for genes which involve relatively standard domains, e.g. "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for example it's more like 5-to-5 and it would be really nice to be able to specify the strictness or quality level [1] for "matching" genes (and even which genes are to be excluded because they are known to be false homologues). Then to top this off I want to be able to combine known public databases (e.g. HomoloGene / UniGene / Ensembl) with perhaps local private databases or database subsets (e.g. emerging or specialized genomes). The goal here of course is to determine the precise phylogenetic relationships between all of the DNA repair genes and how there may be gain / loss / evolution of function that can be related to species characteristics (size, longevity, etc.). Is there a generalized reciprocal blast component in BioPerl? Or is it a "build-it-yourself" situation (that I have to believe has been built probably a few dozen times by various researchers / organizations / companies)? Thanks, Robert Bradbury 1.
This would be handled in BioPerl with a customizable user function which could be tailored to handle specific cases -- for example a function which when handed a set of 100 potential "matches" could go through those 100 matches, identify common domains, and then "re-rate" matches based on considerations such as the type and number of common domains, domains being in the same order, etc. I.e. criteria which may be difficult to completely generalize across entire genomes but are fairly obvious if you are looking at a graphical replication of a gene set in HomoloGene. From jason at bioperl.org Wed Aug 26 11:55:04 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 26 Aug 2009 08:55:04 -0700 Subject: [Bioperl-l] Generalized reciprocal blast In-Reply-To: References: Message-ID: Robert - BioPerl has traditionally been a toolkit for building these types of pipelines and is not necessarily intended to be a place for larger systems. That said, BRH is a pretty easy algorithm that could be applied with the tools in place; the main issue is what kind of lookup table you want to use for establishing the BRH. Hashes are okay, but I think BDB or SQLite end up being more scalable and allow for persistence. Really, I would use something like OrthoMCL rather than reciprocal BLAST to identify families anyways. It uses Bioperl under the hood for parsing - though it suffers from some pretty inefficient management of the lookup table for the BRH part of the algorithm - it can be run on your own customized datasets to integrate public and private data. You might also find better luck in building good alignments for the key members of your target gene family of interest and then using a profile HMM (or even just the new HMMER3 jackhmmer or phmmer which don't require a MSA) to identify the full set of homologs in all the databases.
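As a rough illustration of the hash-based BRH lookup mentioned above (reading BRH as "best reciprocal hit"), here is a minimal sketch in plain Perl. It is not code from BioPerl or OrthoMCL: the sub names, the gene identifiers and the bit scores are all made up for the example, and it assumes the BLAST reports have already been reduced to rows of query, subject and bit score.

```perl
use strict;
use warnings;

# Keep the best-scoring subject per query from rows of [query, subject, bit_score].
sub best_hits {
    my (@rows) = @_;
    my (%best, %score);
    for my $row (@rows) {
        my ($query, $subject, $bits) = @$row;
        if (!exists $score{$query} || $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $subject;
        }
    }
    return \%best;
}

# A pair is reported only when each gene is the other's best hit.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my $best_ab = best_hits(@$a_vs_b);
    my $best_ba = best_hits(@$b_vs_a);
    my %pairs;
    for my $gene_a (keys %$best_ab) {
        my $gene_b = $best_ab->{$gene_a};
        $pairs{$gene_a} = $gene_b
            if defined $best_ba->{$gene_b} && $best_ba->{$gene_b} eq $gene_a;
    }
    return \%pairs;
}

# Hypothetical hit tables (query, subject, bit score), for illustration only.
my @a_vs_b = (
    ['geneA1', 'geneB1', 650],
    ['geneA1', 'geneB2', 310],
    ['geneA2', 'geneB2', 620],
    ['geneA3', 'geneB1', 120],   # best hit is geneB1, but geneB1 prefers geneA1
);
my @b_vs_a = (
    ['geneB1', 'geneA1', 648],
    ['geneB2', 'geneA2', 615],
);

my $brh = reciprocal_best_hits(\@a_vs_b, \@b_vs_a);
print "$_ <=> $brh->{$_}\n" for sort keys %$brh;
```

The two in-memory hashes are what would be swapped for BDB or SQLite tables once the hit lists stop fitting comfortably in memory; a looser "tolerance" criterion of the kind Robert asks about would replace the strict equality test in reciprocal_best_hits.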
If this is the only set of families you care about it is a lot less computational work to go through and pull these out with an HMM or HMMER search and build trees from these results rather than dealing with the computational time of the all-vs-all DB searches that you are proposing. -jason On Aug 26, 2009, at 8:38 AM, Robert Bradbury wrote: > I would like to know whether or not anyone has attempted to create a > "generalized" reciprocal blast component for BioPerl? > > One sees papers all the time where they discuss running reciprocal > blasts to > compare a new species to an old "standard" species or a set of > species or > running an all-to-all set of comparisons to match up all of the > "known" > proteins from species and determine which are outliers (and therefore > "novel"). There are also accumulating merged sets in NCBI > HomoloGene (which > seems to be a some strict subset (perhaps a dozen) "well sequenced" > genomes) > and Ensembl (which seems to be working with a much larger set of 40-50 > genomes some of which may be somewhat incomplete and are certainly > poorly > "explored". > > I have, I believe, seen code "fragments" from various authors, > perhaps some > on the BioPerl list, which perform some major subset of a typical > "reciprocal blast". > > Now what I am looking for is a relatively generalizable some-to-some > reciprocal blast utility. I want to be able to specify the genes > (or gene > family), e.g. some of the ~150 known DNA repair genes. It would be > helpful > to also specify how "tolerant" the blast "true reciprocal" criteria > are. > There are some genes where there is a very strict 1-to-1 > relationship across > many genomes. But for genes which involve relatively standard > domains, e.g. 
> "helicase" domains, the 1-to-1 relationship becomes cloudy -- in > mammals for > example its more like 5-to-5 and it would be really nice to be able to > specify the strictness or quality level [1] for "matching" genes > (and even > which genes are to be excluded because they are known to be false > homologues). > > Then to top this off I want to be able to combine known public e.g. > (HomoloGene / Uniigene / Ensembl) databases with perhaps local private > databases or database subsets (e.g. emerging or specialized genomes). > > The goal here of course to determine the precise phylogenetic > relationships > between all of the DNA repair genes and how there may be gain / loss / > evolution of function that can be related to species characteristics > (size, > longevity, etc.). > > Is there a generalized reciprocal blast component in BioPerl? Or is > it a > "build-it-yourself" situation (that I have to believe has been built > probably a few dozen times by various researchers / organizations / > companies)? > > Thanks, > Robert Bradbury > > 1. This would be handled in BioPerl with a customizable user > function which > could be tailored to handle specific cases -- for example a function > which > when handed a set of 100 potential "matches" could go through those > 100 > matches, identify common domains, and then "re-rate" matches based on > considerations such as the type and number of common domains, > domains being > in the same order, etc. I.e. criteria which may be difficult to > completely > generalize across entire genomes but are fairly obvious if you are > looking > at a graphical replication of a gene set in HomoloGene. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From maj at fortinbras.us Wed Aug 26 11:20:41 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 26 Aug 2009 11:20:41 -0400 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis In-Reply-To: <4A954DCC.4050200@pasteur.fr> References: <4A954DCC.4050200@pasteur.fr> Message-ID: Hi Emmanuel-- This may be fixed in the latest version of Bio::Restriction, which is not available in the standard 1.6 distribution. I suggest you try replacing the Bio/Restriction directory in your distribution with the current bioperl-live modules. You can get these by using Subversion: $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk/Bio/Restriction ./Restriction If you're brave, better might be to obtain the latest trunk and reinstall; $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live $ cd bioperl-live $ perl Build.PL $ ./Build $ ./Build test $ ./Build install Please update the list with your progress- cheers Mark ----- Original Message ----- From: "Emmanuel Quevillon" To: Sent: Wednesday, August 26, 2009 10:59 AM Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis > Hi, > > I am playing with Bio::Restriction::* objects and find it very useful. > Especially I am filtering output for blunt and cohesive enzymes. > However, there's an exception thrown when I use 'cutters' method > from B::R::Analysis : > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bad end parameter (34). 
End must be less than the total length > of sequence (total=7) > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::PrimarySeq::subseq > /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388 > STACK: Bio::Restriction::Analysis::_enzyme_sites > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891 > STACK: Bio::Restriction::Analysis::_cuts > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788 > STACK: Bio::Restriction::Analysis::cut > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366 > STACK: Bio::Restriction::Analysis::cutters > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681 > STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion > lib/Bio/Restriction/Analysis/blunt.pm:86 > STACK: Bio::Restriction::Analysis::blunt::cut_in_frames > lib/Bio/Restriction/Analysis/blunt.pm:65 > STACK: ./check_phase.pl:213 > ----------------------------------------------------------- > > The problem with this enzyme is that the cut site is over the enzyme > recognition site (from Rebase withrefm.907): > > <1>BceSI > <2> > <3>SSAAGCG(27/27) > <4> > <5>Bacillus cereus > <6>ATCC 10987 > <7> > <8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol. > Lett., vol. 202, pp. 189-193. > Xu, S.-Y., Unpublished observations. > > > For this enzyme, here are the values stored into B::R::Enzyme object > ($e): > > $e->site => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN > $e->cut => 34 > $e->string => SSAAGCG > $e->seq->seq => SSAAGCG > > > So my question is, wouldn't be faire to set B::PrimarySeq::seq with > value of $e->site when such enzyme are seen in the source file. 
> > NOTE from B::R::Analysis::_enzymes_sites (commented): > > # The following should not be an exception, both Type I and Type III > # enzymes cut outside of their recognition sequences > #if ($site < 0 || $site > length($enz->string)) { > # $self->throw("This is (probably) not your fault.\nGot a cut > site of $site and a # sequence of ".$enz->string); > # } > > And this is exactly the problem I'm facing! > In _enzymes_sites the code is trying to subseq our sequence to get > before and after seq as : > > $beforeseq=$enz->seq->subseq(1, $site); > $afterseq=$enz->seq->subseq($site+1, $enz->seq->length); > > and this throws an error as the cutting site is far over (pos 34) > the enzyme know recognition site SSAAGCG (length=7). > > Has anybody a clue on how to fix/patch it? > > Thanks for any reply > > Regards > > Emmanuel > > -- > ------------------------- > Emmanuel Quevillon > Biological Software and Databases Group > Institut Pasteur > +33 1 44 38 95 98 > tuco at_ pasteur dot fr > ------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed Aug 26 12:03:59 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 26 Aug 2009 12:03:59 -0400 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: re:aside -- I can help with this; I promise not to break anything. cheers MAJ ----- Original Message ----- From: "Jason Stajich" To: "Dan Bolser" Cc: "BioPerl List" Sent: Tuesday, August 25, 2009 1:17 PM Subject: Re: [Bioperl-l] $wgEnableMWSuggest on the wiki please? > Can you send sysadmin request mail to the helpdesk - support at open-bio.org > so mauricio or someone can have it in the queue. 
> > [aside] > I've had to stop doing OBF sysadmin work so we are definitely looking > for someone to help with the ALL VOLUNTEER team of now just Mauricio > and Chris Dagdigian who do mediawiki and sysadmin support. > > We've reached a bit of crunch where there are lots of things to tweak > and customize for the various flavors of MW installs that the projects > want but we don't have enough dedicated admins to really support > this. Most of us have gotten into these projects to support our own > bioinformatics programming not sysadmin tasks so there is a bit of gap > here. Some of us (me) were not trained as sysadmin but jumped in and > figured out how to help and do it - and learned valuable life > skills... =) > > We're discussing plans to upgrade the machines in the future which > would improve performance and reliability we hope and also use this > opportunity to streamline the MW installs to be a more easily > maintained wikifarm. > > [/aside] > > -jason > On Aug 25, 2009, at 8:16 AM, Dan Bolser wrote: > >> Hi, >> >> Can some one set $wgEnableMWSuggest on the BioPerl wiki please? >> >> http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest >> >> >> I generally find this a great feature to have on any MW install. Can >> we also create a page (usually "BioPerl:Configuration" (or >> '$wgSiteName:Configuration')) to report details of the specific MW >> configuration settings used on the wiki? This is also a good place for >> people to request configuration changes to tweak the way the wiki >> works. >> >> >> Cheers, >> Dan. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From David.Messina at sbc.su.se Wed Aug 26 12:25:21 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 Aug 2009 18:25:21 +0200 Subject: [Bioperl-l] Generalized reciprocal blast In-Reply-To: References: Message-ID: <628aabb70908260925q25039506nab6e1c661f704e2a@mail.gmail.com> Hi Robert, Just to add another comment on this: The problem of identifying orthologs is quite a bit trickier than it looks, in part due to the many-to-many relationships you noted. There is a whole body of literature on this topic -- here's a recent review that includes OrthoMCL that Jason mentioned and others: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000262 (disclaimer: I work in a lab that offers one of the many attempts to solve this problem) So I would say that although it is possible to make a customizable function as you describe, there are several existing approaches (read: downloadable code you can run on your data) that would probably give better results. Dave From hsa_rim at yahoo.co.in Wed Aug 26 15:56:38 2009 From: hsa_rim at yahoo.co.in (shafeeq rim) Date: Thu, 27 Aug 2009 01:26:38 +0530 (IST) Subject: [Bioperl-l] Latest Cytoband files Message-ID: <484629.15190.qm@web94612.mail.in2.yahoo.com> Hi, Can anybody tell me how can I get latest cytoband files with stain information for homo spaiens, mus musculus and others. I am using 36.3 version of RefSeq for Humans and 36.1 version of RefSeq for mus musculus. Thanks See the Web's breaking stories, chosen by people like you. Check out Yahoo! Buzz. 
http://in.buzz.yahoo.com/ From cjfields at illinois.edu Wed Aug 26 16:36:31 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 Aug 2009 15:36:31 -0500 Subject: [Bioperl-l] Next-Gen and the next point release - updates Message-ID: All, I just pushed one very key bit for nextgen sequence analysis to svn, mainly parsing of all three FASTQ variants. These can be called by using: # grabs the FASTQ parser, specifies the Illumina variant my $in = Bio::SeqIO->new(-format => 'fastq-illumina', -file => 'mydata.fq'); # same, explicitly specifies the Illumina variant my $in = Bio::SeqIO->new(-format => 'fastq', -variant => 'illumina', -file => 'mydata.fq'); # simple 'fastq' format defaults to 'sanger' variant my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>mydata.fq'); FASTQ works for both input and output. As mentioned before, the next_dataset() method also exists for getting simple hashrefs, see the module documentation for more. This was one of the few remaining blockers for the 1.6.1 point release. I'll run a clean checkout of main trunk to test, then work on merging everything over from trunk starting Friday and push out 1.6.0_1 (first alpha) beginning of next week to get some CPAN Tester information. If everything looks fine the final point release will follow soon after. Cheers! chris From rmb32 at cornell.edu Wed Aug 26 16:56:20 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 26 Aug 2009 13:56:20 -0700 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: References: Message-ID: <4A95A174.3070706@cornell.edu> Hurray! You rock Chris! R From lsbrath at gmail.com Wed Aug 26 17:08:06 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Wed, 26 Aug 2009 17:08:06 -0400 Subject: [Bioperl-l] rendering graphics from genbank files. Message-ID: <69367b8f0908261408g6750c1d2we3409a016fe186b7@mail.gmail.com> Hi, I am running into to problems rendering the 5'UTR and 3'UTR features in the graphic. 
I get an error message saying that these are string literals. Better yet, how do I add the 5'UTR and 3'UTR regions to the CDS feature when the only features in my genbank file are mRNA, CDS, and gene? What I want is to display the gene structure. I am using the last template provided in bioperl howto graphics. Mgavi From biopython at maubp.freeserve.co.uk Wed Aug 26 17:16:08 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 26 Aug 2009 22:16:08 +0100 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: References: Message-ID: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> On Wed, Aug 26, 2009 at 9:36 PM, Chris Fields wrote: > All, > > I just pushed one very key bit for nextgen sequence analysis to svn, mainly > parsing of all three FASTQ variants. These can be called by using: > > # grabs the FASTQ parser, specifies the Illumina variant > my $in = Bio::SeqIO->new(-format => 'fastq-illumina', > -file => 'mydata.fq'); > > # same, explicitly specifies the Illumina variant > my $in = Bio::SeqIO->new(-format => 'fastq', > -variant => 'illumina', > -file => 'mydata.fq'); > > # simple 'fastq' format defaults to 'sanger' variant > my $out = Bio::SeqIO->new(-format => 'fastq', > -file => '>mydata.fq'); > > FASTQ works for both input and output. As mentioned before, the > next_dataset() method also exists for getting simple hashrefs, see the > module documentation for more. > > This was one of the few remaining blockers for the 1.6.1 point release. > ... If everything looks fine the final point release will follow soon after. It is looking much better than yesterday - nice work :) However, there are a few rough edges still. =========================== Evil wrapping =========================== Chris - Did you get the zip file of FASTQ examples I sent off list?
One of these was the evil_wrapping.fastq file already in Biopython CVS/git (under a new name). This is intended as a real torture test, with line wrapped quality strings where plenty of the lines start with "+" or "@" characters. Bioperl doesn't like this file at all - but I have not dug into why. =========================== Sanger To Illumina 1.3+ =========================== When mapping a Sanger FASTQ file with very high scores to Illumina, these don't get the maximum value imposed (ASCII 126, tilde). e.g. $ ./biopython_sanger2illumina < sanger_93.fastq /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:676: UserWarning: Data loss - max PHRED quality 62 in Illumina FASTQ warnings.warn("Data loss - max PHRED quality 62 in Illumina FASTQ") @Test PHRED qualities from 93 to 0 inclusive ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@ But, with bioperl-live SVN, $ ./bioperl_sanger2illumina < sanger_93.fastq --------------------- WARNING --------------------- MSG: Quality values not found for illumina:63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93 --------------------------------------------------- @Test PHRED qualities from 93 to 0 inclusive ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@ You are using "@" (ASCII 64), which in this context means a PHRED score of zero. =========================== Sanger To Solexa =========================== Likewise when mapping a Sanger FASTQ file with very high scores to Solexa FASTQ, these don't get the maximum value imposed (ASCII 126, tilde).
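For reference, the capping behaviour described in these two sections can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the usual PHRED-to-Solexa relation Q_solexa = 10*log10(10^(Q_phred/10) - 1), an ASCII offset of 64 for both Solexa and Illumina 1.3+ quality strings, and clamping to the range -5..62 (ASCII 59 to 126). The sub names are made up here and are neither the Bio::SeqIO nor the Biopython API.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Convert a PHRED score to the nearest Solexa score, clamped to -5 .. 62.
# (Assumed relation: Q_solexa = 10 * log10(10 ** (Q_phred / 10) - 1).)
sub phred_to_solexa {
    my ($phred) = @_;
    return -5 if $phred <= 0;    # log10(0) is undefined, so use the lowest score
    my $solexa  = 10 * log(10 ** ($phred / 10) - 1) / log(10);
    my $rounded = floor($solexa + 0.5);
    $rounded = -5 if $rounded < -5;
    $rounded = 62 if $rounded > 62;    # ASCII 126 '~' is the highest printable value
    return $rounded;
}

# Encode PHRED scores as a Solexa FASTQ quality string (offset 64).
sub solexa_quality_string {
    return join '', map { chr(64 + phred_to_solexa($_)) } @_;
}

# Encode PHRED scores as an Illumina 1.3+ quality string (offset 64, capped at 62).
sub illumina_quality_string {
    return join '', map { chr(64 + ($_ > 62 ? 62 : $_)) } @_;
}

# A few PHRED values from both ends of the 93..0 test range.
my @phred = (93, 62, 10, 2, 1, 0);
print solexa_quality_string(@phred),   "\n";
print illumina_quality_string(@phred), "\n";
```

With this clamping, every PHRED score above 62 comes out as "~" in both encodings, and PHRED 0 and 1 both come out as ";" (Solexa -5) in the Solexa string, which is the behaviour the Biopython output in this message shows.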
For example, $ ./biopython_sanger2solexa < sanger_93.fastq /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764: UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ") @Test PHRED qualities from 93 to 0 inclusive ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;; But, $ ./bioperl_sanger2solexa < sanger_93.fastq --------------------- WARNING --------------------- MSG: Quality values not found for solexa:0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93 --------------------------------------------------- @Test PHRED qualities from 93 to 0 inclusive ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN + <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><< i.e. You've mapped the high value scores to "<", ASCII 60, thus Solexa -4 (an odd thing to happen - getting the lowest score wouldn't surprise me so much). Furthermore, notice that PHRED scores 0 and 1 have both been mapped to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning Solexa -5. =========================== Still, things are looking up :) Peter From maj at fortinbras.us Wed Aug 26 17:03:13 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 26 Aug 2009 17:03:13 -0400 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: <4A95A174.3070706@cornell.edu> References: <4A95A174.3070706@cornell.edu> Message-ID: <1E03634D20424F659F417AE7F5D26039@NewLife> +1 ----- Original Message ----- From: "Robert Buels" To: "Chris Fields" Cc: "BioPerl List" Sent: Wednesday, August 26, 2009 4:56 PM Subject: Re: [Bioperl-l] Next-Gen and the next point release - updates > Hurray! You rock Chris! 
> > R > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From sac at bioperl.org Wed Aug 26 18:33:16 2009 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 26 Aug 2009 15:33:16 -0700 Subject: [Bioperl-l] MGED meeting in Phoenix, AZ, Oct 5-8 Message-ID: <8f200b4c0908261533y74c42b1aif662ef13a8fe6711@mail.gmail.com> The MGED Society's annual meeting is of potential interest to anyone working with functional genomics data sets, or interested in best practices for analyzing and annotating their functional genomics experiments. The meeting topic is "Next-Gen Sequencing and Translational Genomics" and as usual, they've got a great line-up of speakers (included below). It's in Phoenix, AZ Oct 5-8, early registration ends on 5 Sep. (Note that MGED has expanded its reach beyond just microarrays.) For more information on registration and abstract submission, go to * http://www.mgedmeeting.org* For hotel accommodations, go to * http://www.starwoodmeeting.com/StarGroupsWeb/res?id=0903232443&key=42DE2* Keynotes *Hank Greely* Deane F. and Kate Edelman Johnson Professor of Law Stanford Law School *Elaine Mardis* Associate Professor, Genetics, Molecular Microbiology Washington University in St. 
Louis School of Medicine *Daniel Von Hoff* Director, Clinical Translational Research Division Translational Genomics Research Institute (TGen) Plenary Speakers: *Steven Brenner* Associate Professor, Plant and Microbial Biology University of California, Berkeley *Lynda Chin* Associate Professor, Dermatology Dana Farber Cancer Institute, Harvard Medical School *David Craig* Associate Director, Neurogenomics Division Translational Genomics Research Institute (TGen) *Michael Eisen* Scientist, Lawrence Berkeley National Lab and Associate Professor Department of Molecular and Cellular Biology, University of California, Berkeley *Gad Getz* Head of Cancer Genome Analysis at the Broad Institute of MIT and Harvard *Mathieu Lupien* Assistant Professor, Genetics Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center *Joanna Mountain* Senior Director, Research 23andMe, Inc. *Dana Pe'er* Assistant Professor, Biology and Computer Science Columbia University Biological Sciences *John Quackenbush* Professor of Computational Biology & Bioinformatics, Biostatistics Dana Farber Cancer Institute, Harvard School of Public Health *Cole Trapnell* Ph. D. Student, Computer Science University of Maryland, College Park From cjfields at illinois.edu Wed Aug 26 22:52:13 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 Aug 2009 21:52:13 -0500 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> References: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> Message-ID: On Aug 26, 2009, at 4:16 PM, Peter wrote: > It is looking much better than yesterday - nice work :) > However, there are a few rough edges still. Not unexpected, actually. > =========================== > Evil wrapping > =========================== > Chris - Did you get the zip file of FASTQ examples I sent off list? 
> One of > these was the evil_wrapping.fastq file already in Biopython CVS/git > (under > a new name). This is intended as a real torture test, with line > wrapped > quality strings where plenty of the lines start with "+" or "@" > characters. > Bioperl doesn't like this file at all - but I have not dug into why. Now fixed; I've saved this as very_tricky.fastq, but it's the same file. > =========================== > Sanger To Illumina 1.3+ > =========================== > When mapping a Sanger FASTQ file with very high scores to Illumina, > these don't get the maximum value imposes (ASCII 126, tidle). e.g. ... Yes, I know where that one is going wrong. Fixed now for bounds for the above. Partly related to the below. > =========================== > Sanger To Solexa > =========================== > Likewise when mapping a Sanger FASTQ file with very high scores to > Solexa FASTQ, these don't get the maximum value imposes (ASCII 126, > tidle). For example, > > $ ./biopython_sanger2solexa < sanger_93.fastq > /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764: > UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ > warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ") > @Test PHRED qualities from 93 to 0 inclusive > ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN > + > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\ > [ZYXWVUTSRQPONMLKJHGFECB@>;; > > But, > > $ ./bioperl_sanger2solexa < sanger_93.fastq > > --------------------- WARNING --------------------- > MSG: Quality values not found for > solexa: > 0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93 > --------------------------------------------------- > @Test PHRED qualities from 93 to 0 inclusive > ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN > + > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\ > 
[ZYXWVUTSRQPONMLKJHGFEDB@><< > > i.e. You've mapped the high value scores to "<", ASCII 60, thus > Solexa -4 > (an odd thing to happen - getting the lowest score wouldn't surprise > me so > much). This one is fixed, it was the same bounding issue as above. > Furthermore, notice that PHRED scores 0 and 1 have both been mapped > to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning > Solexa -5. The two conversions to solexa are still failing. I'm not sure but I think it's something fairly simple, but I can't work on it until Friday (got too many other things on my plate ATM). If I get stumped I'll post a message. > =========================== > > Still, things are looking up :) > > Peter Yes they are, much more so that previously. I'll add these to the tests. chris From tuco at pasteur.fr Thu Aug 27 04:28:41 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Thu, 27 Aug 2009 10:28:41 +0200 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis In-Reply-To: References: <4A954DCC.4050200@pasteur.fr> Message-ID: <4A9643B9.7000709@pasteur.fr> Mark A. Jensen wrote: > Hi Emmanuel-- > This may be fixed in the latest version of Bio::Restriction, which is not > available in the standard 1.6 distribution. I suggest you try replacing the > Bio/Restriction directory in your distribution with the current > bioperl-live > modules. You can get these by using Subversion: > > $ svn co > svn://code.open-bio.org/bioperl/bioperl-live/trunk/Bio/Restriction > ./Restriction Hi Mark, Thanks for pointing me to this svn repo. I've just updated the Bio::Restriction::* part just to test it. I don't get any error anymore. I just need to continue working on this with my ideas. I'll let you know if I encounter any other problem. 
Cheers Emmanuel > > If you're brave, better might be to obtain the latest trunk and reinstall; > > $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live > $ cd bioperl-live > $ perl Build.PL > $ ./Build > $ ./Build test > $ ./Build install > > Please update the list with your progress- > cheers > Mark >> -- >> ------------------------- >> Emmanuel Quevillon >> Biological Software and Databases Group >> Institut Pasteur >> +33 1 44 38 95 98 >> tuco at_ pasteur dot fr >> ------------------------- >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From dan.bolser at gmail.com Thu Aug 27 06:34:00 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 27 Aug 2009 11:34:00 +0100 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: <2c8757af0908270334kcb3dfc4w17553e65f7e0e4b5@mail.gmail.com> 2009/8/25 Jason Stajich : > Can you send sysadmin request mail to the helpdesk - support at open-bio.org?so > mauricio or someone can have it in the queue. OK. > [aside] > I've had to stop doing OBF sysadmin work so we are definitely looking for > someone to help with the ALL VOLUNTEER team of now just Mauricio and Chris > Dagdigian who do mediawiki and sysadmin support. > > We've reached a bit of crunch where there are lots of things to tweak and > customize for the various flavors of MW installs that the projects want but > we don't have enough dedicated admins to really support this. 
> Most of us

I know how you feel!

> have gotten into these projects to support our own bioinformatics
> programming not sysadmin tasks so there is a bit of gap here. Some of us
> (me) were not trained as sysadmin but jumped in and figured out how to help
> and do it - and learned valuable life skills... =)
>
> We're discussing plans to upgrade the machines in the future which would
> improve performance and reliability we hope and also use this opportunity to
> streamline the MW installs to be a more easily maintained wikifarm.

Sounds like a good idea. There are also extensions that put more of
the MW config on the website itself (restricted to admins of course).

Dan.

From hsa_rim at yahoo.co.in Thu Aug 27 07:14:03 2009
From: hsa_rim at yahoo.co.in (shafeeq rim)
Date: Thu, 27 Aug 2009 16:44:03 +0530 (IST)
Subject: [Bioperl-l] Mapping of genome with cytoband
Message-ID: <29549.68962.qm@web94610.mail.in2.yahoo.com>

Hi,

I need gene, mRNA, CDS, STS and exon files mapped to cytobands, say for
version 37.1 of the NCBI data. I am checking the .gbs and .gbk files, but
the genes and other features do not span the whole chromosome. For
chromosome 1, for example, when I use the gene coordinates from the
.gbk / .gbs files, the gene locations cover only half of the ideogram graph.

Thanks

From biopython at maubp.freeserve.co.uk Thu Aug 27 07:55:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 27 Aug 2009 12:55:55 +0100
Subject: [Bioperl-l] Next-Gen and the next point release - updates
In-Reply-To:
References: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com>
Message-ID: <320fb6e00908270455y2a80907chfae8007df60e72e2@mail.gmail.com>

On Thu, Aug 27, 2009 at 3:52 AM, Chris Fields wrote:
>
> On Aug 26, 2009, at 4:16 PM, Peter wrote:
>
>> It is looking much better than yesterday - nice work :)
>> However, there are a few rough edges still.
>
> Not unexpected, actually.
>
>> ===========================
>> Evil wrapping
>> ===========================
>> Chris - Did you get the zip file of FASTQ examples I sent off list? One of
>> these was the evil_wrapping.fastq file already in Biopython CVS/git (under
>> a new name). This is intended as a real torture test, with line wrapped
>> quality strings where plenty of the lines start with "+" or "@"
>> characters.
>> Bioperl doesn't like this file at all - but I have not dug into why.
>
> Now fixed; I've saved this as very_tricky.fastq, but it's the same file.

Looks good.

>> ===========================
>> Sanger To Illumina 1.3+
>> ===========================
>> When mapping a Sanger FASTQ file with very high scores to Illumina,
>> these don't get the maximum value imposed (ASCII 126, tilde). e.g.
>
> ...
>
> Yes, I know where that one is going wrong. Fixed now for bounds for the
> above. Partly related to the below.

Looks good.

>> ===========================
>> Sanger To Solexa
>> ===========================
>> Likewise when mapping a Sanger FASTQ file with very high scores to
>> Solexa FASTQ, these don't get the maximum value imposed (ASCII 126,
>> tilde). For example,
>> ...
>> i.e. You've mapped the high value scores to "<", ASCII 60, thus Solexa -4
>> (an odd thing to happen - getting the lowest score wouldn't surprise me so
>> much).
>
> This one is fixed, it was the same bounding issue as above.

Yes, the high score truncation looks good.

>> Furthermore, notice that PHRED scores 0 and 1 have both been mapped
>> to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning Solexa -5.
>
> The two conversions to solexa are still failing. I'm not sure but I think
> it's something fairly simple, but I can't work on it until Friday (got too
> many other things on my plate ATM). If I get stumped I'll post a message.

Actually it's not just PHRED 0 and 1 that look wrong, all of the low
scores are messed up. I could repeat this using the sanger_93.fastq
file, but to avoid email line wrapping here I'm using a smaller example
file with PHRED scores in the range 40 to 0 only:

$ cat sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!

Biopython:

$ python ./biopython_sanger2solexa.py < sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;;

BioPerl SVN (with Chris' latest fixes):

$ ./bioperl_sanger2solexa.pl < sanger_faked.fastq

--------------------- WARNING ---------------------
MSG: Data loss for solexa: following values exceed max 62
0
---------------------------------------------------
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFDCA?=~

The last ten characters are wrong (i.e. PHRED score 0 to 9, which is
precisely the range where the PHRED/Solexa mapping is non trivial).
Also note that the data loss warning is misleading (0 is less than 62).

Plus you get exactly the same problems with Illumina to Solexa. This
should narrow it down - the bug is in mapping PHRED scores (from
either Sanger or Illumina 1.3+ files) to the Solexa encoding.
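
For reference, the PHRED to Solexa mapping that the Biopython output above follows can be sketched in a few lines of Python. This is only a minimal sketch, not the BioPerl or Biopython code: the function and constant names are made up for illustration, and it assumes the usual conventions (Sanger offset 33, Solexa offset 64, Solexa scores limited to -5..62, PHRED 0 mapped to Solexa -5).

```python
import math

SANGER_OFFSET = 33   # Sanger FASTQ: ASCII = PHRED + 33
SOLEXA_OFFSET = 64   # Solexa FASTQ: ASCII = Solexa score + 64
SOLEXA_MIN = -5      # lowest Solexa score, ';' (ASCII 59)
SOLEXA_MAX = 62      # highest encodable score, '~' (ASCII 126)


def phred_to_solexa(phred):
    """Map an integer PHRED score to the nearest integer Solexa score."""
    if phred <= 0:
        # log10(10**0 - 1) is undefined; by convention PHRED 0 -> Solexa -5
        return SOLEXA_MIN
    solexa = 10 * math.log10(10 ** (phred / 10.0) - 1)
    # Round to the nearest integer, then clamp to the encodable range.
    return max(SOLEXA_MIN, min(SOLEXA_MAX, int(round(solexa))))


def sanger_to_solexa_quality(quality):
    """Re-encode a Sanger quality string as a Solexa quality string."""
    return "".join(
        chr(phred_to_solexa(ord(ch) - SANGER_OFFSET) + SOLEXA_OFFSET)
        for ch in quality
    )
```

Applied to the quality line of sanger_faked.fastq, this gives the same string as the Biopython run, ending in JHGFECB@>;; for PHRED 10 down to 0. The two scores are only close above roughly PHRED 10, which is why the low end is where a conversion bug shows up.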
Peter

From sanjaysingh765 at gmail.com Thu Aug 27 09:59:13 2009
From: sanjaysingh765 at gmail.com (sanjay singh)
Date: Thu, 27 Aug 2009 19:29:13 +0530
Subject: [Bioperl-l] query about libwww-perl collection
Message-ID:

Hello,

I want to use the libwww-perl collection to query BLINK with multiple
queries. It works very well for a single query, but how can I use it for
multiple queries? Please help me out.

Regards,
Sanjay

--
Happy moments , praise God.
Difficult moments, seek God.
Quiet moments, worship God.
Painful moments, trust God.
Every moment, thank God

Sanjay Kumar Singh
Bose Institute
93\1,A.P.C.Road
Kolkata-700 009
West Bengal
India

From bosborne11 at verizon.net Thu Aug 27 11:10:30 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 27 Aug 2009 11:10:30 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To:
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
Message-ID: <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net>

Mark,

Sorry, I'm a bit late here. I took a look at the Documentation Project
page, it is well-reasoned. However, I didn't see any list of action
items there. You do talk at the end about soliciting comments, and
you've already done this, and a user survey. A survey is not necessary,
the issues are well understood already. More to the point, you
understand them and just as in coding, "the one doing the work wins the
argument". Here's what my own list of action items would look like:

- Merge FAQ and Scrapbook
-- FAQ is unused or underused and contains code snippets
-- Too much information or too many sections is as bad as too little
- Write Align/AlignIO HOWTO
-- This is the "missing HOWTO"
- Use Deobfuscator links to reveal method documentation
-- Most notably in SeqIO HOWTO
-- Does Deobfuscator have a bug or two that need to be fixed? I use it,
   it seems to work but I've heard a rumor...
- Condense and streamline installation documents
-- Remove outdated
-- Still too many pages and too much text
-- There are incorrectly labelled links taking you to the wrong place
-- Remove any text or page that duplicates information in an INSTALL
   file, link to this file instead
- Seriously prune the Main Page
-- Wikis encourage a proliferation of pages and links, the Main Page is
   a great example of far too much information
-- Remove many redundant or little used links
-- Try to prettify, in any way possible - we have created, sadly, the
   world's ugliest Main Page!
- Revise the SeqIO HOWTO
-- The first HOWTO, and it looks like it
-- Link this HOWTO to all the Format pages (Category:Formats)
- Feature-Annotation HOWTO
-- Write script that annotates every single SeqIO format, showing where
   each bit of text ends up
-- This script runs automatically when you open the HOWTO or click its
   link, always up-to-date
-- Probably trickier than I think!
- The "Random Page" exercise
-- Spend some time clicking this link, you will certainly find things
   to merge and delete. You will also find nice documentation that you
   didn't know existed and is probably never read!

The objective is to create documentation that has a single starting
point for at least 50% of the questions asked on the mailing list. We've
achieved this for certain topics, like SearchIO. In the old days you'd
get a query a week about doing something with Blast and we'd repeat
something written the previous week, week after week. Then we wrote
some HOWTOs so the answer to just about any question on Features or
SearchIO was answered by "See the HOWTO". Again, one starting page for
every single reasonably general question, like "See the Installation
page". Not "Starting on the Main Page you could click on Getting
Bioperl or Getting Started or Quick Start or Installing Bioperl or
Installation or Downloads or ...." (you get the idea).

Brian O.

On Aug 15, 2009, at 3:53 PM, Mark A.
Jensen wrote: > > ----- Original Message ----- From: "Hilmar Lapp" > ... >> As for the FASTA example, I can understand - I've heard repeatedly >> from people that one of the things that they are missing is >> documentation for every SeqIO format we support (such as GenBank, >> UniProt, FASTA, etc) about where to find a particular piece of the >> format in the object model. > .... > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help create > our list of action items. > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Aug 27 13:38:45 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 27 Aug 2009 10:38:45 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations Message-ID: <4A96C4A5.9090406@cornell.edu> Hi all, Recently a user came into #bioperl looking to truncate an annotated sequence (leaving the region between e.g. 150 to 250 nt), and have the annotations from the original sequence be remapped onto the new truncated sequence. Poking through code, I came across an undocumented function trunc() that from the comments looks like it was written by Jason as part of a master plan to implement this very functionality. Just wondering, what's the status of that? 
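
To make the request concrete, the coordinate arithmetic involved can be sketched as below. This is a minimal Python sketch, not BioPerl's trunc() or any existing API: the function name, the tuple layout, and the policy of dropping features that are not entirely inside the window (rather than clipping them) are all assumptions made for illustration. Coordinates are 1-based and inclusive, as in GenBank feature tables, and strand is ignored.

```python
def remap_features(features, start, end):
    """Shift features into the coordinate system of the subsequence start..end.

    features: iterable of (name, feat_start, feat_end), 1-based inclusive.
    Returns only the features lying entirely inside the window, renumbered
    so that base `start` of the original sequence becomes base 1.
    """
    if start < 1 or end < start:
        raise ValueError("window must satisfy 1 <= start <= end")
    shift = start - 1
    remapped = []
    for name, fstart, fend in features:
        if fstart >= start and fend <= end:
            remapped.append((name, fstart - shift, fend - shift))
    return remapped


features = [
    ("exon1", 160, 200),     # inside the window
    ("exon2", 240, 300),     # runs past the window end, so it is dropped
    ("promoter", 100, 149),  # entirely upstream, so it is dropped
]
print(remap_features(features, 150, 250))  # [('exon1', 11, 51)]
```

A real implementation would also have to decide what to do with split locations and with features that straddle the window boundary, which is presumably where most of the work lies.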
Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From rmb32 at cornell.edu Thu Aug 27 13:40:41 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 27 Aug 2009 10:40:41 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <4A96C4A5.9090406@cornell.edu> References: <4A96C4A5.9090406@cornell.edu> Message-ID: <4A96C519.3020001@cornell.edu> Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 Rob Robert Buels wrote: > Hi all, > > Recently a user came into #bioperl looking to truncate an annotated > sequence (leaving the region between e.g. 150 to 250 nt), and have the > annotations from the original sequence be remapped onto the new > truncated sequence. > > Poking through code, I came across an undocumented function trunc() that > from the comments looks like it was written by Jason as part of a master > plan to implement this very functionality. > > Just wondering, what's the status of that? > > Rob > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Thu Aug 27 14:20:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 27 Aug 2009 13:20:42 -0500 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <4A96C519.3020001@cornell.edu> References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> Message-ID: <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> It's not implemented completely. As Jason mentioned in the bug report, it was meant to be part of an overall system to truncate sequences with remapped features, but the implementation in place is substandard. It's open for implementation if anyone wants to take it up. 
I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal with this in a more elegant and lightweight way, and is probably the direction I would take. YMMV. chris On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: > Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 > > Rob > > Robert Buels wrote: >> Hi all, >> Recently a user came into #bioperl looking to truncate an annotated >> sequence (leaving the region between e.g. 150 to 250 nt), and have >> the annotations from the original sequence be remapped onto the new >> truncated sequence. >> Poking through code, I came across an undocumented function trunc() >> that from the comments looks like it was written by Jason as part >> of a master plan to implement this very functionality. >> Just wondering, what's the status of that? >> Rob > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Aug 27 14:41:28 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 27 Aug 2009 11:41:28 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> Message-ID: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Yeah one thought that we batted around at a hackathon many moons ago had been to use Bio::DB::SeqFeature in a lightweight way under the hood to represent sequences in layers more rather than the arbitrary data model that is setup by focusing on handling GenBank records. 
A lot of the architecture development (that is like 10-15 years old now!) was initially just focused on round-tripping the sequence files. We more recently felt like a new model was more appropriate. With the fast SQLite implementation that Lincoln has put in for DB::SeqFeature we could in theory map every sequence into a SQLite DB and then have the power of the interface. Some more bells and whistles might be needed but the basic API is respected AFAIK and it prevents needing to store whole sequences in memory. The SeqIO->DB::SeqFeature loading would need some finessing so that as parsed the sequence object could be updated efficiently. Actually this might also help reduce the number of objects needed to be created by basically efficiently serializing sequences into the DB on parsing (and with some simple caching this could make for pretty fast system). Since disk is basically not a limitation now could be an interesting experiment? Maybe it is too out there, but if not it could be something major enough that it has to go in a bioperl-2/ bioperl-ng. It sort of assumes the data model of Bio::DB::SeqFeature is adequate for all the messiness of sequence data formats and one problem for some people has been the seq file format => GFF in order to load it into a SeqFeature DB for Gbrowse... So I don't know what are the boundary cases here. Certainly for FASTA it should be straightforward. -jason On Aug 27, 2009, at 11:20 AM, Chris Fields wrote: > It's not implemented completely. As Jason mentioned in the bug > report, it was meant to be part of an overall system to truncate > sequences with remapped features, but the implementation in place is > substandard. It's open for implementation if anyone wants to take > it up. > > I should point out, though, in my opinion Bio::DB::GFF/SeqFeature > deal with this in a more elegant and lightweight way, and is > probably the direction I would take. YMMV. 
> > chris > > On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: > >> Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 >> >> Rob >> >> Robert Buels wrote: >>> Hi all, >>> Recently a user came into #bioperl looking to truncate an >>> annotated sequence (leaving the region between e.g. 150 to 250 >>> nt), and have the annotations from the original sequence be >>> remapped onto the new truncated sequence. >>> Poking through code, I came across an undocumented function >>> trunc() that from the comments looks like it was written by Jason >>> as part of a master plan to implement this very functionality. >>> Just wondering, what's the status of that? >>> Rob >> >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From lsbrath at gmail.com Thu Aug 27 15:04:36 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 27 Aug 2009 15:04:36 -0400 Subject: [Bioperl-l] rendering the 5' & 3' UTR in a graphic Message-ID: <69367b8f0908271204p7f153be1p6673faac931b646d@mail.gmail.com> Hello, I am able to render all of the features except the 5' & 3' UTR. 
This is how the features part of the Genbank file looks:

FEATURES             Location/Qualifiers
     source          1..185000
                     /note="locus_tag=Nbl1"
                     /organism="Mus musculus"
     gene            142646..153328
                     /note="locus_tag=Nbl1"
                     /gene="ENSMUSG00000041120"
                     /note="neuroblastoma, suppression of tumorigenicity 1
                     [Source:MGI;Acc:MGI:104591]"
     5'UTR           142646..150000
                     /note="Nbl1"
     mRNA            join(142646..142794,149973..150167,150269..150380,
                     152019..153328)
                     /gene="ENSMUSG00000041120"
                     /note="transcript_id=ENSMUST00000042844"
     CDS             join(150001..150167,150269..150380,152019..152276)
                     /db_xref="CCDS:CCDS18839.1"
                     /db_xref="MGI:Nbl1"
                     /db_xref="Vega_mouse_transcript:OTTMUST00000022949"
                     /protein_id="ENSMUSP00000045608"
                     /gene="ENSMUSG00000041120"
                     /note="transcript_id=ENSMUST00000042844"
     misc_feature    150001..152276
                     /note="deletion"
     3'UTR           152277..153328
                     /gene="Nbl1"
ORIGIN      -
        1 GACCAGAGCC ACTCGCTAGG AGTCACACCG AGCCTGGGGG TCCGAAGGGA ACAGCATCAA

Here is the code:

# file: embl2picture.pl
# This is code example 6 in the Graphics-HOWTO
# Author: Lincoln Stein

use strict;
#use lib "$ENV{HOME}/projects/bioperl-live";
use Bio::Graphics;
use Bio::SeqIO;

use constant USAGE =><<END;
Usage: $0 <file>
   Render a GenBank/EMBL entry into drawable form.
   Return as a GIF or PNG image on standard output.

   File must be in embl, genbank, or another SeqIO-
   recognized format.  Only the first entry will be
   rendered.
Example to try:
   embl2picture.pl factor7.embl | display -

END

my $file = shift or die USAGE;
my $io = Bio::SeqIO->new(-file=>$file) or die USAGE;
my $seq = $io->next_seq or die USAGE;
my $wholeseq = Bio::SeqFeature::Generic->new(
    -start        => 1,
    -end          => $seq->length,
    -display_name => $seq->display_name
);

# script reads the features from the sequence object by calling all_SeqFeatures()
my @features = $seq->all_SeqFeatures;

# sorts each feature by its primary tag into a hash
# of array references named %sorted_features
my %sorted_features;
my %want = map {$_ =>1} qw/source CDS gene utr5prime utr3prime mRNA misc_feature/;
for my $f (@features) {
    #get cds, primer_bind, and genes features only
    my $tag = $f->primary_tag;
    # create a hash of $f keys and $tag values
    #push @{$sorted_features{$tag}},$f if ($tag =~ /CDS|gene|mRNA|source|misc_feature|5'UTR|3'UTR/);
    push @{$sorted_features{$tag}},$f if ($want{$tag});
}

# we create the Bio::Graphics::Panel object.
# As in previous examples, we specify the width of the image,
# as well as some extra white space to pad out the left and right borders.
my $panel = Bio::Graphics::Panel->new(
    -length    => $seq->length,
    -key_style => 'between',
    -width     => 400,
    -pad_left  => 10,
    -pad_right => 10,
);

# We now add two tracks, one for the scale
# and the other for the sequence as a whole.
$panel->add_track($wholeseq,
    -glyph   => 'arrow',
    -bump    => 0,
    -double  => 1,
    -tick    => 2,
    -bgcolor => 'blue',
    -label   => 1,
);

=cut
$panel->add_track($wholeseq,
    -glyph   => 'generic',
    -bgcolor => 'blue',
    -label   => 1,
);
=cut

# Locate primary tag of "CDS" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the CDS feature type from the %sorted_features associative array.
if ($sorted_features{CDS}) {
    $panel->add_track($sorted_features{CDS},
        -glyph       => 'transcript2',
        -bgcolor     => 'orange',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => 'CDS',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{'CDS'};
}

# Locate primary tag of "mRNA" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the mRNA feature type from the %sorted_features associative array.
if ($sorted_features{mRNA}) {
    $panel->add_track($sorted_features{mRNA},
        -glyph       => 'transcript2',
        -bgcolor     => 'red',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => 'mRNA',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{'mRNA'};
}

#=cut
# Locate primary tag of "5'UTR" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the 5'UTR feature type from the %sorted_features associative array.
if ($sorted_features{utr5prime}) {
    $panel->add_track($sorted_features{utr5prime},
        -glyph       => 'transcript2',
        -bgcolor     => '',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => 'utr5prime',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{utr5prime};
}

=cut
# Locate primary tag of "3'UTR" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the 3'UTR feature type from the %sorted_features associative array.
if ($sorted_features{3\'UTR}) {
    $panel->add_track($sorted_features{'3\'UTR'},
        -glyph       => 'transcript2',
        -bgcolor     => '',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => '3\'UTR',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{'3\'UTR'};
}
=cut

# general case
# Create a track for each feature type.
# In order to distinguish the tracks by color,
# we initialize an array of 9 color names and simply cycle through them
my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
my $idx = 0;
for my $tag (sort keys %sorted_features) {
    my $features = $sorted_features{$tag};
    $panel->add_track($features,
        -glyph       => 'generic',
        -bgcolor     => $colors[$idx++ % @colors],
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => "${tag}s",
        -bump        => +1,
        -height      => 8,
        # -description option to point to a subroutine
        # that will generate more informative description strings.
        -description => \&generic_description,
    );
}

binmode(STDOUT);
print $panel->png;
exit 0;

sub gene_label {
    my $feature = shift;
    my @notes;
    foreach (qw(product gene)) {
        @notes = eval {$feature->get_tag_values($_)};
        last;
    }
    $notes[0];
}

sub gene_description {
    my $feature = shift;
    my @notes;
    foreach (qw(note)) {
        # Notice that we place calls to get_tag_values() inside eval{} blocks
        # in order to avoid having an exception raised if the feature does not
        # have a tag with the desired value.
        @notes = eval{$feature->get_tag_values($_)};
        last;
    }
    return unless @notes;
    substr($notes[0],30) = '...' if length $notes[0] > 30;
    $notes[0];
}

sub generic_description {
    my $feature = shift;
    my $description;
    foreach ($feature->get_all_tags) {
        my @values = $feature->get_tag_values($_);
        $description .= $_ eq 'note' ? "@values" : "$_=@values; ";
    }
    $description =~ s/; $//; # get rid of last
    $description;
}

sub fp_utr{
    my $five_prime_utr = '5\'UTR';
    return $five_prime_utr;
}

This is how the image currently looks:

Any ideas why I am unable to render the 5' & 3' UTR features?
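
One thing that may be worth checking in the script above: features are filtered by primary tag, and the filter lists utr5prime and utr3prime, whereas the feature table uses the keys 5'UTR and 3'UTR. Assuming the parser reports the feature key verbatim as the primary tag (an assumption, not something verified against this file), the UTR features would never pass the filter, so they would reach neither the dedicated tracks nor the general case. The lookup can be illustrated with a short Python sketch; the variable names are hypothetical and only mirror the Perl %want hash.

```python
# Feature keys as they appear in the GenBank feature table from the post.
feature_keys = ["source", "gene", "5'UTR", "mRNA", "CDS", "misc_feature", "3'UTR"]

# Tags accepted by the script's filter (the contents of the Perl %want hash).
want = {"source", "CDS", "gene", "utr5prime", "utr3prime", "mRNA", "misc_feature"}

# A key is kept only if it matches one of the wanted tags exactly.
kept = [key for key in feature_keys if key in want]
skipped = [key for key in feature_keys if key not in want]

print("kept:   ", kept)
print("skipped:", skipped)
```

If that is what is happening, listing 5'UTR and 3'UTR in the filter and keying the UTR tracks on those same strings would be the first thing to try. Note also that the 3'UTR block sits between =cut markers in the posted script.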
From jorvis at gmail.com Thu Aug 27 15:23:05 2009 From: jorvis at gmail.com (Joshua Orvis) Date: Thu, 27 Aug 2009 15:23:05 -0400 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Message-ID: I should weigh in here since I am the above-mentioned 'user' who posed the question in #bioperl. To clarify, to train one particular gene finder I need to take a full genbank file with annotation for a whole genome and create separate gbk records, one for each gene. Each record will then contain the gene, exon coordinates for the CDS and sequence for the gene. I can iterate through the features of the full record and do the math myself for each spliced coordinate, making/writing individual records as I go, but thought I would see if BioPerl had any mechanism to extract a region of an annotated record and treat the starting base of that extraction as position 1, recoordinating all the other features that were present. Then I could just iterate through the features of the whole entry, extracting regions for each gene as I see them. Hopefully this makes sense. Joshua On Thu, Aug 27, 2009 at 2:41 PM, Jason Stajich wrote: > > Yeah one thought that we batted around at a hackathon many moons ago had > been to use Bio::DB::SeqFeature in a lightweight way under the hood to > represent sequences in layers more rather than the arbitrary data model that > is setup by focusing on handling GenBank records. A lot of the architecture > development (that is like 10-15 years old now!) was initially just focused > on round-tripping the sequence files. We more recently felt like a new model > was more appropriate. 
With the fast SQLite implementation that Lincoln has > put in for DB::SeqFeature we could in theory map every sequence into a > SQLite DB and then have the power of the interface. > > Some more bells and whistles might be needed but the basic API is respected > AFAIK and it prevents needing to store whole sequences in memory. The > SeqIO->DB::SeqFeature loading would need some finessing so that as parsed > the sequence object could be updated efficiently. > > Actually this might also help reduce the number of objects needed to be > created by basically efficiently serializing sequences into the DB on > parsing (and with some simple caching this could make for pretty fast > system). Since disk is basically not a limitation now could be an > interesting experiment? Maybe it is too out there, but if not it could be > something major enough that it has to go in a bioperl-2/bioperl-ng. It > sort of assumes the data model of Bio::DB::SeqFeature is adequate for all > the messiness of sequence data formats and one problem for some people has > been the seq file format => GFF in order to load it into a SeqFeature DB for > Gbrowse... So I don't know what are the boundary cases here. Certainly for > FASTA it should be straightforward. > > -jason > > On Aug 27, 2009, at 11:20 AM, Chris Fields wrote: > > It's not implemented completely. As Jason mentioned in the bug report, it >> was meant to be part of an overall system to truncate sequences with >> remapped features, but the implementation in place is substandard. It's >> open for implementation if anyone wants to take it up. >> >> I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal >> with this in a more elegant and lightweight way, and is probably the >> direction I would take. YMMV. 
>> >> chris >> >> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: >> >> Looks like bug 1572 is related to this: >>> http://bugzilla.open-bio.org/show_bug.cgi?id=1572 >>> >>> Rob >>> >>> Robert Buels wrote: >>> >>>> Hi all, >>>> Recently a user came into #bioperl looking to truncate an annotated >>>> sequence (leaving the region between e.g. 150 to 250 nt), and have the >>>> annotations from the original sequence be remapped onto the new truncated >>>> sequence. >>>> Poking through code, I came across an undocumented function trunc() that >>>> from the comments looks like it was written by Jason as part of a master >>>> plan to implement this very functionality. >>>> Just wondering, what's the status of that? >>>> Rob >>>> >>> >>> >>> -- >>> Robert Buels >>> Bioinformatics Analyst, Sol Genomics Network >>> Boyce Thompson Institute for Plant Research >>> Tower Rd >>> Ithaca, NY 14853 >>> Tel: 503-889-8539 >>> rmb32 at cornell.edu >>> http://www.sgn.cornell.edu >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Thu Aug 27 16:00:24 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 27 Aug 2009 13:00:24 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Message-ID: So when I did this for the 
retraining of AUGUSTUS I loaded all my gene models in Bio::DB::GFF as GFF3 and then just extracted each locus I needed +/- some surrounding sequence context and wrote it out as genbank file. There might have been one or two problems collapsing the features back into Genbank's concept of a CDS as a single-feature rather than individual, but I just make a split-location and added the sub-pieces to it. It was only a few lines of code to do it right - the flatten/unflatten being one of the most annoying parts maybe we could work out to streamline. -jason -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Thu Aug 27 16:19:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 27 Aug 2009 15:19:56 -0500 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Message-ID: On Aug 27, 2009, at 1:41 PM, Jason Stajich wrote: > Yeah one thought that we batted around at a hackathon many moons ago > had been to use Bio::DB::SeqFeature in a lightweight way under the > hood to represent sequences in layers more rather than the arbitrary > data model that is setup by focusing on handling GenBank records. A > lot of the architecture development (that is like 10-15 years old > now!) was initially just focused on round-tripping the sequence > files. We more recently felt like a new model was more appropriate. > With the fast SQLite implementation that Lincoln has put in for > DB::SeqFeature we could in theory map every sequence into a SQLite > DB and then have the power of the interface. > > Some more bells and whistles might be needed but the basic API is > respected AFAIK and it prevents needing to store whole sequences in > memory. The SeqIO->DB::SeqFeature loading would need some finessing > so that as parsed the sequence object could be updated efficiently. Exactly my thought. Probably worth pushing the FeatureHolderI interface into something like a SeqFeature::Collection. What about annotation? Maybe add that to the 'source' feature? Also makes me think Seq needs to be RangeI (or potentially locatable to another sequence). Bio::DB::SF::Segment is. I'm thinking the old way of doing it (parsing a file) is still possible, but underneath would be an Bio::Index or similar, and the returned Bio::Seq would have a backend Bio::Index/ Bio::SeqFeature::Collection database (the latter maybe being lazily implemented). > Actually this might also help reduce the number of objects needed to > be created by basically efficiently serializing sequences into the > DB on parsing (and with some simple caching this could make for > pretty fast system). 
Since disk is basically not a limitation now > could be an interesting experiment? Yes. > Maybe it is too out there, but if not it could be something major > enough that it has to go in a bioperl-2/bioperl-ng. It sort of > assumes the data model of Bio::DB::SeqFeature is adequate for all > the messiness of sequence data formats and one problem for some > people has been the seq file format => GFF in order to load it into > a SeqFeature DB for Gbrowse... So I don't know what are the boundary > cases here. Certainly for FASTA it should be straightforward. > > -jason Well, one could possibly test something like this on a branch, or with their own Bio::Seq, or in Biome ;> Just sayin'.... chris From maj at fortinbras.us Thu Aug 27 20:58:34 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 27 Aug 2009 20:58:34 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net> Message-ID: <4C2E185C74CF449495BC8FDC26419702@NewLife> Thanks Brian; these are really valuable insights and suggestions. Of course, the "todo list" is not "mine", but the community's (otherwise, I would have used Post-its), and I have added your action items to it. My thinking about a survey is twofold. Intermittent users may, likely will, have different issues than the usual suspects here on the list, or they will put those issues in a different way--likely with more expression of affect, which I personally think is key. It seems to me that documentation is the public face of this project, and hearing visceral reactions from "the public" will help us (or me) prioritize. The other fold is, this kind of data is better acquired a) actively, rather than passively ("Please respond to this thread") and b) anonymously. 
Obviously, it can't be active in the sense of spamming, but we could reduce the energy barrier by providing something clickable with a few textboxes to the list.

cheers
MAJ

----- Original Message -----
From: Brian Osborne
To: Mark A. Jensen
Cc: BioPerl List ; Chris Fields
Sent: Thursday, August 27, 2009 11:10 AM
Subject: Re: [Bioperl-l] on BP documentation

Mark,

Sorry, I'm a bit late here. I took a look at the Documentation Project page, it is well-reasoned. However, I didn't see any list of action items there. You do talk at the end about soliciting comments, and you've already done this, and a user survey. A survey is not necessary, the issues are well understood already. More to the point, you understand them and just as in coding, "the one doing the work wins the argument".

Here's what my own list of action items would look like:

- Merge FAQ and Scrapbook
-- FAQ is unused or underused and contains code snippets
-- Too much information or too many sections is as bad as too little
- Write Align/AlignIO HOWTO
-- This is the "missing HOWTO"
- Use Deobfuscator links to reveal method documentation
-- Most notably in SeqIO HOWTO
-- Does Deobfuscator have a bug or two that need to be fixed? I use it, it seems to work but I've heard a rumor...
- Condense and streamline installation documents
-- Remove outdated
-- Still too many pages and too much text
-- There are incorrectly labelled links taking you to the wrong place
-- Remove any text or page that duplicates information in an INSTALL file, link to this file instead
- Seriously prune the Main Page
-- Wikis encourage a proliferation of pages and links, the Main Page is a great example of far too much information
-- Remove many redundant or little used links
-- Try to prettify, in any way possible - we have created, sadly, the world's ugliest Main Page!
- Revise the SeqIO HOWTO
-- The first HOWTO, and it looks like it
-- Link this HOWTO to all the Format pages (Category:Formats)
- Feature-Annotation HOWTO
-- Write script that annotates every single SeqIO format, showing where each bit of text ends up
-- This script runs automatically when you open the HOWTO or click its link, always up-to-date
-- Probably trickier than I think!
- The "Random Page" exercise
-- Spend some time clicking this link, you will certainly find things to merge and delete. You will also find nice documentation that you didn't know existed and is probably never read!

The objective is to create documentation that has a single starting point for at least 50% of the questions asked on the mailing list. We've achieved this for certain topics, like SearchIO. In the old days you'd get a query a week about doing something with Blast and we'd repeat something written the previous week, week after week. Then we wrote some HOWTOs so the answer to just about any question on Features or SearchIO was answered by "See the HOWTO". Again, one starting page for every single reasonably general question, like "See the Installation page". Not "Starting on the Main Page you could click on Getting Bioperl or Getting Started or Quick Start or Installing Bioperl or Installation or Downloads or ...." (you get the idea).

Brian O.

On Aug 15, 2009, at 3:53 PM, Mark A. Jensen wrote:

----- Original Message -----
From: "Hilmar Lapp"
...
As for the FASTA example, I can understand - I've heard repeatedly from people that one of the things that they are missing is documentation for every SeqIO format we support (such as GenBank, UniProt, FASTA, etc) about where to find a particular piece of the format in the object model.
....

This is the right thread for list lurkers to contribute their betes noires such as this one. I encourage ALL to post these issues and help create our list of action items.
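
As a rough illustration of the per-format note asked for above (where each piece of a format lands in the object model), here is a minimal sketch for FASTA. It is not taken from any HOWTO; the record is made up, and reading it through an in-memory filehandle is only a convenience so the example needs no file on disk.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Made-up FASTA record, for illustration only.
my $fasta = ">seq1 hypothetical test protein\nMKVLAAGIVG\n";

# Read from an in-memory filehandle instead of a file on disk.
open my $fh, '<', \$fasta or die "cannot open in-memory handle: $!";
my $in  = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');
my $seq = $in->next_seq;

# The first word after '>' goes to display_id, the rest of the header
# line goes to desc, and the residues go to seq.
printf "display_id: %s\n", $seq->display_id;
printf "desc:       %s\n", $seq->desc;
printf "seq:        %s\n", $seq->seq;
```

The same three accessors work on sequences read from richer formats such as GenBank, where further pieces end up in the annotation collection and in the sequence features; those mappings are what a per-format note would need to spell out.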
MAJ _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Thu Aug 27 22:00:01 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 27 Aug 2009 22:00:01 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <4C2E185C74CF449495BC8FDC26419702@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net> <4C2E185C74CF449495BC8FDC26419702@NewLife> Message-ID: <047387CF-C3AD-4E2E-8FB8-091AB23D5FEE@verizon.net> Mark, As you wish. As I said, the one who does the work calls the shots, this is not a democracy. The fundamental problem is, and I speak with some experience here, that detailed examination of documentation is of so little interest that participation in the survey will be limited ("the usual suspects"), and the results will be skewed. You're not going to get reactions from "the public", the thousands of Bioperl users. But, if you feel comfortable with the notion that a survey will justify your actions, do it. But honestly, I know that you already know what to do. Brian O. On Aug 27, 2009, at 8:58 PM, Mark A. Jensen wrote: > My thinking about a survey is twofold. Intermittent users may, > likely will, have different issues than the usual suspects here on > the list, or they will put those issues in a different way--likely > with more expression of affect, which I personally think is key. It > seems to me that documentation is the public face of this project, > and hearing visceral reactions from "the public" will help us (or > me) prioritize. The other fold is, this kind of data is better > acquired a) actively, rather than passively ("Please respond to this > thread") and b) anonymously. 
Obviously, it can't be active in the > sense of spamming, but we could reduce the energy barrier by > providing something clickable with a few textboxes to the list. From David.Messina at sbc.su.se Fri Aug 28 04:40:47 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 28 Aug 2009 10:40:47 +0200 Subject: [Bioperl-l] on BP documentation Message-ID: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se> > - Use Dobfuscator links to reveal method documentation > -- Most notably in SeqIO HOWTO Do you mean to click on a method name in a HOWTO and open up the Deobfuscator view of that method's documentation? I like that. > -- Does Deobfuscator have a bug or two that need to be fixed? I use > it, it seems to work but I've heard a rumor... It's true -- sometimes the Deobfuscator claims that a method isn't documented when it is. Mark, I can commit to fixing this. It's long overdue, so I'm happy to use your doc push as an impetus. Dave From maj at fortinbras.us Fri Aug 28 07:31:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 28 Aug 2009 07:31:05 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se> References: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se> Message-ID: Dave-- thanks for stepping up- MAJ ----- Original Message ----- From: "Dave Messina" To: "Brian Osborne" Cc: "Mark A. Jensen" ; "BioPerl List" ; "Chris Fields" Sent: Friday, August 28, 2009 4:40 AM Subject: Re: [Bioperl-l] on BP documentation > >> - Use Dobfuscator links to reveal method documentation >> -- Most notably in SeqIO HOWTO > > Do you mean to click on a method name in a HOWTO and open up the Deobfuscator > view of that method's documentation? I like that. > > >> -- Does Deobfuscator have a bug or two that need to be fixed? I use it, it >> seems to work but I've heard a rumor... > > It's true -- sometimes the Deobfuscator claims that a method isn't documented > when it is. > > Mark, I can commit to fixing this. 
It's long overdue, so I'm happy to use
> your doc push as an impetus.
>
>
> Dave
>
>
>

From fgarret at ub.edu Fri Aug 28 12:37:54 2009
From: fgarret at ub.edu (Filipe Garrett)
Date: Fri, 28 Aug 2009 18:37:54 +0200
Subject: [Bioperl-l] splice alignment
Message-ID: <4A9807E2.4080608@ub.edu>

Hi all,

I need to analyse the 1st, 2nd and 3rd positions of an alignment separately. I've been through the BioPerl pages but couldn't find any direct way to do it. The closest I found was "slice" (AlignI), but it just extracts a contiguous subsequence.

Is there any subroutine that does the job? Or maybe a more generic one, so we can select the columns to be extracted; e.g.:

  @aln_pos = qw/1 4 7 10 13 14 17 20/;
  $aln_1 = $aln->get_pos(@aln_pos);

thanks in adv,
FG

--
Filipe G. Vieira
Departament de Genetica
Universitat de Barcelona
Av. Diagonal, 645
08028 Barcelona
SPAIN
Phone: +34 934 035 306
Fax: +34 934 034 420
fgarret at ub.edu
http://www.ub.edu/molevol/

From mmorley at mail.med.upenn.edu Fri Aug 28 17:18:28 2009
From: mmorley at mail.med.upenn.edu (Michael Morley)
Date: Fri, 28 Aug 2009 17:18:28 -0400
Subject: [Bioperl-l] How to plot coverage using Bio::DB::Sam and Bio::Graphics?
Message-ID: <4A9849A4.7060702@mail.med.upenn.edu>

I have a few questions, some perhaps too simple, whose answers I know I should have been able to find but which have eluded me.

Problem: What I want to do is visualize coverage (Illumina RNA-seq) across a gene for 40 or so samples. I thought about gbrowse, but what I was hoping to do was use Bio::Graphics to create a few PNGs of the genes I'm interested in, nothing too fancy.

My current attempt: I've used Bio::DB::Sam (thank you LDS!! great package) as follows. It works perfectly.
  my $features = $sam->features(-type=>'coverage',-seq_id=>$chrom,-start=>$genomest,-end=>$genomest);

Then I tried this:

  $panel->add_track($features,
                    -glyph      => 'xyplot',
                    -graph_type => 'histogram',
                   );

After poking at the return of '-type=coverage', I don't think this is possible directly, but any ideas how I can do it? The coverage is too deep in the region to plot every sequence in the alignment; I was able to do it, but it just was not useful.

One last question: I would also like to plot the gene model. If I simply grab the genbank file for refseq NM###, the features only have exon, cds, etc. and coordinates based off the mRNA seq. So how does one get the genomic info and then create the track for a gene/transcript as you would see in gbrowse?

I'd greatly appreciate any help!

-Michael

From roy.chaudhuri at gmail.com Sat Aug 29 09:22:53 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Sat, 29 Aug 2009 23:22:53 +1000
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To:
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
Message-ID: <1372eece0908290622mc21f297w503225242d82ada9@mail.gmail.com>

Hi Joshua,

A couple of years ago I did implement (in a fairly hacky way) a trunc_with_features method that does exactly this. It was incorporated into Bio::SeqUtils and is still there as far as I know. Maybe it would be suitable for your purposes?

Roy.

From adlai at refenestration.com Sun Aug 30 12:16:41 2009
From: adlai at refenestration.com (adlai burman)
Date: Sun, 30 Aug 2009 18:16:41 +0200
Subject: [Bioperl-l] Install on host server
Message-ID:

Hey there,

I have an embarrassingly silly question. I have BioPerl set up and working on my computer. Does anyone here know if there is a standard way to ask one's hosting server to install BioPerl so you can use it within a web page? Barring that, is there a standard way to set it up for your own domain on a hosting server that knows nothing about BioPerl?

Thanks,
Adlai

From ymc at yahoo.com Mon Aug 31 02:10:10 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 30 Aug 2009 23:10:10 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
Message-ID: <472878.20951.qm@web30402.mail.mud.yahoo.com>

Hi Chris

I added a check for LocatableSeq in dpAlign.pm.
It will now create a Bio::Seq object internally to copy the sequence in the LocatableSeq, but with all the gaps taken out. This should make it behave properly. I committed the updated Bio/Tools/dpAlign.pm to SVN. In dpAlign.pm, I also added a note saying what will happen if you supply a LocatableSeq to the functions in this module.

With regard to that warning, I think the person who reported the bug misused the instantiator of LocatableSeq. He/she can't use the length of the sequence with gaps as the "end". The "end" should be the length without gaps.

Let me know if you have any questions or concerns. Have a great day!

Yee Man

--- On Wed, 8/19/09, Yee Man Chan wrote:

> From: Yee Man Chan
> Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
> To: "Chris Fields"
> Cc: "Robert Buels" , "BioPerl List"
> Date: Wednesday, August 19, 2009, 8:01 PM
> I noticed that the $qalseq is a LocatableSeq with gaps. I don't think my program was written to support LocatableSeq with gaps. If I removed the gaps, then I would have the scores agree with each other which should be the desired outcome.
>
> --------------------- WARNING ---------------------
> MSG: In sequence ABC|9986984 residue count gives end value 104.
> Overriding value [101] with value 104 for Bio::LocatableSeq::end().
> TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTTCGGGTCCGGCCCGAA
> ---------------------------------------------------
> Getting score for ABC|9944760 -> ABC|9986984
> = 291
> Getting score for ABC|9986984 -> ABC|9944760
> = 291
>
> Do you think I should check for this LocatableSeq type and give an error or should I remove the gaps if this is a LocatableSeq?
>
> Yee Man
>
>
> --- On Wed, 8/19/09, Chris Fields wrote:
>
> > From: Chris Fields
> > Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
> > To: "Yee Man Chan"
> > Cc: "Robert Buels" , "BioPerl List"
> > Date: Wednesday, August 19, 2009, 7:49 AM
> > I'll have a look. It's probably something that hasn't been updated to deal with LocatableSeq's pathological end point checking.
> >
> > chris
> >
> > On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote:
> >
> > >
> > > I tried that sample script that reportedly caused the dpAlign "bug" but I can't reproduced it. All I get is a warning from LocatableSeq.
> > > -------------------------------------------
> > > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "-Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl
> > >
> > > --------------------- WARNING ---------------------
> > > MSG: In sequence ABC|9944760 residue count gives end value 101.
> > > Overriding value [104] with value 101 for Bio::LocatableSeq::end().
> > > TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA
> > > ---------------------------------------------------
> > > Getting score for ABC|9944760 -> ABC|9986984
> > > = 300
> > > Getting score for ABC|9986984 -> ABC|9944760
> > > = 303
> > > ------------------------------------------
> > >
> > > Does the test script crash in your machine?
> > >
> > > Yee Man
> > >
> > > --- On Tue, 8/18/09, Chris Fields wrote:
> > >
> > >> From: Chris Fields
> > >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> > >> To: "Robert Buels"
> > >> Cc: "Yee Man Chan" , "BioPerl List"
> > >> Date: Tuesday, August 18, 2009, 10:28 PM
> > >> On Aug 18, 2009, at 11:37 PM, Robert Buels wrote:
> > >>
> > >>> Yee Man Chan wrote:
> > >>>> Is it going to be an arrangement similar to bioconductor? If so, I suppose then it makes sense.
But you > > >> might want to develop scripts to > automatically > > download and > > >> install new modules to make it user > friendly. > > >>> Yes, we are probably going to make a > > Task::BioPerl or > > >> something similar. > > >>> > > >>>> What do you mean by Bio-Ext is going > away? > > I > > >> notice quite many people using dpAlign. So > if > > Bio-Ext is > > >> going away, then at least dpAlign should > become > > another spin > > >> off. > > >>> By going away, I meant that everything > in > > there is > > >> going to be spinned off.? Except modules > that > > are no > > >> longer maintainable, if there are any in > there. > > >>> > > >>> Rob > > >> > > >> dpAlign could become another spinoff, yes, if > it's > > used > > >> (and works fine).? The problematic code > dealt > > with pSW, > > >> alignment statistics, and staden io_lib > support > > (the latter > > >> which is fairly bit rotted now): > > >> > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329 > > >> > > >> dpAlign has it's own bug: > > >> > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384 > > >> > > >> chris > > >> > > > > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > From tuco at pasteur.fr Mon Aug 31 10:13:41 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Mon, 31 Aug 2009 16:13:41 +0200 Subject: [Bioperl-l] Can't add track to Panel Bio::Graphics Message-ID: <4A9BDA95.2020109@pasteur.fr> Hi, I'm trying to create png image using Bio::Graphics. I followed the Howto available at bioperl.org. I'm stacked when trying to add new track to my panel. 
So far, I can create the panel and add 2 tracks, but then, probably through some mistake of mine, I can't add more tracks to my panel. Here is the code.

my $panel = Bio::Graphics::Panel->new(
    -length     => $self->seq()->length(),
    -width      => 800,
    -pad_top    => 5,
    -pad_bottom => 5,
    -pad_left   => 5,
    -pad_right  => 5,
    #-key_style => 'between',
);

my $bsg = Bio::SeqFeature::Generic->new(
    -start        => 1,
    -seq          => $self->seq()->seq(),
    -end          => $self->seq()->length(),
    -display_name => $self->seq()->id() . " (" . $self->seq->length() . " na)",
);
$bsg->attach_seq($self->seq());

#Display the reference sequence
############
#### Those 2 tracks are well displayed on the final image
###########
$panel->add_track($bsg, -glyph => 'dna', -label => 1);
$panel->add_track($bsg, -glyph => 'arrow', -tick => 2, -fgcolor => 'black');

#Build, if present, the single cut
if(keys %$spositions){
    #Create the special track for the single cut
    my $strack = $panel->add_track(
        -glyph     => 'crossbox',
        -label     => 1,
        -fgcolor   => 'red',
        -key       => 'Single cut',
        -connector => 'dashed',
    );
    foreach my $enz (sort { $a cmp $b } keys %{$spositions->{$strand}}){
        my $bsfg = Bio::SeqFeature::Generic->new(
            -display_name => $enz,
            -start        => $spositions->{$strand}->{$enz}->{$enz}->start(),
            -end          => $spositions->{$strand}->{$enz}->{$enz}->start());
        my $bsfg2 = Bio::SeqFeature::Generic->new(
            -display_name => $enz,
            -start        => $spositions->{$strand}->{$enz}->{$enz}->end(),
            -end          => $spositions->{$strand}->{$enz}->{$enz}->end());
        $strack->add_feature($bsfg);
        $strack->add_feature($bsfg2);
    }
}

#Build, if present, the double cut
if(keys %$dpositions){
    my $dtrack = $panel->add_track(
        -glyph     => 'crossbox',
        -label     => 1,
        -key       => 'Double cut',
        -connector => 'dashed',
    );
    foreach my $couple (sort { $a cmp $b } keys %{$dpositions->{$strand}}){
        foreach my $cc_enz (sort { $a cmp $b } keys %{$dpositions->{$strand}->{$couple}}){
            my $bsfg = Bio::SeqFeature::Generic->new(
                -display_name => $couple,
                -start        => $dpositions->{$strand}->{$couple}->{$cc_enz}->start(),
                -end          => $dpositions->{$strand}->{$couple}->{$cc_enz}->start());
            my $bsfg2 = Bio::SeqFeature::Generic->new(
                -display_name => $cc_enz,
                -start        => $dpositions->{$strand}->{$couple}->{$cc_enz}->end(),
                -end          => $dpositions->{$strand}->{$couple}->{$cc_enz}->end());
            $dtrack->add_feature($bsfg);
            $dtrack->add_feature($bsfg2);
        }
    }
}
print $panel->png();

Can somebody tell me what I'm missing or doing wrong?

Thanks for any help

Regards

Emmanuel

--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------

From marcelo011982 at gmail.com Mon Aug 31 14:12:58 2009
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Mon, 31 Aug 2009 15:12:58 -0300
Subject: [Bioperl-l] Genbank code from Blast results
In-Reply-To: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>
References: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>
Message-ID: <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com>

done:

#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => 'Rpp2Blast.txt');
...
while( my $result = $in->next_result ) {
  while( my $hit = $result->next_hit ) {
    while( my $hsp = $hit->next_hsp ) {
      #EXTRACT THE GENBANK CODE NUMBER FROM DESCRIPTION
      #----------------------------------------------
      my $accGB = $hit->description();
      $accGB =~ m/(gb=.*?\s)/;
      #----------------------------------------------

      print MYFILE
      ...
      $1,"\t" , # GenBank accession number
      ...
      $hsp->hit->end, "\t","\n";
      ...
    }
  }
}

On Tue, Aug 18, 2009 at 3:34 PM, Marcelo Iwata wrote:

> hi all..
> I was doing a script that take some information of the results of blastn
> files.
> Everythig was ok, but i have some dificult to pic the Genbank code number
> (the 'gb' below).
> I tried
>
> $obj->each_accession_number
> $hit->name
>
> And some variation of this.
> > > > ------------------------------ > >gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h > segment 1 gmrtDrNS01 > Glycine max cDNA 3', mRNA sequence /clone_end=3' > /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853 > Length = 853 > > Score = 1336 bits (674), Expect = 0.0 > Identities = 793/832 (95%), Gaps = 8/832 (0%) > Strand = Plus / Minus > > > Query: 294858 aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt > 294917 > |||||||||||| |||||| ||||||||||||||||| |||||||||||||||||||| > Sbjct: 853 aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc > 794 > ---------------------------------------- > > > But, i still don't get it. > > thank you > with regards > Miwata > From jason at bioperl.org Mon Aug 31 15:49:08 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 31 Aug 2009 12:49:08 -0700 Subject: [Bioperl-l] Genbank code from Blast results In-Reply-To: <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com> References: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com> <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com> Message-ID: <4DBC8ED9-6D98-414A-A361-3FAB3EEE955C@bioperl.org> if you run blastall with -I T (show GI's in defline) you will also be able to get the genbank identifier out with $hit->ncbi_gi through some automagic parsing of the ID line -jason On Aug 31, 2009, at 11:12 AM, Marcelo Iwata wrote: > done: > > #!/usr/bin/perl -w > use strict; > use Bio::SearchIO; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => 'Rpp2Blast.txt'); > ... > while( my $result = $in->next_result ) { > while( my $hit = $result->next_hit ) { > while( my $hsp = $hit->next_hsp ) { > #EXTRACT THE GENBANK CODE NUMBER FROM DESCRIPTION > #---------------------------------------------- > my $accGB = $hit->description(); > $accGB =~ m/(gb=.*?\s)/; > #---------------------------------------------- > > > print MYFILE > ... > > $1,"\t" , #numero de acesso ao genbank > ... 
> $hsp->hit->end, "\t","\n"; > ... > > } > } > } > > > > On Tue, Aug 18, 2009 at 3:34 PM, Marcelo Iwata >wrote: > >> hi all.. >> I was doing a script that take some information of the results of >> blastn >> files. >> Everythig was ok, but i have some dificult to pic the Genbank code >> number >> (the 'gb' below). >> I tried >> >> $obj->each_accession_number >> $hit->name >> >> And some variation of this. >> >> >> >> ------------------------------ >>> gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water >>> stressed 5h >> segment 1 gmrtDrNS01 >> Glycine max cDNA 3', mRNA sequence /clone_end=3' >> /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853 >> Length = 853 >> >> Score = 1336 bits (674), Expect = 0.0 >> Identities = 793/832 (95%), Gaps = 8/832 (0%) >> Strand = Plus / Minus >> >> >> Query: 294858 >> aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt >> 294917 >> |||||||||||| |||||| ||||||||||||||||| >> |||||||||||||||||||| >> Sbjct: 853 >> aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc >> 794 >> ---------------------------------------- >> >> >> But, i still don't get it. >> >> thank you >> with regards >> Miwata >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From Russell.Smithies at agresearch.co.nz Mon Aug 31 17:43:25 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 1 Sep 2009 09:43:25 +1200 Subject: [Bioperl-l] Mapping of genome with cytoband In-Reply-To: <29549.68962.qm@web94610.mail.in2.yahoo.com> References: <29549.68962.qm@web94610.mail.in2.yahoo.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB81F183@exchsth.agresearch.co.nz> Have you tried getting the data from UCSC (or the test site: http://genome-test.cse.ucsc.edu ) If you use Galaxy to get the data then convert to gff, it may save a bit of work. 
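A minimal sketch of the "convert to gff" step, assuming the band data was saved as a UCSC cytoBand table (tab-separated columns chrom, chromStart, chromEnd, name, gieStain). The sub name cytoband_to_gff3, the 'UCSC' source label and the attribute layout are illustrative choices, not something from the thread; the sample rows are only there to show the layout. UCSC table starts are 0-based, so the start is shifted by one for GFF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert one cytoBand row into one GFF3 line.
# ASSUMPTION: input columns are chrom, chromStart, chromEnd, name, gieStain.
sub cytoband_to_gff3 {
    my ($row) = @_;
    chomp $row;
    my ($chrom, $start0, $end, $name, $stain) = split /\t/, $row;
    return unless defined $stain;    # skip rows with fewer than 5 columns

    # Hypothetical attribute layout: band ID is chromosome name + band name.
    my $attrs = "ID=$chrom$name;Name=$name;gieStain=$stain";

    # GFF3 coordinates are 1-based and inclusive, hence $start0 + 1.
    return join "\t", $chrom, 'UCSC', 'chromosome_band',
                      $start0 + 1, $end, '.', '.', '.', $attrs;
}

# Illustrative rows in cytoBand layout.
my @rows = (
    "chr1\t0\t2300000\tp36.33\tgneg\n",
    "chr1\t2300000\t5400000\tp36.32\tgpos25\n",
);
print cytoband_to_gff3($_), "\n" for @rows;
```

If the gene coordinates come from NCBI .gbk/.gbs files, the chromosome names and the assembly version would need to match the band table before the two are drawn on the same ideogram.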

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shafeeq rim
> Sent: Thursday, 27 August 2009 11:14 p.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Mapping of genome with cytoband
>
> Hi,
>
> I need gene , mrna , cds , sts and exon files as per the mapping with
> cytobands.Lets say for 37.1 version NCBI data. I am checking with the .gbs and
> .gbk files but the genes and other features are not coming across the whole
> chromosome.i.e, for chromosome 1 suppose. When I use the gene coordinates from
> .gbk / .gbs files the locations on chromosome 1 genes show only half way on
> the ideogram graph.
>
> Thanks
>
>
>
> See the Web's breaking stories, chosen by people like you. Check out
> Yahoo! Buzz. http://in.buzz.yahoo.com/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From maj at fortinbras.us Sat Aug 1 00:35:04 2009
From: maj at fortinbras.us (Mark A.
Jensen) Date: Sat, 1 Aug 2009 00:35:04 -0400 Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl In-Reply-To: References: Message-ID: <99E27D08408340B9B0611751A17DF266@NewLife> Sorry, I cut off the last script. The entire thing follows: /usr/local/bin/conv-ASMake.sh : #!/usr/bin/sed -f #converting an ActiveState PERL Makefile to run under cygwin make: s/^DIRFILESEP = ^\\/DIRFILESEP = \// s/^NOOP = rem/NOOP = :/ # -or- NOOP = echo -n # byebye volume s/C:/\/cygdrive\/c/ # sed to convert directory \ to / s/\([\)0-9a-zA-Z.]\)\\\([\(0-9a-zA-Z]\)/\1\/\2/g # convert full perl s/\/usr\/bin\/perl/\/cygdrive\/c\/Perl\/bin\/perl/ # a key conversion for DOC_INSTALL action /^DESTINSTALLVENDORHTMLDIR/ a\ DECYGDESTINSTALLARCHLIB = $(subst /cygdrive/c,c:,$(DESTINSTALLARCHLIB)) # --- MakeMaker tools_other section: # let cygwin do native linux commands /^MAKE/ c\ MAKE = make /^CHMOD/ c\ CHMOD = chmod /^CP/ c\ CP = cp /^MV/ c\ MV = mv /^NOOP/ c\ NOOP = : /^RM_F/ c\ RM_F = rm -f /^RM_RF/ c\ RM_RF = rm -rf /^TEST_F[^I]/ c\ TEST_F = test -f /^TOUCH/ c\ TOUCH = touch /^TEST_S/ c\ TEST_S = test -s /^DEV_NULL/ c\ DEV_NULL = > /dev/null 2>&1 /^ECHO[^_]/ c\ ECHO = echo /^ECHO_N/ c\ ECHO_N = echo -n # override OS-specific File::Spec /^MOD_INSTALL/ c\ MOD_INSTALL = $(ABSPERLRUN) -MExtUtils::Install -e "use File::Spec::Cygwin;@File::Spec::ISA=('File::Spec::Cygwin');" -e "map { s[/cygdrive/c][] } @ARGV;install({@ARGV}, '$(VERBINST)', 0, '$(UNINST)');" -- /^FIXIN/ c\ FIXIN = $(PERLRUN) "-MExtUtils::MY" -e "MY->fixin(shift)" # remove cygwin volume prefix for doc installs /Appending installation info to/ s/DESTIN/DECYGDESTIN/ /perllocal\.pod/ s/DESTIN/DECYGDESTIN/ /NOECHO) \$(MKPATH/ s/DESTIN/DECYGDESTIN/ #end conv-ASMake.sh ----- Original Message ----- From: "Jonathan Cline" To: Cc: Sent: Friday, July 31, 2009 11:24 PM Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl >I recently mentioned working on Bio::Robotics for Tecan. 
Vendors > being MS-Win specific, the vendor software allows third-party software > communication through a named pipe (the literal filename is > "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific > and this pseudo-pipe is opened with sysopen() ). This is broken under > cygwin-perl due to cygwin's method of handling paths -- the sysopen > fails. However it works under ActiveState Perl and communication > through the named pipe (to the robot hardware) is OK. The standard > workaround is usually to use cygwin bash, and force the PATH to use > ActiveState perl. (Typical MS Windows incompatibility problem.) The > issue is: Perl module libraries for CPAN work under cygwin-perl > (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN > module use, or "make test", result in a bad list of incompatibility > problems. Yet ActiveState Perl is required for communicating to the > vendor application (unless there is some workaround to raw filesystem > access in cygwin-perl that I haven't found in 2 days of working this). > The stand-alone scripts I have work fine to access the named pipe > (using ActiveState Perl) since the standalone scripts have no module > INC dependencies, no CPAN module test harness, etc etc. > > This isn't specifically a Bio:: issue, though if anyone has > suggestions please email. I could try msys and see if it handles the > named-pipe-special-file better, if msys has an msys-perl distribution. 
> > -- > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jncline at gmail.com Sun Aug 2 23:32:20 2009 From: jncline at gmail.com (Jonathan Cline) Date: Sun, 02 Aug 2009 22:32:20 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> Message-ID: <4A765A44.7030902@gmail.com> Smithies, Russell wrote: > I "acquired" an old Biomek 1000 that I'm thinking of modernising. It was originally controlled by a monstrously large but slow pc (IBM Value Point Model 466DX2 computer with Microsoft Windows* Version 3.1) > My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) and use software like mach3 www.machsupport.com along with G-code to control it. > I come from an engineering background so it seemed like the easy way to me :-) > > Now I just need a bit of free time to get it working... > > --Russell > > > I agree, that's probably the best way to go. It's hard to know what amount of s/w processing was done on the host PC vs. the embedded controller. If you were able to connect directly to the robot hardware with serial port(s) or whatever it's using, it would be tough to find out the comm protocol unless someone has already reverse engineered it (which is doubtful). Also from what I have seen online, attempting to run the old software under virtual machine is unpredictable due to timing differences in the serial port communication. So removal of the old electronics is probably the best bet. If it has one arm, then it's much easier. 
As for robots with working workstation software, it seems the annoyance factor is that while the scripting languages are powerful (for GUI scripting that is), they are still relatively low level. Bio types with a bit of CS seem to immediately turn to visual basic, labview, or even excel spreadsheets and macros, in order to provide a higher level abstraction for the workstation software. To me, it seems natural that there should be a "protocol compiler" which takes biology protocols as input, and gives robot instructions as output (google "protolexer"). The huge bottleneck of course is that everyone's robotics work tables and equipment are somewhat unique to their needs. ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >> Sent: Thursday, 30 July 2009 2:07 p.m. >> To: bioperl-l at lists.open-bio.org >> Cc: Jonathan Cline >> Subject: [Bioperl-l] Bio::Robotics namespace discussion >> >> I am writing a module for communication with biology robotics, as >> discussed recently on #bioperl, and I invite your comments. >> >> Currently this mode talks to a Tecan genesis workstation robot ( >> http://images.google.com/images?q=tecan genesis ). Other vendors are >> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the >> 'net with the exception of some visual basic and labview scripts which I >> have found. There are some computational biologists who program for >> robots via high level s/w, but these scripts are not distributed as OSS. >> >> With Tecan, there is a datapipe interface for hardware communication, as >> an added $$ option from the vendor. I haven't checked other vendors to >> see if they likewise have an open communication path for third party >> software. 
By allowing third-party communication, then naturally the >> next step is to create a socket client-server; especially as the robot >> vendor only support MS Win and using the local machine has typical >> Microsoft issues (like losing real time communication with the hardware >> due to GUI animation, bad operating system stability, no unix except >> cygwin, etc). >> >> >> On Namespace: >> >> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many >> s/w modules already called 'robots' (web spider robots, chat bots, www >> automate, etc) so I chose the longer name "robotics" to differentiate >> this module as manipulating real hardware. Bio::Robotics is the >> abstraction for generic robotics and Bio::Robotics::(vendor) is the >> manufacturer-specific implementation. Robot control is made more >> complex due to the very configurable nature of the work table (placement >> of equipment, type of equipment, type of attached arm, etc). The >> abstraction has to be careful not to generalize or assume too much. In >> some cases, the Bio::Robotics modules may expand to arbitrary equipment >> such as thermocyclers, tray holders, imagers, etc - that could be a >> future roadmap plan. >> >> Here is some theoretical example usage below, subject to change. At >> this time I am deciding how much state to keep within the Perl module. >> By keeping state, some robot programming might be simplified (avoiding >> deadlock or tracking tip state). In general I am aiming for a more >> "protocol friendly" method implementation. >> >> >> To use this software with locally-connected robotics hardware: >> >> use Bio::Robotics; >> >> my $tecan = Bio::Robotics->new("Tecan") || die; >> $tecan->attach() || die; >> $tecan->home(); >> $tecan->pipette(tips => "1", from => "rack1"); >> $tecan->pipette(aspirate => "1", dispense => "1", from => "sampleTray", to >> => "DNATray"); >> ... 
>> >> To use this software with remote robotics hardware over the network: >> >> # On the local machine, run: >> use Bio::Robotics; >> >> my @connected_hardware = Bio::Robotics->query(); >> my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in >> @connected_hardware\n"; >> $tecan->attach() || die; >> $tecan->configure("my work table configuration file") || die; >> # Run the server and process commands >> while (1) { >> $error = $tecan->server(passwordplaintext => "0xd290"); >> if ($tecan->lastClientCommand() =~ /^shutdown/) { >> last; >> } >> } >> $tecan->detach(); >> exit(0); >> >> # On the remote machine (the client), run: >> use Bio::Robotics; >> >> my $server = "heavybio.dyndns.org:8080"; >> my $password = "0xd290"; >> my $tecan = Bio::Robotics->new("Tecan"); >> $tecan->connect($server, $mypassword) || die; >> $tecan->home(); >> $tecan->pipette(tips => "1", from => "rack200"); >> $tecan->pipette(aspirate => "1", dispense => "1", >> from => "sampleTray A1", to => "DNATray A2", >> volume => "45", liquid => "Buffer"); >> $tecan->pipette(drop => "1"); >> ... >> $tecan->disconnect(); >> exit(0); >> >> >> >> -- >> >> ## Jonathan Cline >> ## jcline at ieee.org >> ## Mobile: +1-805-617-0223 >> ######################## >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the > sender immediately. > ======================================================================= > From dan.bolser at gmail.com Tue Aug 4 08:03:00 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 4 Aug 2009 13:03:00 +0100 Subject: [Bioperl-l] problem with t/LocalDB/SeqFeature.t when host ne localhost In-Reply-To: References: <2c8757af0907310513q24bec4b0k7bec06b09e069b07@mail.gmail.com> Message-ID: <2c8757af0908040503oe2a258dkac4311bb099dc3ac@mail.gmail.com> 2009/7/31 Chris Fields : > Dan, > > Can you file this as a BioPerl bug? ?I'm planning on driving towards > releasing 1.6.1 alpha1 soon (next few weeks) and I would like to get this > one fixed. http://bugzilla.open-bio.org/show_bug.cgi?id=2899 Dan. From dan.bolser at gmail.com Tue Aug 4 08:14:02 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 4 Aug 2009 13:14:02 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID: <2c8757af0908040514w198085cfgf4a1adc344095f36@mail.gmail.com> 2009/4/27 Heikki Lehvaslaiho : > Dan, > > Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live. > > Test and extend, > > ? ?-Heikki Thanks for help with this. I finally got round to looking at the code (after several others had done the same). I have messed with the code a bit, and added a 'mask_below_threshold' method [1] and some tests to go with it (including some extra tests) [2]. Cheers, Dan. [1] http://bugzilla.open-bio.org/show_bug.cgi?id=2897 [2] http://bugzilla.open-bio.org/show_bug.cgi?id=2898 > 2009/4/27 Heikki Lehvaslaiho : >> Dan, >> >> I'll take your code and put it into bioperl-live rewritten the way I >> suggested and add few tests. 
>> >> That should get you started, >> >> ? -Heikki >> >> 2009/4/27 Dan Bolser : >>> Hi Heikki, >>> >>> Thanks very much for the advice on how to better implement the clear >>> range method within the Bio::Seq::Quality object. I can understand the >>> logic of what you have written, and it all sounds reasonable. The only >>> problem is that I am very inexperienced with working on object >>> oriented Perl (my 'one man' projects to date have never really >>> required me to think beyond scripts, and its been years since I >>> actually tried to code objects in Perl). >>> >>> To be specific, when you say, "Lets add a method that sets the >>> threshold and stores it internally as $self->_threshold", ignoring any >>> other functionality, what would that method look like? in particular, >>> how would $self->_threshold be implemented? >>> >>> I think once I see that detail, I can go ahead and try to code what >>> you suggested. >>> >>> >>> Similarly (Chris), where would I put the tests / how would they be implemented? >>> >>> >>> Thanks again for the feedback. >>> >>> All the best, >>> Dan. >>> >>> >>> >>> 2009/4/27 Heikki Lehvaslaiho : >>>> Dan, >>>> >>>> It looks like your method does two different things: >>>> >>>> 1. Returns the longest subsequence above the threshold >>>> 2. Analyses the the sequence for the number of ranges the current >>>> threshold creates. >>>> >>>> Why not separate these functions? >>>> >>>> Lets add a method that sets the threshold and stores it internally as >>>> $self->_threshold. Setting it to a new values should trigger emptying >>>> all the caches (see below.) >>>> >>>> Lets have two more public methods: >>>> >>>> 1. get_clean_range() - optional argument 'threshold' >>>> >>>> It returns the longest clean subseq. >>>> >>>> 2. count_clean_ranges() -again optional argument 'threshold' >>>> >>>> This returns the number of ranges detected. 
>>>> >>>> Both methods call first the public method threshold if the argument >>>> has been given and then an internal method ?_find_clean_ranges(). That >>>> method calculates all the ranges and stores them internally ?(as >>>> $self->_clean_ranges-> [...]). The number of ranges is also stored >>>> (e.g. $self->_number_of ranges).These internal values form ?the cache >>>> that needs to be emptied whenever any of the critical values of the >>>> object changes: threshold, quality or seq. Create an internal method >>>> $self->_clear_cache, that does that. >>>> >>>> Now the quality new object does not get created until you call >>>> get_clean_range() which accesses the cached values (or creates them if >>>> they are not there). >>>> >>>> This design allows you to have no extra penalty for adding more >>>> methods that act on cached values. For example, it might be sensible >>>> thing to do ?at some point to look at all the ranges that are longer >>>> than some length. Then you could write in your program: >>>> >>>> >>>> $qual->threshold(10); >>>> if ($qual->count_clean_ranges = 1) { >>>> ?my $newqual = $qual->get_clean_range() >>>> ?# do your analysis >>>> } elsif ($qual->count_clean_ranges = 0) { >>>> ? # do some reporting and logging >>>> } else { ?# more than one ranges >>>> ? my @quals = $qual->get_all_clean_ranges($min_lenght); >>>> ? # do some more work and possibly select the best one(s) >>>> } >>>> >>>> >>>> >>>> Yours, >>>> >>>> ? -Heikki >>>> >>>> 2009/4/24 Chris Fields : >>>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. ?If >>>>> possible, tests don't hurt either! >>>>> >>>>> chris >>>>> >>>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote: >>>>> >>>>>> Its a bit rough and ready, but it does what I need... >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> =head2 get_clear_range >>>>>> >>>>>> Title ? ?: get_clear_range >>>>>> >>>>>> Title ? ?: subqual >>>>>> Usage ? ?: $subobj = $obj->get_clear_range(); >>>>>> ? ? ? ? ? 
$subobj = $obj->get_clear_range(20);
>>>>>> Function : Get the clear range using the given quality score as a
>>>>>>            cutoff or a default value of 13.
>>>>>>
>>>>>> Returns  : a new Bio::Seq::Quality object
>>>>>> Args     : a minimum quality value, optional, default = 13
>>>>>>
>>>>>> =cut
>>>>>>
>>>>>> sub get_clear_range
>>>>>> {
>>>>>>   my $self = shift;
>>>>>>   my $qual = $self->qual;
>>>>>>   my $minQual = shift || 13;
>>>>>>
>>>>>>   my (@ranges, $rangeFlag);
>>>>>>
>>>>>>   for(my $i=0; $i<@$qual; $i++){
>>>>>>       ## Are we currently within a clear range or not?
>>>>>>       if(defined($rangeFlag)){
>>>>>>           ## Did we just leave the clear range?
>>>>>>           if($qual->[$i]<$minQual){
>>>>>>               ## Log the range
>>>>>>               push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>               ## and reset the range flag.
>>>>>>               $rangeFlag = undef;
>>>>>>           }
>>>>>>           ## else nothing changes
>>>>>>       }
>>>>>>       else{
>>>>>>           ## Did we just enter a clear range?
>>>>>>           if($qual->[$i]>=$minQual){
>>>>>>               ## Better set the range flag!
>>>>>>               $rangeFlag = $i;
>>>>>>           }
>>>>>>           ## else nothing changes
>>>>>>       }
>>>>>>   }
>>>>>>   ## Did we exit the last clear range?
>>>>>>   if(defined($rangeFlag)){
>>>>>>       my $i = scalar(@$qual);
>>>>>>       ## Log the range
>>>>>>       push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>   }
>>>>>>
>>>>>>   unless(@ranges){
>>>>>>       die "There is no clear range... I don't know what to do here!\n";
>>>>>>   }
>>>>>>
>>>>>>   print "there are ", scalar(@ranges), " clear ranges\n";
>>>>>>
>>>>>>   my $sum; map {$sum += $_->[2]} @ranges;
>>>>>>
>>>>>>   print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>>>>       "quality scores above the given threshold\n";
>>>>>>
>>>>>>   for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>>>>       if($_->[2]/$sum < 0.5){
>>>>>>           warn "not so much a clear range as a clear chunk...\n";
>>>>>>       }
>>>>>>       print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>>>>
>>>>>>       return Bio::Seq::QualityDB->new( -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
>>>>>>                                        -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>>>>>>                                        );
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>>>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>>>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>>>>> in (apart from all the debugging output that I spit out).
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Dan.
>>>>>>
>>>>>>
>>>>>> 2009/4/24 Dan Bolser :
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I couldn't find out how to get the 'clear range' from a
>>>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>>>>> this method be a part of the Bio::Seq::Quality class?
>>>>>>>
>>>>>>> In the latter case I'm on my way to an implementation, but I am not
>>>>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>>>>> I take the time to finish that off.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dan.
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>    -Heikki
>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>> cell: +27 (0)714328090
>>>> Sent from Claremont, WC, South Africa
>>>>
>>>
>>
>>
>>
>> --
>>
?-Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> > > > > -- > ? ?-Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > From dan.bolser at gmail.com Tue Aug 4 12:32:31 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 4 Aug 2009 17:32:31 +0100 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> Message-ID: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> 2009/7/28 shalabh sharma : > Hi All, ? ? ? ? ?I have some protein sequences (around 100) i need to find > overall percentage similarity between them. > How i can do that? Tried using blast? You can download that. Try asking in irc://irc.freenode.net/#bioinformatics Dan. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shameer at ncbs.res.in Tue Aug 4 12:43:40 2009 From: shameer at ncbs.res.in (K. Shameer) Date: Tue, 4 Aug 2009 22:13:40 +0530 (IST) Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> Message-ID: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> Hello Shalabh, You may try ALISTAT. Available as a part of SQUID library from Prof. Sean Eddy. Make an alignment of your 100 sequences and use alignment as input of ALISTAT. ftp://selab.janelia.org/pub/software/squid/ Best, Khader Shameer > 2009/7/28 shalabh sharma : >> Hi All, ? ? ? ? ?I have some protein sequences (around 100) i need to >> find >> overall percentage similarity between them. 
>> How i can do that? > > Tried using blast? > > You can download that. > > > Try asking in irc://irc.freenode.net/#bioinformatics > > Dan. > > >> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shalabh.sharma7 at gmail.com Tue Aug 4 13:36:34 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 4 Aug 2009 13:36:34 -0400 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> Message-ID: <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com> Hi All, thanks a lot. @Khader Shameer, ALISTAT is what i was looking for. But still it gives you the average identity, what i need exactly is the average similarity. Thanks Shalabh Sharma On Tue, Aug 4, 2009 at 12:43 PM, K. Shameer wrote: > Hello Shalabh, > > You may try ALISTAT. Available as a part of SQUID library from Prof. Sean > Eddy. Make an alignment of your 100 sequences and use alignment as input > of ALISTAT. ftp://selab.janelia.org/pub/software/squid/ > > Best, > Khader Shameer > > > 2009/7/28 shalabh sharma : > >> Hi All, I have some protein sequences (around 100) i need to > >> find > >> overall percentage similarity between them. > >> How i can do that? > > > > Tried using blast? > > > > You can download that. > > > > > > Try asking in irc://irc.freenode.net/#bioinformatics > > > > Dan. 
> > > > > >> > >> Thanks > >> Shalabh > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > From shalabh.sharma7 at gmail.com Wed Aug 5 09:31:21 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 5 Aug 2009 09:31:21 -0400 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com> <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com> Message-ID: <9fcc48c70908050631q1a080b74x12e81985b455332e@mail.gmail.com> Hi, Thanks for the reply. I used clustalW for the MSA. Also i was just wondering that what if i use smith Waterman (EMBOSS' water) and pass the same library as query sequences and reference library, then just parse it and calculate average similarity.Is this right approach? Thanks Shalabh On Wed, Aug 5, 2009 at 3:10 AM, Dan Bolser wrote: > 2009/8/4 shalabh sharma : > > Hi All, thanks a lot. > > @Khader Shameer, ALISTAT is what i was looking for. But still it gives > you > > the average identity, what i need exactly is the average similarity. > > The problem is that identity is well defined. Similarity is more > vague, and at least depends on a particular alignment scoring matrix. > How did you align your sequences? > > Dan. 
> > >> > Try asking in irc://irc.freenode.net/#bioinformatics > >> > > > ;-) > From michael.watson at bbsrc.ac.uk Wed Aug 5 09:50:35 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 5 Aug 2009 14:50:35 +0100 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank Message-ID: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> Hi I want to download GSS sequences using Bio::DB::GenBank. When I specify db => 'nucleotide', it gets the 3000 or so that Entrez reports are in nucleotide, but there are another ~30000 in GSS that I want, but when I try db => 'GSS' or db => 'gss' nothing comes down. I'm using bioperl 1.5.1. Any clues? Mick From rmb32 at cornell.edu Wed Aug 5 11:28:46 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 05 Aug 2009 08:28:46 -0700 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <4A79A52E.7000104@cornell.edu> I think you're looking for the -db => 'nucgss' option. I'll add a better listing of this (undocumented) options to the Bio::DB::Query::GenBank docs. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu michael watson (IAH-C) wrote: > Hi > > I want to download GSS sequences using Bio::DB::GenBank. > > When I specify db => 'nucleotide', it gets the 3000 or so that Entrez reports are in nucleotide, but there are another ~30000 in GSS that I want, but when I try db => 'GSS' or db => 'gss' nothing comes down. > > I'm using bioperl 1.5.1. > > Any clues? 
> > Mick > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hartzell at alerce.com Wed Aug 5 12:16:04 2009 From: hartzell at alerce.com (George Hartzell) Date: Wed, 5 Aug 2009 09:16:04 -0700 Subject: [Bioperl-l] Job opening at Genentech [SSF, CA]. Message-ID: <19065.45124.4999.922147@already.dhcp.gene.com> I have an opening in my group in the Bioinformatics department at Genentech [South San Francisco, CA]. At the moment (for the next year or so) our main focus is rebuilding and extending a system for collecting, processing, and disseminating information about mutations and variations (think web interfaces, relational databases, alignments, workflows/pipelines). In the future we'll pick up projects related to next-gen sequencing (Me too!!! In the future, what isn't related to next-gen?), data integration, and/or lab-specific projects. First and foremost I'm looking for someone who's sharp and who enjoys computers, biology, and technology; someone who gets excited about picking up new tools but who also has a sense of responsibility and restraint. I'm looking for someone who's familiar with several languages and tools; modern Perl complemented with C is my first choice these days, supplemented with R and (when necessary) anything from the rest of the programming language bestiary. There's a fair amount of Java flying around here too so familiarity with it and the JVM world will help. Relational databases are part of the picture: Oracle for the big stuff; SQLite, Postgresql, and MySQL play niche roles. I generally interact with them via ORM's, lately it's been Rose::DB::Object on the Perl side though I've been convinced to take another look at DBIx::Class. Most of my web apps use CGI::Application, as fastcgi's, mod_perl, or simple CGI scripts, but (as with ORM's) I may take another look at Catalyst. 
I'm looking for someone who's interested in building real software. We'll be putting together a set of tools and data that need to hang together and evolve for at least 4-5 years. Deploy and run won't cut it. Requirements will change, so it's important to me that we build things so they're as modular and flexible as possible. Testing, source control, and documentation matter. A strong candidate will have an understanding of basic bioinformatics concepts and the ability to pick up new biology and computer science concepts as necessary. At the junior end of the spectrum I'd expect a bachelor's degree + 3 years of experience, at the upper end would a masters + 5 years (or a PhD interested in moving towards the production side of the house). I can imagine running through one or more detail oriented interview questions that drilled down (or took of on a tangent) from the following: - What's the difference between Smith-Waterman, blast, sim4, gmap, and/or bowtie alignment algorithms or tools? Which would you use when, and why? - Why is Moose better than Class::Accessor? (yes, it's Perl centered, but it could spin out into any language [e.g. why is Java better than Perl?]). What's a MOP? Who cares? - CVS, subversion, git, mercurial. You've already picked one? Which one? Why? Why not? - XML or JSON or YAML. Pick one for moving data back and forth in an Ajax based interface. Why? Would it also work well in other contexts? - How would you store information about positional features on a genome so that you could get fast random access? How would your solution tie into a larger data context? Genentech's a great place to work: solid salaries, great benefits, Bay Area location (who could ask for more?). We're open source friendly and with the arrival Robert Gentleman (our new Director, of Bioconductor/R fame) likely to become more so. The recent Roche acquisition hasn't changed life much, it seems to mostly be a source of opportunities for those of us in Research. 
If you know anyone who fits the bill, have them drop me a note.

Thanks!

g.

From hilgert at cshl.edu Wed Aug 5 16:27:28 2009
From: hilgert at cshl.edu (Hilgert, Uwe)
Date: Wed, 5 Aug 2009 16:27:28 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
Message-ID:

Is my impression correct that Bio::SeqIO just assumes that sequences are
being submitted in FASTA format? In our experience, implementing
Bio::SeqIO led to the first line of files being cut off, regardless of
whether the files were indeed fasta files or files that only contained
sequence. Which, in the latter, led to sequence submissions that had the
first line of nucleotides removed. Has anyone tried to write a fix for
this?

Thanks,

Uwe

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Uwe Hilgert, Ph.D.
Dolan DNA Learning Center
Cold Spring Harbor Laboratory

V: (516) 367-5185
E: hilgert at cshl.edu
F: (516) 367-5182
W: http://www.dnalc.org

From cjfields at illinois.edu Wed Aug 5 17:04:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 16:04:14 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References:
Message-ID: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>

On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:

> Is my impression correct that Bio::SeqIO just assumes that sequences
> are being submitted in FASTA format?

No. See:

http://www.bioperl.org/wiki/HOWTO:SeqIO

SeqIO tries to guess the format from the file extension, and if one
isn't present it makes use of Bio::Tools::GuessSeqFormat. It's possible
that the extension is causing the problem, or that GuessSeqFormat is
guessing wrong (it's apt to do that, as it is forced to guess). In any
case, it's always advisable to explicitly indicate the format when
possible.

Relevant lines:

    return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    ...
    return 'raw' if /\.(txt)$/i;

> In our experience, implementing
> Bio::SeqIO led to the first line of files being cut off, regardless of
> whether the files were indeed fasta files or files that only contained
> sequence.

Files that only contain sequence are 'raw'. Ones in FASTA are 'fasta'.

> Which, in the latter, led to sequence submissions that had the
> first line of nucleotides removed. Has anyone tried to write a fix for
> this?

This sounds like a bug, but we have very little to go on beyond your
description. What version of bioperl are you using, OS, etc? What does
your data look like? File extension?

chris

> Thanks,
>
> Uwe
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> Uwe Hilgert, Ph.D.
> Dolan DNA Learning Center
> Cold Spring Harbor Laboratory
>
> V: (516) 367-5185
> E: hilgert at cshl.edu
> F: (516) 367-5182
> W: http://www.dnalc.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Kevin.M.Brown at asu.edu Wed Aug 5 17:03:04 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 5 Aug 2009 14:03:04 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B40624DA61@EX02.asurite.ad.asu.edu>

SeqIO is just a base framework for reading and writing files. If you
want it to read FASTA format, you tell it so when you create the object:

    $seqio = Bio::SeqIO->new(-format=>'fasta');

This tells the program to use Bio::SeqIO::fasta for the object. Look at
the docs for the various formats that Bio::SeqIO supports.
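
Naming the format explicitly, as suggested above, only helps when the name matches the data. The symptom reported at the start of this thread (a lost first line) is what one would expect if a file holding only sequence is read as FASTA, since a FASTA record's first line is taken as its description line (the parser details are spelled out later in the thread). The following is a minimal sketch in plain Perl, not BioPerl code: sniff_format and split_fasta_record are hypothetical helpers made up for illustration. The first peeks at the first non-blank line to choose between the 'fasta' and 'raw' format names; the second imitates splitting a record at its first newline, to show where the first line ends up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return 'fasta' if the first
# non-blank line starts with '>', otherwise 'raw'. This heuristic only
# distinguishes these two formats.
sub sniff_format {
    my ($text) = @_;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;               # skip blank lines
        return $line =~ /^>/ ? 'fasta' : 'raw';
    }
    return 'raw';                               # empty input: no '>' header seen
}

# Hypothetical helper: split one record into (header, sequence) at the
# first newline, treating line one as the description whatever it holds.
sub split_fasta_record {
    my ($entry) = @_;
    $entry =~ s/^>//;                           # drop the leading '>' if present
    my ($top, $sequence) = split /\n/, $entry, 2;
    $sequence = '' unless defined $sequence;
    $sequence =~ s/\s+//g;                      # join the wrapped sequence lines
    return ($top, $sequence);
}

my $fasta = ">seq1 example\nACGTACGT\nTTGGCCAA\n";
my $raw   = "ACGTACGT\nTTGGCCAA\n";

for my $data ($fasta, $raw) {
    my $format = sniff_format($data);
    my ($top, $seq) = split_fasta_record($data);
    print "format=$format header='$top' sequence=$seq\n";
}
```

For the headerless input, the first line of nucleotides (ACGTACGT) is reported as the header and only TTGGCCAA remains as sequence, which matches the reported symptom. With real files, the result of such a check could be passed as the -format argument to Bio::SeqIO->new alongside -file, so that sequence-only files are read as 'raw' rather than 'fasta'.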
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hilgert, Uwe Sent: Wednesday, August 05, 2009 1:27 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::SeqIO issue Is my impression correct that Bio::SeqIO just assumes that sequences are being submitted in FASTA format? In our experience, implementing Bio::SeqIO led to the first line of files being cut off, regardless of whether the files were indeed fasta files or files that only contained sequence. Which, in the latter, led to sequence submissions that had the first line of nucleotides removed. Has anyone tried to write a fix for this? Thanks, Uwe - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe Hilgert, Ph.D. Dolan DNA Learning Center Cold Spring Harbor Laboratory V: (516) 367-5185 E: hilgert at cshl.edu F: (516) 367-5182 W: http://www.dnalc.org _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Aug 5 17:37:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 Aug 2009 16:37:52 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> Message-ID: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Uwe, Please keep replies on the list. It's very possible that's the issue; IIRC the fasta parser pulls out the full sequence in chunks (based on local $/ = "\n>") and splits the header off as the first line in that chunk. You could probably try leaving the format out and letting SeqIO guess it, or passing the file into Bio::Tools::GuessSeqFormat directly, but it's probably better to go through the files and add a file extension that corresponds to the format. chris On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > Thanks, Chris. 
The files have no extension, but we indicate what > format > to use, like in the manual: > > $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > > I wonder now whether this could exactly cause the problem: as we are > telling that input files are in fasta format they are being treated as > such (=remove first line) - regardless of whether they really are > fasta? > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > Uwe Hilgert, Ph.D. > Dolan DNA Learning Center > Cold Spring Harbor Laboratory > > C: (516) 857-1693 > V: (516) 367-5185 > E: hilgert at cshl.edu > F: (516) 367-5182 > W: http://www.dnalc.org > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Wednesday, August 05, 2009 5:04 PM > To: Hilgert, Uwe > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > >> Is my impression correct that Bio::SeqIO just assumes that sequences >> are >> being submitted in FASTA format? > > No. See: > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > SeqIO tries to guess at the format using the file extension, and if > one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > possible that the extension is causing the problem, or that > GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to > guessing). In any case, it's always advisable to explicitly indicate > the format when possible. > > Relevant lines: > > return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > i; > ... > return 'raw' if /\.(txt)$/i; > >> In our experience, implementing >> Bio::SeqIO led to the first line of files being cut off, regardless >> of >> whether the files were indeed fasta files or files that only >> contained >> sequence. > > Files that only contain sequence are 'raw'. Ones in FASTA are > 'fasta'. > >> Which, in the latter, led to sequence submissions that had the >> first line of nucleotides removed. 
Has anyone tried to write a fix >> for >> this? > > This sounds like a bug, but we have very little to go on beyond your > description. What version of bioperl are you using, OS, etc? What > does your data look like? File extension? > > chris > >> Thanks, >> >> Uwe >> >> >> >> >> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> >> Uwe Hilgert, Ph.D. >> >> Dolan DNA Learning Center >> >> Cold Spring Harbor Laboratory >> >> >> >> V: (516) 367-5185 >> >> E: hilgert at cshl.edu >> >> F: (516) 367-5182 >> >> W: http://www.dnalc.org >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Kevin.M.Brown at asu.edu Wed Aug 5 17:45:03 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 5 Aug 2009 14:45:03 -0700 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Message-ID: <1A4207F8295607498283FE9E93B775B40624DA9B@EX02.asurite.ad.asu.edu> I'm not sure, but I think the module is fasta, not Fasta. So it should be -format=>'fasta', unless you're on a case-insensitive system that is forgiving the capital... Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Chris Fields > Sent: Wednesday, August 05, 2009 2:38 PM > To: Hilgert, Uwe > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and > splits the > header off as the first line in that chunk. 
You could probably try > leaving the format out and letting SeqIO guess it, or passing > the file > into Bio::Tools::GuessSeqFormat directly, but it's probably > better to > go through the files and add a file extension that > corresponds to the > format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > > > Thanks, Chris. The files have no extension, but we indicate what > > format > > to use, like in the manual: > > > > $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > > > > I wonder now whether this could exactly cause the problem: as we are > > telling that input files are in fasta format they are being > treated as > > such (=remove first line) - regardless of whether they really are > > fasta? > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Uwe Hilgert, Ph.D. > > Dolan DNA Learning Center > > Cold Spring Harbor Laboratory > > > > C: (516) 857-1693 > > V: (516) 367-5185 > > E: hilgert at cshl.edu > > F: (516) 367-5182 > > W: http://www.dnalc.org > > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Wednesday, August 05, 2009 5:04 PM > > To: Hilgert, Uwe > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > > > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > > > >> Is my impression correct that Bio::SeqIO just assumes that > sequences > >> are > >> being submitted in FASTA format? > > > > No. See: > > > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > > > SeqIO tries to guess at the format using the file extension, and if > > one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > > possible that the extension is causing the problem, or that > > GuessSeqFormat guessing wrong (it's apt to do that, as it's > forced to > > guessing). In any case, it's always advisable to > explicitly indicate > > the format when possible. 
> > > > Relevant lines: > > > > return 'fasta' if > /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > > i; > > ... > > return 'raw' if /\.(txt)$/i; > > > >> In our experience, implementing > >> Bio::SeqIO led to the first line of files being cut off, > regardless > >> of > >> whether the files were indeed fasta files or files that only > >> contained > >> sequence. > > > > Files that only contain sequence are 'raw'. Ones in FASTA are > > 'fasta'. > > > >> Which, in the latter, led to sequence submissions that had the > >> first line of nucleotides removed. Has anyone tried to > write a fix > >> for > >> this? > > > > This sounds like a bug, but we have very little to go on beyond your > > description. What version of bioperl are you using, OS, etc? What > > does your data look like? File extension? > > > > chris > > > >> Thanks, > >> > >> Uwe > >> > >> > >> > >> > >> > >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > >> > >> Uwe Hilgert, Ph.D. > >> > >> Dolan DNA Learning Center > >> > >> Cold Spring Harbor Laboratory > >> > >> > >> > >> V: (516) 367-5185 > >> > >> E: hilgert at cshl.edu > >> > >> F: (516) 367-5182 > >> > >> W: http://www.dnalc.org > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Wed Aug 5 18:53:56 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 5 Aug 2009 18:53:56 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Message-ID: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> I don't think that can be the problem. 
If anything, providing the format ought to be better in terms of result than not providing it? Uwe - I'd like you to go back to Chris' initial questions that you haven't answered yet: "What version of bioperl are you using, OS, etc? What does your data look like?" I'd add to that, can you show us your full script, or a smaller code snippet that reproduces the problem. I suspect that either something in your script is swallowing the line, or that the line endings in your data file are from a different OS than the one you're running the script on. (Or that you are running a very old version of BioPerl, which is entirely possible if you installed through CPAN.) -hilmar On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and splits > the header off as the first line in that chunk. You could probably > try leaving the format out and letting SeqIO guess it, or passing > the file into Bio::Tools::GuessSeqFormat directly, but it's probably > better to go through the files and add a file extension that > corresponds to the format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >> Thanks, Chris. The files have no extension, but we indicate what >> format >> to use, like in the manual: >> >> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >> >> I wonder now whether this could exactly cause the problem: as we are >> telling that input files are in fasta format they are being treated >> as >> such (=remove first line) - regardless of whether they really are >> fasta? >> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> Uwe Hilgert, Ph.D. 
>> Dolan DNA Learning Center >> Cold Spring Harbor Laboratory >> >> C: (516) 857-1693 >> V: (516) 367-5185 >> E: hilgert at cshl.edu >> F: (516) 367-5182 >> W: http://www.dnalc.org >> >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Wednesday, August 05, 2009 5:04 PM >> To: Hilgert, Uwe >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >> >>> Is my impression correct that Bio::SeqIO just assumes that sequences >>> are >>> being submitted in FASTA format? >> >> No. See: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> SeqIO tries to guess at the format using the file extension, and if >> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >> possible that the extension is causing the problem, or that >> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >> guessing). In any case, it's always advisable to explicitly indicate >> the format when possible. >> >> Relevant lines: >> >> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >> i; >> ... >> return 'raw' if /\.(txt)$/i; >> >>> In our experience, implementing >>> Bio::SeqIO led to the first line of files being cut off, >>> regardless of >>> whether the files were indeed fasta files or files that only >>> contained >>> sequence. >> >> Files that only contain sequence are 'raw'. Ones in FASTA are >> 'fasta'. >> >>> Which, in the latter, led to sequence submissions that had the >>> first line of nucleotides removed. Has anyone tried to write a fix >>> for >>> this? >> >> This sounds like a bug, but we have very little to go on beyond your >> description. What version of bioperl are you using, OS, etc? What >> does your data look like? File extension? >> >> chris >> >>> Thanks, >>> >>> Uwe >>> >>> >>> >>> >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> >>> Uwe Hilgert, Ph.D. 
>>> >>> Dolan DNA Learning Center >>> >>> Cold Spring Harbor Laboratory >>> >>> >>> >>> V: (516) 367-5185 >>> >>> E: hilgert at cshl.edu >>> >>> F: (516) 367-5182 >>> >>> W: http://www.dnalc.org >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Wed Aug 5 19:12:52 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 5 Aug 2009 19:12:52 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <8FAB8756AD944534B49F2C4356CB6D92@NewLife> If these items were included in a Bugzilla report, that would be most convenient (= most likely to get looked carefully) and is the best place for us to keep track of these kinds of issues-- http://bugzilla.bioperl.org/ cheers MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Chris Fields" Cc: "BioPerl List" Sent: Wednesday, August 05, 2009 6:53 PM Subject: Re: [Bioperl-l] Bio::SeqIO issue >I don't think that can be the problem. If anything, providing the > format ought to be better in terms of result than not providing it? > > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, > etc? What does your data look like?" 
I'd add to that, can you show us > your full script, or a smaller code snippet that reproduces the problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing >> the file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >>> guessing). In any case, it's always advisable to explicitly indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>>
>>> chris
>>>
>>>> Thanks,
>>>>
>>>> Uwe
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

From cjfields at illinois.edu Thu Aug 6 00:43:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 23:43:45 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
 <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID:

The SeqIO::fasta parser sets:

    local $/ = "\n>";

then splits the resulting chunks of data (each corresponding to a full
FASTA-formatted sequence) into two pieces:

    my ($top,$sequence) = split(/\n/,$entry,2);

If there is no description line (e.g. the file is all raw sequence
data), these lines read in the whole file and then split off the
first line.

chris

On Aug 5, 2009, at 5:53 PM, Hilmar Lapp wrote:

> I don't think that can be the problem.
> If anything, providing the format ought to be better in terms of
> result than not providing it?
>
> Uwe - I'd like you to go back to Chris' initial questions that you
> haven't answered yet: "What version of bioperl are you using, OS,
> etc? What does your data look like?" I'd add to that, can you show
> us your full script, or a smaller code snippet that reproduces the
> problem.
>
> I suspect that either something in your script is swallowing the
> line, or that the line endings in your data file are from a
> different OS than the one you're running the script on. (Or that you
> are running a very old version of BioPerl, which is entirely
> possible if you installed through CPAN.)
>
> -hilmar

From cjfields at illinois.edu Thu Aug 6 01:12:13 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Aug 2009 00:12:13 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <8FAB8756AD944534B49F2C4356CB6D92@NewLife>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
 <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
 <8FAB8756AD944534B49F2C4356CB6D92@NewLife>
Message-ID: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>

Just to confirm: the following is using bioperl-live on my macbook pro
(perl 5.10.0, 64bit). We need to decide if this is a legit bug or a
user issue (if it's the former, we can easily add an exception
indicating lack of a header). Note that 'raw' also fails for the raw
example below (doesn't appear to remove newlines).
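
Ahead of the full BioPerl test script, the mechanism can be imitated in plain Perl. This is only a simplified sketch of the two parser steps quoted in this thread, not the actual Bio::SeqIO::fasta code; first_record is a made-up helper name and the short sequences are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical helper imitating the two quoted parser steps:
# read one chunk up to "\n>", then split the first line off as the header.
sub first_record {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
    local $/ = "\n>";                  # one FASTA-style chunk per read
    my $entry = <$fh>;
    close($fh);
    return unless defined $entry;
    chomp $entry;                      # strips a trailing "\n>" if present
    $entry =~ s/^>//;                  # drop the leading '>' of the record
    my ($top, $sequence) = split(/\n/, $entry, 2);
    $sequence = '' unless defined $sequence;
    $sequence =~ s/\s+//g;             # join the remaining sequence lines
    return ($top, $sequence);
}

my $fasta = ">id1 demo\nACGT\nTTGA\n"; # has a description line
my $raw   = "ACGT\nTTGA\n";            # sequence only, no description line

my ($top_f, $seq_f) = first_record($fasta);
my ($top_r, $seq_r) = first_record($raw);
printf "fasta: header=%s seq=%s\n", $top_f, $seq_f;
printf "raw:   header=%s seq=%s\n", $top_r, $seq_r;
```

With headerless input the first sequence line ends up where the header would be, which matches the truncated sequence reported in the test output.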
-c

cjfields4:fasta cjfields$ cat raw_v_fasta.pl
#!/usr/bin/perl -w

use strict;
use warnings;
use IO::String;
use Bio::SeqIO;
use Test::More qw(no_plan);

my %seq;

$seq{raw} = <<RAW;
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
RAW

$seq{fasta} = <<FASTA;
>CATH_RAT
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
FASTA

my %newdata;
for my $input (sort keys %seq) {
    my $fh = IO::String->new($seq{$input});
    my $seq = Bio::SeqIO->new(-format => 'fasta',
                              -fh => $fh)->next_seq;
    $newdata{$input} = $seq->seq;
}
is($newdata{raw}, $newdata{fasta}, 'format');

cjfields4:fasta cjfields$ perl raw_v_fasta.pl
not ok 1 - format
# Failed test 'format'
# at raw_v_fasta.pl line 36.
# got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
# expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
1..1
# Looks like you failed 1 test of 1.

On Aug 5, 2009, at 6:12 PM, Mark A.
Jensen wrote:

> If these items were included in a Bugzilla report, that would be
> most convenient (= most likely to get looked carefully)
> and is the best place for us to keep track of these kinds of
> issues-- http://bugzilla.bioperl.org/
> cheers MAJ

From eigenrosen at gmail.com Thu Aug 6 03:12:24 2009
From: eigenrosen at gmail.com (Michael Rosen)
Date: Thu, 6 Aug 2009 00:12:24
 -0700
Subject: [Bioperl-l] Trouble with Clustalw
Message-ID: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com>

I'm a complete bioperl novice, trying to do Clustalw on some fasta
files, and am running into trouble:

~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
STACK: TestClust:22
-----------------------------------------------------------

Here's my code:

#!/usr/bin/perl -w

use Bio::Perl;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::SimpleAlign;
use Bio::Seq;
use strict;
use warnings;

my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
my @seq_array = read_all_sequences($ARGV[0],'fasta');

for (my $i = 0; $i < @seq_array; $i++){
    (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
    $seq_array[$i]->seq($seq);
}
write_sequence(">test",'fasta',@seq_array);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);
my @align_array = $aln->each_seq();
write_sequence(">testfile",'fasta',@align_array);

The loop is just there to take out some gaps that were placed
in a blast previous to this. The write_sequence call confirms that
@seq_array is a valid array of Bio::Seq objects at the time align
calls it. Here's some output in "test":

>A0220B0939one.1 FV584Q101DEWY9
TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC
CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT
TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT
TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG
CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG
CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA
CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA
CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT
AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG
>A0220B0939one.2 FV584Q101A4DG7
TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG
ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC
AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG
TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG
GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA
GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT
CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT
CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT
ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG
...

Thanks,

Mike

From florian.mittag at uni-tuebingen.de Thu Aug 6 05:38:38 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 11:38:38 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907151500.21947.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de>
 <200907061808.18651.florian.mittag@uni-tuebingen.de>
 <200907151500.21947.florian.mittag@uni-tuebingen.de>
Message-ID: <200908061138.38809.florian.mittag@uni-tuebingen.de>

Hi!

I just noticed that we didn't solve this problem completely.

On Wednesday, 15.
July 2009 15:00, Florian Mittag wrote:
> > Well, it is like this with version 9.5 of DB2 Express-C:
> >
> > SELECT NULL FROM bioentry;
> >
> > yields:
> > SQL0206N "NULL" is not valid in the context where it is used.
> > SQLSTATE=42703 SQLCODE=-206
> >
> > But if I do:
> >
> > SELECT cast(NULL AS VARCHAR(255)) FROM bioentry;
> >
> > [...]
> >
> > It ran fine without the NULL column, but that isn't necessarily a sign of
> > correctness. My problem was that (as stated above) the old version of DB2
> > requires you to cast the NULL value to a data type, which I wasn't able
> > to determine from the code. With the new version, it should work, so I'll
> > have to rerun my tests again and see if the problem is still there.
>
> You convinced me that the NULL column is supposed to be there, so I found
> another workaround around line 1273 in BaseDriver.pm:
>
>     if((! $attr) || (! $entitymap->{$tbl}) ||
>        $dont_select_attrs->{$tbl .".". $attr}) {
>         #push(@attrs, "NULL");
>         push(@attrs, "cast(NULL as VARCHAR(255))");
>     } else {
>
> Since I don't know how to determine the datatype of the column that is set
> to NULL, I simply chose VARCHAR and tested it. And it worked! (BTW: The
> column set to NULL is named "rank" in the case below.)

Although this solution works, it is not the best, because it breaks
compatibility with all other database types, e.g., MySQL.

Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" only when
the driver is DB2?

- Florian

From hlapp at gmx.net Thu Aug 6 09:36:08 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Aug 2009 09:36:08 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
 <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
 <8FAB8756AD944534B49F2C4356CB6D92@NewLife>
 <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>
Message-ID:

Why is specifying fasta format when your input is not in fasta format
not a user error?

I agree that not removing newlines in raw format is a bug.

-hilmar

On Aug 6, 2009, at 1:12 AM, Chris Fields wrote:

> Just to confirm: the following is using bioperl-live on my macbook
> pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug
> or a user issue (if it's the former, we can easily add an exception
> indicating lack of a header). Note that 'raw' also fails for the
> raw example below (doesn't appear to remove newlines).
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Thu Aug 6 09:42:06 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Aug 2009 09:42:06 -0400
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200908061138.38809.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de>
 <200907061808.18651.florian.mittag@uni-tuebingen.de>
 <200907151500.21947.florian.mittag@uni-tuebingen.de>
 <200908061138.38809.florian.mittag@uni-tuebingen.de>
Message-ID: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net>

On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote:

> Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))"
> only when the driver is DB2?

Not yet, but that's the solution I had in mind, i.e., introducing a
method in the Bio::DB::DBI::* (driver-specific) classes that returns
whatever NULL as a SELECT field should be represented as. What will be
very hard or nearly impossible to do is to cast to the actual type of
the column, so if simply using VARCHAR(255) does the trick for DB2
that'd be great.

BTW you did check that simply aliasing the column does not fix the
problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will
throw an error, right?
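
A minimal sketch of the driver-specific method idea described above, assuming nothing about the real Bio::DB::DBI::* classes: the package names Sketch::Driver::Base and Sketch::Driver::DB2, the method name null_select_expr, the helper select_list and the column term.some_attr are all hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical base driver: a bare NULL is fine for most databases.
package Sketch::Driver::Base;
sub new { my ($class) = @_; return bless {}, $class; }
# Expression to emit in a SELECT list for an attribute that is not selected.
sub null_select_expr { return "NULL"; }

# Hypothetical DB2 driver: DB2 wants a typed NULL in the SELECT list.
package Sketch::Driver::DB2;
our @ISA = ('Sketch::Driver::Base');
sub null_select_expr { return "cast(NULL as VARCHAR(255))"; }

package main;

# Build the SELECT column list, asking the driver how to spell NULL
# instead of hard-coding it in the shared code.
sub select_list {
    my ($driver, $attrs, $skip) = @_;
    my @cols = map { $skip->{$_} ? $driver->null_select_expr : $_ } @$attrs;
    return join(", ", @cols);
}

# Placeholder column names; 'term.some_attr' stands for an unselected attribute.
my @attrs = ('term.term_id', 'term.name', 'term.some_attr');
my %skip  = ('term.some_attr' => 1);

for my $class ('Sketch::Driver::Base', 'Sketch::Driver::DB2') {
    my $driver = $class->new;
    print "${class}: SELECT ", select_list($driver, \@attrs, \%skip),
          " FROM term\n";
}
```

Keeping the default in the base class would leave the generated SQL unchanged for other databases such as MySQL, with only the DB2 subclass overriding it.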

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From florian.mittag at uni-tuebingen.de Thu Aug 6 10:12:21 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 16:12:21 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de>
 <200908061138.38809.florian.mittag@uni-tuebingen.de>
 <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net>
Message-ID: <200908061612.21852.florian.mittag@uni-tuebingen.de>

On Thursday, 6. August 2009 15:42, Hilmar Lapp wrote:
> On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote:
> > Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))"
> > only when the driver is DB2?
>
> Not yet, but that's the solution I had in mind, i.e., introducing a
> method in the Bio::DB::DBI::* (driver-specific) classes that returns
> whatever NULL as a SELECT field should be represented as.

Sounds like a good idea!

> What will be
> very hard or nearly impossible to do is to cast to the actual type of
> the column, so if simply using VARCHAR(255) does the trick for DB2
> that'd be great.

Surprisingly, it does. At least, I haven't noticed any problems if the target
data type is for example an integer. With all the trouble I have with DB2, I
didn't expect this.

> BTW you did check that simply aliasing the column does not fix the
> problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will
> throw an error, right?

Yepp:

SELECT term.term_id, term.identifier, term.name, term.definition,
term.is_obsolete, NULL AS col1, term.ontology_id FROM term WHERE identifier
= ?

[IBM][CLI Driver][DB2/LINUX] SQL0418N A statement contains a use of an
untyped parameter marker or a null value that is not valid.
- Florian

From hilgert at cshl.edu Thu Aug 6 11:01:05 2009
From: hilgert at cshl.edu (Hilgert, Uwe)
Date: Thu, 6 Aug 2009 11:01:05 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID:

I'm not sure what version we have. Cornel may have installed it a while ago from CVS:

Module id = Bio::Root::Build
    CPAN_USERID  CJFIELDS (Christopher Fields)
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm
    INST_VERSION 1.006900
cpan> m Bio::Root::Version
Module id = Bio::Root::Version
    CPAN_USERID  CJFIELDS (Christopher Fields)
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm
    INST_VERSION 1.006900
cpan> m Bio::SeqIO
Module id = Bio::SeqIO
    CPAN_USERID  CJFIELDS (Christopher Fields)
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm
    INST_VERSION undef

Cornel still has the checked-out "bioperl-live" directory and the last changes are from March this year.

As for why he used "Fasta" instead of 'fasta' as the format parameter in Bio::SeqIO, it's because that's what it says in the module's manual. He now tried 'fasta' instead and sees no change in behavior. With the format parameter omitted altogether, fasta-formatted sequence continues to be treated correctly, the first line being removed. However, raw sequence is treated differently in that the first line is no longer removed. Instead, the program returns the first line only. Which, in the example I am going to forward in my next message, will return 60 amino acids out of raw sequence of 300 aa. Can't win with raw sequence...

The files may be created on different platforms; we didn't notice any difference between using files created on Windows or Linux.
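One way to avoid declaring the wrong format for a mixed set of files is to look for the '>' header before choosing. The following is only a sketch in core Perl: pick_format() is a hypothetical helper, not part of BioPerl, and it only tells FASTA apart from header-less sequence.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): inspect the first non-blank line
# and report 'fasta' if it starts with '>', otherwise 'raw'.
sub pick_format {
    my ($text) = @_;
    for my $line (split /\r?\n/, $text) {   # tolerate Unix and Windows line endings
        next if $line =~ /^\s*$/;           # skip leading blank lines
        return $line =~ /^>/ ? 'fasta' : 'raw';
    }
    return;                                 # empty input: no guess
}

print pick_format(">CATH_RAT\nMWTALPLLCAGAWLLSAGAT\n"), "\n";
print pick_format("MWTALPLLCAGAWLLSAGAT\nHTFKMGLNQFSDMSFAEI\n"), "\n";
```

The returned string could then be handed to Bio::SeqIO->new as the -format argument instead of a hard-coded 'Fasta'.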
Thanks Uwe -----Original Message----- From: Hilmar Lapp [mailto:hlapp at gmx.net] Sent: Wednesday, August 05, 2009 6:54 PM To: Chris Fields Cc: Hilgert, Uwe; BioPerl List Subject: Re: [Bioperl-l] Bio::SeqIO issue I don't think that can be the problem. If anything, providing the format ought to be better in terms of result than not providing it? Uwe - I'd like you to go back to Chris' initial questions that you haven't answered yet: "What version of bioperl are you using, OS, etc? What does your data look like?" I'd add to that, can you show us your full script, or a smaller code snippet that reproduces the problem. I suspect that either something in your script is swallowing the line, or that the line endings in your data file are from a different OS than the one you're running the script on. (Or that you are running a very old version of BioPerl, which is entirely possible if you installed through CPAN.) -hilmar On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and splits > the header off as the first line in that chunk. You could probably > try leaving the format out and letting SeqIO guess it, or passing > the file into Bio::Tools::GuessSeqFormat directly, but it's probably > better to go through the files and add a file extension that > corresponds to the format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >> Thanks, Chris. The files have no extension, but we indicate what >> format >> to use, like in the manual: >> >> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >> >> I wonder now whether this could exactly cause the problem: as we are >> telling that input files are in fasta format they are being treated >> as >> such (=remove first line) - regardless of whether they really are >> fasta? 
>> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> Uwe Hilgert, Ph.D. >> Dolan DNA Learning Center >> Cold Spring Harbor Laboratory >> >> C: (516) 857-1693 >> V: (516) 367-5185 >> E: hilgert at cshl.edu >> F: (516) 367-5182 >> W: http://www.dnalc.org >> >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Wednesday, August 05, 2009 5:04 PM >> To: Hilgert, Uwe >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >> >>> Is my impression correct that Bio::SeqIO just assumes that sequences >>> are >>> being submitted in FASTA format? >> >> No. See: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> SeqIO tries to guess at the format using the file extension, and if >> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >> possible that the extension is causing the problem, or that >> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >> guessing). In any case, it's always advisable to explicitly indicate >> the format when possible. >> >> Relevant lines: >> >> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >> i; >> ... >> return 'raw' if /\.(txt)$/i; >> >>> In our experience, implementing >>> Bio::SeqIO led to the first line of files being cut off, >>> regardless of >>> whether the files were indeed fasta files or files that only >>> contained >>> sequence. >> >> Files that only contain sequence are 'raw'. Ones in FASTA are >> 'fasta'. >> >>> Which, in the latter, led to sequence submissions that had the >>> first line of nucleotides removed. Has anyone tried to write a fix >>> for >>> this? >> >> This sounds like a bug, but we have very little to go on beyond your >> description. What version of bioperl are you using, OS, etc? What >> does your data look like? File extension? 
>> >> chris >> >>> Thanks, >>> >>> Uwe >>> >>> >>> >>> >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> >>> Uwe Hilgert, Ph.D. >>> >>> Dolan DNA Learning Center >>> >>> Cold Spring Harbor Laboratory >>> >>> >>> >>> V: (516) 367-5185 >>> >>> E: hilgert at cshl.edu >>> >>> F: (516) 367-5182 >>> >>> W: http://www.dnalc.org >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hilgert at cshl.edu Thu Aug 6 11:03:53 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 11:03:53 -0400 Subject: [Bioperl-l] FW: Bio::SeqIO issue Message-ID: If you don't specify any format only the first line gets returned: not ok 1 - format # Failed test 'format' # at test/test_fasta.pl line 35. # got: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN' # expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNH TFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFS TTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFN PEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYG EQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV' 1..1 # Looks like you failed 1 test of 1. -----Original Message----- From: Hilgert, Uwe Sent: Thursday, August 06, 2009 9:12 AM To: Ghiban, Cornel Subject: FW: [Bioperl-l] Bio::SeqIO issue -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, August 06, 2009 1:12 AM To: Mark A. 
Jensen
Cc: Hilgert, Uwe; BioPerl List; Hilmar Lapp
Subject: Re: [Bioperl-l] Bio::SeqIO issue

Just to confirm: the following is using bioperl-live on my macbook pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug or a user issue (if it's the former, we can easily add an exception indicating lack of a header). Note that 'raw' also fails for the raw example below (doesn't appear to remove newlines).

-c

cjfields4:fasta cjfields$ cat raw_v_fasta.pl
#!/usr/bin/perl -w

use strict;
use warnings;
use IO::String;
use Bio::SeqIO;
use Test::More qw(no_plan);

my %seq;

# raw: the same residues as the FASTA record below, without the header line
$seq{raw} = <<RAW;
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
RAW

$seq{fasta} = <<FASTA;
>CATH_RAT
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
FASTA

my %newdata;

for my $input (sort keys %seq) {
    my $fh = IO::String->new($seq{$input});
    my $seq = Bio::SeqIO->new(-format => 'fasta', -fh => $fh)->next_seq;
    $newdata{$input} = $seq->seq;
}

is($newdata{raw}, $newdata{fasta}, 'format');

cjfields4:fasta cjfields$ perl raw_v_fasta.pl
not ok 1 - format
# Failed test 'format'
# at raw_v_fasta.pl line 36.
# got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWT
FSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCK
FNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVG
YGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
# expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNH
TFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFS
TTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFN
PEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYG
EQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
1..1
# Looks like you failed 1 test of 1.

On Aug 5, 2009, at 6:12 PM, Mark A. 
Jensen wrote: > If these items were included in a Bugzilla report, that would be most > convenient (= most likely to get looked carefully) and is the best > place for us to keep track of these kinds of > issues-- http://bugzilla.bioperl.org/ > cheers MAJ > ----- Original Message ----- From: "Hilmar Lapp" > To: "Chris Fields" > Cc: "BioPerl List" > Sent: Wednesday, August 05, 2009 6:53 PM > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? What does your data look like?" I'd add to that, can you show >> us your full script, or a smaller code snippet that reproduces the >> problem. >> I suspect that either something in your script is swallowing the >> line, or that the line endings in your data file are from a >> different OS than the one you're running the script on. (Or that >> you are running a very old version of BioPerl, which is entirely >> possible if you installed through CPAN.) >> -hilmar >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls >>> out the full sequence in chunks (based on local $/ = "\n>") and >>> splits the header off as the first line in that chunk. You could >>> probably try leaving the format out and letting SeqIO guess it, >>> or passing the file into Bio::Tools::GuessSeqFormat directly, but >>> it's probably better to go through the files and add a file >>> extension that corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format >>>> to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being >>>> treated as >>>> such (=remove first line) - regardless of whether they really >>>> are fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> Uwe Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences >>>>> are >>>>> being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's >>>> forced to >>>> guessing). In any case, it's always advisable to explicitly >>>> indicate >>>> the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa) >>>> $/ i; >>>> ... 
>>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless of >>>>> whether the files were indeed fasta files or files that only >>>>> contained >>>>> sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a >>>>> fix for >>>>> this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From hlapp at gmx.net Thu Aug 6 11:18:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 11:18:06 -0400 Subject: 
[Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible? -hilmar On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > I'm not sure what version we have. Cornel may have installed it a > while > ago from CVS: > > Module id = Bio::Root::Build > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > INST_VERSION 1.006900 > cpan> m Bio::Root::Version > Module id = Bio::Root::Version > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > INST_VERSION 1.006900 > cpan> m Bio::SeqIO > Module id = Bio::SeqIO > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > INST_VERSION undef > > Cornel still has the checked-out "bioperl-live" directory and the last > changes are from March this year. > > As per why he used "Fasta" instead of 'fasta" as the format > parameter in > Bio::SeqIO, it's because that what it says in the modules manual. He > now > tried 'fasta' instead and see no changes in behavior. Omitting the > format parameter altogether, fasta-formatted sequence continues to be > treated correctly, the first line being removed. However, raw sequence > is being treated differently in that the first line is not being > removed > any more. Instead, the program returns the first line only. Which, in > the example I am going to forward in my next message, will return 60 > amino acids out of raw sequence of 300 aa. Can't win with raw > sequence... 
> > > The files may be created on different platforms, we didn't notice any > difference between using files created on Windows or Linux. > > Thanks > Uwe -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bosborne11 at verizon.net Thu Aug 6 11:20:49 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 11:20:49 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <2F73C3DC-D943-4EC3-834A-EA2984FDDB5D@verizon.net> Uwe et al, Yes, this argument works irrespective of case: The format name 
is case-insensitive: 'FASTA', 'Fasta' and 'fasta' are all valid. From Bio::SeqIO.

Brian O.

On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote:

> As per why he used "Fasta" instead of 'fasta' as the format parameter in
> Bio::SeqIO, it's because that's what it says in the module's manual. He now
> tried 'fasta' instead and sees no change in behavior. Omitting the

From cjfields at illinois.edu Thu Aug 6 12:30:01 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Aug 2009 11:30:01 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>
Message-ID: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu>

On Aug 6, 2009, at 8:36 AM, Hilmar Lapp wrote:

> Why is specifying fasta format when your input is not in fasta format
> not a user error?

Agreed. My point is should we worry about adding an exception (which may be a little more user-friendly). Right now the bad stuff happens silently.

> I agree with the not removing newlines in raw format being a bug.
>
> -hilmar

Acc. to the SeqIO::raw docs, this is a little trickier. The documented behavior explicitly indicates that each line (sans non-whitespace) is assumed to be a separate sequence, so changing that behavior breaks API.
I suppose we can have $/ set locally to a cached $/ default value or undef:

    # assumes entire file is read in
    my $io = Bio::SeqIO->new(-format => 'raw', -gulp => 1);

chris

From hlapp at gmx.net Thu Aug 6 12:42:00 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Aug 2009 12:42:00 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu>
Message-ID: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net>

On Aug 6, 2009, at 12:30 PM, Chris Fields wrote:

> Agreed. My point is should we worry about adding an exception
> (which may be a little more user-friendly). Right now the bad stuff
> happens silently.

Great point. We don't want silent failures, do we.

>> I agree with the not removing newlines in raw format being a bug.
>>
>> -hilmar
>
> Acc. to the SeqIO::raw docs, this is a little trickier. The
> documented behavior explicitly indicates that each line (sans
> non-whitespace) is assumed to be a separate sequence, so changing that
> behavior breaks API.

Ah - true indeed. I like the optional argument feature - that way it's easy for the user to choose.
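To make the two behaviors concrete, here is a rough sketch in core Perl of what setting $/ locally changes. It does not use Bio::SeqIO; the function names are hypothetical, and the -gulp option shown in the quoted message is only a proposal in this thread, so nothing below should be read as the module's actual implementation.

```perl
use strict;
use warnings;

# One sequence per line: the behavior the SeqIO::raw docs are said to describe.
sub read_raw_per_line {
    my ($fh) = @_;
    my @seqs;
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;                  # drop the newline and stray whitespace
        push @seqs, $line if length $line;  # ignore blank lines
    }
    return @seqs;
}

# Whole input as one sequence: roughly what a "gulp"-style option would do.
sub read_raw_gulp {
    my ($fh) = @_;
    local $/ = undef;                       # slurp mode; old value restored on scope exit
    my $all = <$fh>;
    $all = '' unless defined $all;          # empty input
    $all =~ s/\s+//g;                       # join the lines into one string
    return $all;
}

my $data = "MWTALPLL\nHTFKMGLN\n";

open my $fh1, '<', \$data or die "cannot open in-memory handle: $!";
my @per_line = read_raw_per_line($fh1);

open my $fh2, '<', \$data or die "cannot open in-memory handle: $!";
my $gulped = read_raw_gulp($fh2);

print scalar(@per_line), " sequences when read per line\n";
print "gulped sequence: $gulped\n";
```

Because $/ is changed with local inside the function, callers that rely on line-by-line reading are unaffected, which is what would let the existing per-line API stay as the default.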
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Thu Aug 6 12:49:53 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:49:53 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Cornel, I'm failing to see how adding '>' would solve the problem. This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)? I think, in this circumstance, the former applies. It is easy to add, and the use of an exception in this case is violently user-friendly, e.g. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing. chris On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > Hi, > > It doesn't matter what sequence we use. As Chris Fields's showed in > his test, not having > ">" as the 1st character on the first line is the problem. > We always assumed the sequence is in FASTA format and this seems to > be wrong. > > I think, the solution to our problem is to check whether the ">" > symbol is present or not. > If not present then it will be added. 
> > Thank you, > Cornel Ghiban From biopython at maubp.freeserve.co.uk Thu Aug 6 12:51:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 6 Aug 2009 17:51:34 +0100 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> 
<25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> Message-ID: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote: > >>> I agree with the not removing newlines in raw format being a bug. >>> >>> -hilmar >> >> Acc. to the SeqIO::raw docs, this is a little trickier. The documented >> behavior explicitly indicates that each line (sans non-whitespace) is >> assumed to be a separate sequence, so changing that behavior breaks API. > > Ah - true indeed. I like the optional argument feature - that way it's easy > for the user to choose. > For reference, "raw" as a format in EMBOSS seems to give just one sequence regardless of any line breaks. Adding an optional argument might be clearest, but have you considered using the new BioPerl SeqIO variant argument to have two forms of raw (the original variant giving one sequence per line, and a new variant where you just get one sequence regardless of any line breaks)?
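The two readings of "raw" discussed here can be sketched in plain Perl, without BioPerl. This is only an illustration of the difference in behaviour, not the Bio::SeqIO::raw implementation; the helper names raw_per_line and raw_single are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one sequence per non-empty line, i.e. the documented
# behaviour of the 'raw' format as described in this thread.
sub raw_per_line {
    my ($text) = @_;
    my @seqs;
    for my $line (split /\r?\n|\r/, $text) {
        $line =~ s/\s+//g;                 # drop whitespace inside the line
        push @seqs, $line if length $line; # ignore empty lines
    }
    return @seqs;
}

# Hypothetical helper: the whole input is one sequence and line breaks are
# ignored (the EMBOSS-like reading of 'raw').
sub raw_single {
    my ($text) = @_;
    (my $seq = $text) =~ s/\s+//g;         # remove all whitespace, newlines included
    return $seq;
}
```

With a five-line raw file, the first helper would hand back five short sequences and the second a single concatenated one, which seems to be the behaviour gap behind the "only the first line is returned" report.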
Peter From cjfields at illinois.edu Thu Aug 6 12:58:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:58:07 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> Message-ID: On Aug 6, 2009, at 11:51 AM, Peter wrote: > On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote: >> >>>> I agree with the not removing newlines in raw format being a bug. >>>> >>>> -hilmar >>> >>> Acc. to the SeqIO::raw docs, this is a little trickier. The >>> documented >>> behavior explicitly indicates that each line (sans non-whitespace) >>> is >>> assumed to be a separate sequence, so changing that behavior >>> breaks API. >> >> Ah - true indeed. I like the optional argument feature - that way >> it's easy >> for the user to choose. >> > > For reference, "raw" as a format in EMBOSS seems to give just one > sequence regardless of any line breaks. Yes, and that's the behavior I would expect, actually. > Adding an optional argument might be clearest, but have you considered > using the new BioPerl SeqIO variant argument to have two forms of raw > (the original variant giving one sequence per line, and a new variant > where you just get one sequence regardless of any line breaks)? > > Peter That's a good point. 
We'd have to keep 'raw' as the prior behavior, but 'raw-complete' could be used for such a circumstance ('raw-gulp' sounds just wrong ;) chris From rmb32 at cornell.edu Thu Aug 6 13:14:12 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 06 Aug 2009 10:14:12 -0700 Subject: [Bioperl-l] tigrxml parsing Message-ID: <4A7B0F64.9070205@cornell.edu> Hi all, Recently in #bioperl somebody came by trying to use Bio::SeqIO::tigrxml.pm to parse the medicago genome annotations at http://www.medicago.org/genome/downloads/Mt2/MT2.0_medicago_chrX_20080103_NoOverlap.xml.tar.gz svn HEAD tigrxml.pm was not at all happy with these files, eventually dying with ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: start is undefined STACK: Error::throw STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368 STACK: Bio::RangeI::contains Bio/RangeI.pm:255 STACK: Bio::SeqFeature::Generic::add_SeqFeature Bio/SeqFeature/Generic.pm:783 STACK: Bio::SeqIO::tigrxml::start_element Bio/SeqIO/tigrxml.pm:206 STACK: try{} block /usr/share/perl5/XML/SAX/Base.pm:292 STACK: XML::SAX::Base::start_element /usr/share/perl5/XML/SAX/Base.pm:266 STACK: XML::SAX::Expat::_handle_start /usr/share/perl5/XML/SAX/Expat.pm:225 STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469 STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187 STACK: XML::SAX::Expat::_parse_bytestream /usr/share/perl5/XML/SAX/Expat.pm:45 STACK: XML::SAX::Base::parse /usr/share/perl5/XML/SAX/Base.pm:2602 STACK: XML::SAX::Base::parse_file /usr/share/perl5/XML/SAX/Base.pm:2631 STACK: Bio::SeqIO::tigrxml::next_seq Bio/SeqIO/tigrxml.pm:116 STACK: /crypt/rob/test2.pl:10 ----------------------------------------------------------- Looking at the medicago XML and comparing it to the bioperl-live/t/data/test.tigrxml, the two look VERY different in structure. Lots of things that are attrs in test.tigrxml seem to be elements in the medicago XML, for example.
So I guess the question is: is the medicago TIGR XML malformed? Can tigrxml.pm be expected to parse it? What, if anything, should be done about this? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From hilgert at cshl.edu Thu Aug 6 15:36:36 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 15:36:36 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Hmmm, I fail to see how supplying raw sequence could be a called "bad" input or a "problem". In our case, for example, not every user is a bioinformatics expert and Cornel was suggesting to account for that instead of trying to "train" the user to adhere to requirements that have not much to do with what s/he tries to accomplish. I don't really see data being modified, rather that the data format is being adopted to the needs of the software; which I would argue should be something the software is being able to take care of. Uwe -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, August 06, 2009 12:50 PM To: Ghiban, Cornel Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List Subject: Re: [Bioperl-l] Bio::SeqIO issue Cornel, I'm failing to see how adding '>' would solve the problem. This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)? I think, in this circumstance, the former applies. 
It is easy to add, and the use of an exception in this case is violently user-friendly, e.g. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing. chris On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > Hi, > > It doesn't matter what sequence we use. As Chris Fields's showed in > his test, not having > ">" as the 1st character on the first line is the problem. > We always assumed the sequence is in FASTA format and this seems to > be wrong. > > I think, the solution to our problem is to check whether the ">" > symbol is present or not. > If not present then it will be added. > > Thank you, > Cornel Ghiban > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Thursday, August 06, 2009 11:18 AM > To: Hilgert, Uwe > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe - could you send an actual data file (as an attachment) that > reproduces the problem, or is that not possible? > > -hilmar > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > >> I'm not sure what version we have. Cornel may have installed it a >> while ago from CVS: >> >> Module id = Bio::Root::Build >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >> INST_VERSION 1.006900 >> cpan> m Bio::Root::Version >> Module id = Bio::Root::Version >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >> INST_VERSION 1.006900 >> cpan> m Bio::SeqIO >> Module id = Bio::SeqIO >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >> INST_VERSION undef >> >> Cornel still has the checked-out "bioperl-live" directory and the >> last >> changes are from March this year. 
>> >> As per why he used "Fasta" instead of 'fasta" as the format parameter >> in Bio::SeqIO, it's because that what it says in the modules manual. >> He now tried 'fasta' instead and see no changes in behavior. Omitting >> the format parameter altogether, fasta-formatted sequence continues >> to >> be treated correctly, the first line being removed. However, raw >> sequence is being treated differently in that the first line is not >> being removed any more. Instead, the program returns the first line >> only. Which, in the example I am going to forward in my next message, >> will return 60 amino acids out of raw sequence of 300 aa. Can't win >> with raw sequence... >> >> >> The files may be created on different platforms, we didn't notice any >> difference between using files created on Windows or Linux. >> >> Thanks >> Uwe >> >> >> >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, August 05, 2009 6:54 PM >> To: Chris Fields >> Cc: Hilgert, Uwe; BioPerl List >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? >> What does your data look like?" I'd add to that, can you show us your >> full script, or a smaller code snippet that reproduces the problem. >> >> I suspect that either something in your script is swallowing the >> line, >> or that the line endings in your data file are from a different OS >> than the one you're running the script on. (Or that you are running a >> very old version of BioPerl, which is entirely possible if you >> installed through CPAN.) >> >> -hilmar >> >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >> >>> Uwe, >>> >>> Please keep replies on the list. 
>>> >>> It's very possible that's the issue; IIRC the fasta parser pulls out >>> the full sequence in chunks (based on local $/ = "\n>") and splits >>> the header off as the first line in that chunk. You could probably >>> try leaving the format out and letting SeqIO guess it, or passing >>> the >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>> better to go through the files and add a file extension that >>> corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. The files have no extension, but we indicate what >>>> format to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being treated >>>> as such (=remove first line) - regardless of whether they really >>>> are >>>> fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>>> Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences are being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. 
It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>> to guessing). In any case, it's always advisable to explicitly >>>> indicate the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>> i; >>>> ... >>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless >>>>> of whether the files were indeed fasta files or files that only >>>>> contained sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a fix >>>>> for this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. 
>>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at illinois.edu Thu Aug 6 16:09:22 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:09:22 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: <6729F9CC-ACF9-4BC4-9905-7EA24C1DCA61@illinois.edu> If one supplies raw sequence (no descriptor) to a FASTA parser (requires a descriptor), then it is bad input. One can't reasonably expect the parser to work correctly under those circumstance. Garbage in, garbage out. The simplest and (IMHO) best solution under such circumstances is for the parser to die meaningfully ("Sequence is not FASTA format; '>' descriptor line is missing" or similar). 
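A minimal sketch of such a check, in plain Perl and independent of BioPerl. It only illustrates the idea of failing early with a clear message; it is not the validation code in Bio::SeqIO::fasta, and the helper names looks_like_fasta and assert_fasta are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): true if the first non-blank
# line of the text starts with the FASTA '>' descriptor marker.
sub looks_like_fasta {
    my ($text) = @_;
    for my $line (split /\r?\n|\r/, $text) {
        next if $line =~ /^\s*$/;        # skip leading blank lines
        return $line =~ /^>/ ? 1 : 0;    # decide on the first real line
    }
    return 0;                            # empty input is not FASTA
}

# Hypothetical wrapper: stop with a clear message instead of letting the
# first line of raw sequence be consumed as a descriptor line.
sub assert_fasta {
    my ($text) = @_;
    die "Sequence is not FASTA format; '>' descriptor line is missing\n"
        unless looks_like_fasta($text);
    return 1;
}
```

A calling script could run assert_fasta on the file contents before handing the file to Bio::SeqIO->new(-file => "file_path", -format => 'fasta'), so that raw sequence is rejected up front rather than silently losing its first line.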
Tacking a '>' onto bad data doesn't make it magically work, it's just bad data with a '>' appended. To take this one step further, what if this were genbank data? Or XML? A well-formed exception, though initially inconvenient to the user, will indicate the problem right away. Silently trying to fix the problem by appending '>' to bad input data wouldn't work, and the resulting failure downstream (likely from validate_seq) would obscure the real problem, being the user is using the wrong format parser. chris On Aug 6, 2009, at 2:36 PM, Hilgert, Uwe wrote: > Hmmm, I fail to see how supplying raw sequence could be a called "bad" > input or a "problem". In our case, for example, not every user is a > bioinformatics expert and Cornel was suggesting to account for that > instead of trying to "train" the user to adhere to requirements that > have not much to do with what s/he tries to accomplish. I don't really > see data being modified, rather that the data format is being > adopted to > the needs of the software; which I would argue should be something the > software is being able to take care of. > > Uwe > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, August 06, 2009 12:50 PM > To: Ghiban, Cornel > Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Cornel, > > I'm failing to see how adding '>' would solve the problem. > > This is a simple validation issue: should we throw an exception on bad > input (no '>'), or just argue GIGO based on user error (the assumption > that the SeqIO parser will read raw sequence correctly when set to > 'fasta' is wrong)? > > I think, in this circumstance, the former applies. It is easy to add, > and the use of an exception in this case is violently user-friendly, > e.g. it will stop cold and immediately point out the problem. > Otherwise data is (silently) being modified, which is always a bad > thing. 
> > chris > > On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > >> Hi, >> >> It doesn't matter what sequence we use. As Chris Fields's showed in >> his test, not having >> ">" as the 1st character on the first line is the problem. >> We always assumed the sequence is in FASTA format and this seems to >> be wrong. >> >> I think, the solution to our problem is to check whether the ">" >> symbol is present or not. >> If not present then it will be added. >> >> Thank you, >> Cornel Ghiban >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Thursday, August 06, 2009 11:18 AM >> To: Hilgert, Uwe >> Cc: Chris Fields; BioPerl List; Ghiban, Cornel >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> Uwe - could you send an actual data file (as an attachment) that >> reproduces the problem, or is that not possible? >> >> -hilmar >> >> On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: >> >>> I'm not sure what version we have. Cornel may have installed it a >>> while ago from CVS: >>> >>> Module id = Bio::Root::Build >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >>> INST_VERSION 1.006900 >>> cpan> m Bio::Root::Version >>> Module id = Bio::Root::Version >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >>> INST_VERSION 1.006900 >>> cpan> m Bio::SeqIO >>> Module id = Bio::SeqIO >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >>> INST_VERSION undef >>> >>> Cornel still has the checked-out "bioperl-live" directory and the >>> last >>> changes are from March this year. >>> >>> As per why he used "Fasta" instead of 'fasta" as the format >>> parameter >>> in Bio::SeqIO, it's because that what it says in the modules manual. >>> He now tried 'fasta' instead and see no changes in behavior. 
>>> Omitting >>> the format parameter altogether, fasta-formatted sequence continues >>> to >>> be treated correctly, the first line being removed. However, raw >>> sequence is being treated differently in that the first line is not >>> being removed any more. Instead, the program returns the first line >>> only. Which, in the example I am going to forward in my next >>> message, >>> will return 60 amino acids out of raw sequence of 300 aa. Can't win >>> with raw sequence... >>> >>> >>> The files may be created on different platforms, we didn't notice >>> any >>> difference between using files created on Windows or Linux. >>> >>> Thanks >>> Uwe >>> >>> >>> >>> >>> -----Original Message----- >>> From: Hilmar Lapp [mailto:hlapp at gmx.net] >>> Sent: Wednesday, August 05, 2009 6:54 PM >>> To: Chris Fields >>> Cc: Hilgert, Uwe; BioPerl List >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> I don't think that can be the problem. If anything, providing the >>> format ought to be better in terms of result than not providing it? >>> >>> Uwe - I'd like you to go back to Chris' initial questions that you >>> haven't answered yet: "What version of bioperl are you using, OS, >>> etc? >>> What does your data look like?" I'd add to that, can you show us >>> your >>> full script, or a smaller code snippet that reproduces the problem. >>> >>> I suspect that either something in your script is swallowing the >>> line, >>> or that the line endings in your data file are from a different OS >>> than the one you're running the script on. (Or that you are >>> running a >>> very old version of BioPerl, which is entirely possible if you >>> installed through CPAN.) >>> >>> -hilmar >>> >>> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> >>>> Uwe, >>>> >>>> Please keep replies on the list. 
>>>> >>>> It's very possible that's the issue; IIRC the fasta parser pulls >>>> out >>>> the full sequence in chunks (based on local $/ = "\n>") and splits >>>> the header off as the first line in that chunk. You could probably >>>> try leaving the format out and letting SeqIO guess it, or passing >>>> the >>>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>>> better to go through the files and add a file extension that >>>> corresponds to the format. >>>> >>>> chris >>>> >>>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>>> >>>>> Thanks, Chris. The files have no extension, but we indicate what >>>>> format to use, like in the manual: >>>>> >>>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>>> >>>>> I wonder now whether this could exactly cause the problem: as we >>>>> are >>>>> telling that input files are in fasta format they are being >>>>> treated >>>>> as such (=remove first line) - regardless of whether they really >>>>> are >>>>> fasta? >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> Uwe >>>>> Hilgert, Ph.D. >>>>> Dolan DNA Learning Center >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> C: (516) 857-1693 >>>>> V: (516) 367-5185 >>>>> E: hilgert at cshl.edu >>>>> F: (516) 367-5182 >>>>> W: http://www.dnalc.org >>>>> >>>>> -----Original Message----- >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>>> To: Hilgert, Uwe >>>>> Cc: bioperl-l at lists.open-bio.org >>>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>>> >>>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>>> >>>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>>> sequences are being submitted in FASTA format? >>>>> >>>>> No. See: >>>>> >>>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>>> >>>>> SeqIO tries to guess at the format using the file extension, and >>>>> if >>>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. 
It's >>>>> possible that the extension is causing the problem, or that >>>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>>> to guessing). In any case, it's always advisable to explicitly >>>>> indicate the format when possible. >>>>> >>>>> Relevant lines: >>>>> >>>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>>> i; >>>>> ... >>>>> return 'raw' if /\.(txt)$/i; >>>>> >>>>>> In our experience, implementing >>>>>> Bio::SeqIO led to the first line of files being cut off, >>>>>> regardless >>>>>> of whether the files were indeed fasta files or files that only >>>>>> contained sequence. >>>>> >>>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>>> 'fasta'. >>>>> >>>>>> Which, in the latter, led to sequence submissions that had the >>>>>> first line of nucleotides removed. Has anyone tried to write a >>>>>> fix >>>>>> for this? >>>>> >>>>> This sounds like a bug, but we have very little to go on beyond >>>>> your >>>>> description. What version of bioperl are you using, OS, etc? >>>>> What >>>>> does your data look like? File extension? >>>>> >>>>> chris >>>>> >>>>>> Thanks, >>>>>> >>>>>> Uwe >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>>> >>>>>> Uwe Hilgert, Ph.D. 
>>>>>> >>>>>> Dolan DNA Learning Center >>>>>> >>>>>> Cold Spring Harbor Laboratory >>>>>> >>>>>> >>>>>> >>>>>> V: (516) 367-5185 >>>>>> >>>>>> E: hilgert at cshl.edu >>>>>> >>>>>> F: (516) 367-5182 >>>>>> >>>>>> W: http://www.dnalc.org >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 16:25:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:25:45 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> Message-ID: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Michael, Are you using ClustalW 2? I'm not sure but I don't think the wrapper has been updated for the latest version (I think parsing still works, though). 
chris On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote: > I'm a complete bioperl novice, trying to do Clustalw on some fasta > files, and am running into trouble: > > ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta > Use of uninitialized value in concatenation (.) or string at /usr/ > pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm > line 550. > Use of uninitialized value in concatenation (.) or string at /usr/ > pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm > line 551. > Can't exec "align": No such file or directory at /usr/pubsw/lib/ > perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf - > output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1 > > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/ > Bio/Root/Root.pm:328 > STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/ > perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556 > STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/ > perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472 > STACK: TestClust:22 > ----------------------------------------------------------- > > Here's my code: > > #!/usr/bin/perl -w > > use Bio::Perl; > use Bio::AlignIO; > use Bio::Tools::Run::Alignment::Clustalw; > use Bio::SimpleAlign; > use Bio::Seq; > use strict; > use warnings; > > my $factory = Bio::Tools::Run::Alignment::Clustalw->new(); > my @seq_array = read_all_sequences($ARGV[0],'fasta'); > > for (my $i = 0; $i < @seq_array; $i++){ > (my $seq = $seq_array[$i]->seq()) =~ s/-//g; > $seq_array[$i]->seq($seq); > } > > write_sequence(">test",'fasta', at seq_array); > > my $seq_array_ref = \@seq_array; > my $aln = $factory->align($seq_array_ref); > > my @align_array = $aln->each_seq(); > write_sequence(">testfile",'fasta', at align_array); > > > The 
loop is just there to take out some gaps that were placed in a > blast previous to this. The write_sequence call confirms that > @seq_array is a valid array of Bio:Seq objects at the time align > calls it. Here's some output in "test": > > >A0220B0939one.1 FV584Q101DEWY9 > TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC > CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT > TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT > TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG > CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG > CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA > CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA > CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT > AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG > >A0220B0939one.2 FV584Q101A4DG7 > TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG > ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC > AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG > TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG > GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA > GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT > CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT > CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT > ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG > ... 
> > Thanks, > Mike > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 16:30:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:30:30 -0500 Subject: [Bioperl-l] tigrxml parsing In-Reply-To: <4A7B0F64.9070205@cornell.edu> References: <4A7B0F64.9070205@cornell.edu> Message-ID: Robert, This popped up recently (may be related): http://thread.gmane.org/gmane.comp.lang.perl.bio.general/19782 http://bugzilla.open-bio.org/show_bug.cgi?id=2868 It might be possible to map this into bioperl, but someone needs to take it up. chris On Aug 6, 2009, at 12:14 PM, Robert Buels wrote: > Hi all, > > Recently in #bioperl somebody came by trying to use > Bio::SeqIO::tigrxml.pm to parse the medicago genome annotations at http://www.medicago.org/genome/downloads/Mt2/MT2.0_medicago_chrX_20080103_NoOverlap.xml.tar.gz > > svn HEAD tigrxml.pm was not at all happy with these files, > eventually dieing with > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: start is undefined > STACK: Error::throw > STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368 > STACK: Bio::RangeI::contains Bio/RangeI.pm:255 > STACK: Bio::SeqFeature::Generic::add_SeqFeature Bio/SeqFeature/ > Generic.pm:783 > STACK: Bio::SeqIO::tigrxml::start_element Bio/SeqIO/tigrxml.pm:206 > STACK: try{} block /usr/share/perl5/XML/SAX/Base.pm:292 > STACK: XML::SAX::Base::start_element /usr/share/perl5/XML/SAX/ > Base.pm:266 > STACK: XML::SAX::Expat::_handle_start /usr/share/perl5/XML/SAX/ > Expat.pm:225 > STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm: > 469 > STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187 > STACK: XML::SAX::Expat::_parse_bytestream /usr/share/perl5/XML/SAX/ > Expat.pm:45 > STACK: XML::SAX::Base::parse /usr/share/perl5/XML/SAX/Base.pm:2602 > STACK: XML::SAX::Base::parse_file 
/usr/share/perl5/XML/SAX/Base.pm: > 2631 > STACK: Bio::SeqIO::tigrxml::next_seq Bio/SeqIO/tigrxml.pm:116 > STACK: /crypt/rob/test2.pl:10 > ----------------------------------------------------------- > > Looking at the medicago XML and comparing it to the bioperl-live/t/ > data/test.tigrxml, the two look VERY different in structure. Lots > of things that are attrs in test.tigrxml seem to be elements in the > medicago XML, for example. > > So I guess the question is: is the medicago TIGR XML malformed? > Can tigrxml.pm be expected to parse it? What, if anything, should > be done about this? > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From eigenrosen at gmail.com Thu Aug 6 16:39:09 2009 From: eigenrosen at gmail.com (Michael Rosen) Date: Thu, 6 Aug 2009 13:39:09 -0700 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Hi Chris, I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the top of the module being called. Mike On Aug 6, 2009, at 1:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the > wrapper has been updated for the latest version (I think parsing > still works, though). 
>
> chris
>
> On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote:
>
>> I'm a complete bioperl novice, trying to do Clustalw on some fasta
>> files, and am running into trouble:
>>
>> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
>> Use of uninitialized value in concatenation (.) or string at
>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm
>> line 550.
>> Use of uninitialized value in concatenation (.) or string at
>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm
>> line 551.
>> Can't exec "align": No such file or directory at
>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm
>> line 555.
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg
>> -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run
>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
>> STACK: Bio::Tools::Run::Alignment::Clustalw::align
>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
>> STACK: TestClust:22
>> -----------------------------------------------------------
>>
>> Here's my code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::Perl;
>> use Bio::AlignIO;
>> use Bio::Tools::Run::Alignment::Clustalw;
>> use Bio::SimpleAlign;
>> use Bio::Seq;
>> use strict;
>> use warnings;
>>
>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
>> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>>
>> for (my $i = 0; $i < @seq_array; $i++){
>>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>>     $seq_array[$i]->seq($seq);
>> }
>>
>> write_sequence(">test",'fasta',@seq_array);
>>
>> my $seq_array_ref = \@seq_array;
>> my $aln = $factory->align($seq_array_ref);
>>
>> my @align_array = $aln->each_seq();
>> write_sequence(">testfile",'fasta',@align_array);
>>
>>
>> The loop is just there to take out some gaps that were placed in a
>> blast previous to this. The write_sequence call confirms that
>> @seq_array is a valid array of Bio::Seq objects at the time align
>> calls it. Here's some output in "test":
>>
>> >A0220B0939one.1 FV584Q101DEWY9
>> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC
>> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT
>> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT
>> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG
>> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG
>> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA
>> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA
>> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT
>> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG
>> >A0220B0939one.2 FV584Q101A4DG7
>> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG
>> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC
>> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG
>> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG
>> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA
>> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT
>> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT
>> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT
>> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG
>> ...
>> >> Thanks, >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lsbrath at gmail.com Thu Aug 6 16:49:56 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 6 Aug 2009 16:49:56 -0400 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <69367b8f0908061349i48f4d2b1tcbccb00d5a3de5ca@mail.gmail.com> Hi Micheal, Have you considered calling clustalw from perl's "system" command and passing in the files for alignment? Mgavi On Thu, Aug 6, 2009 at 4:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the wrapper has > been updated for the latest version (I think parsing still works, though). > > chris > > On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote: > > I'm a complete bioperl novice, trying to do Clustalw on some fasta files, >> and am running into trouble: >> >> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta >> Use of uninitialized value in concatenation (.) or string at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 550. >> Use of uninitialized value in concatenation (.) or string at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 551. >> Can't exec "align": No such file or directory at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 555. 
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg
>> -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run
>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
>> STACK: Bio::Tools::Run::Alignment::Clustalw::align
>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
>> STACK: TestClust:22
>> -----------------------------------------------------------
>>
>> Here's my code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::Perl;
>> use Bio::AlignIO;
>> use Bio::Tools::Run::Alignment::Clustalw;
>> use Bio::SimpleAlign;
>> use Bio::Seq;
>> use strict;
>> use warnings;
>>
>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
>> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>>
>> for (my $i = 0; $i < @seq_array; $i++){
>>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>>     $seq_array[$i]->seq($seq);
>> }
>>
>> write_sequence(">test",'fasta',@seq_array);
>>
>> my $seq_array_ref = \@seq_array;
>> my $aln = $factory->align($seq_array_ref);
>>
>> my @align_array = $aln->each_seq();
>> write_sequence(">testfile",'fasta',@align_array);
>>
>>
>> The loop is just there to take out some gaps that were placed in a blast
>> previous to this. The write_sequence call confirms that @seq_array is a
>> valid array of Bio::Seq objects at the time align calls it.
Here's some >> output in "test": >> >> >A0220B0939one.1 FV584Q101DEWY9 >> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC >> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT >> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT >> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG >> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG >> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA >> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA >> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT >> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >> >A0220B0939one.2 FV584Q101A4DG7 >> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG >> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC >> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG >> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG >> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA >> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT >> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT >> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT >> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG >> ... 
>> >> Thanks, >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Aug 6 17:00:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 16:00:37 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Message-ID: <2C8DF4CB-40B0-41DB-882A-AAF346A008B2@illinois.edu> Michael, No, I meant was what version of clustalw (the actual executable) you are using. This is the bioperl wrapper svn version. What happens if you enter 'clustalw' on the command line? Do you get: ************************************************************** ******** CLUSTAL 2.0.11 Multiple Sequence Alignments ******** ************************************************************** I think the above version has problems with bioperl, though I can't recall exactly what the problems were. chris On Aug 6, 2009, at 3:39 PM, Michael Rosen wrote: > Hi Chris, > I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at > the top of the module being called. > > Mike > On Aug 6, 2009, at 1:25 PM, Chris Fields wrote: > >> Michael, >> >> Are you using ClustalW 2? I'm not sure but I don't think the >> wrapper has been updated for the latest version (I think parsing >> still works, though). 
>>
>> chris
>>
>> On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote:
>>
>>> I'm a complete bioperl novice, trying to do Clustalw on some fasta
>>> files, and am running into trouble:
>>>
>>> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
>>> Use of uninitialized value in concatenation (.) or string at
>>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm
>>> line 550.
>>> Use of uninitialized value in concatenation (.) or string at
>>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm
>>> line 551.
>>> Can't exec "align": No such file or directory at
>>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm
>>> line 555.
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg
>>> -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>>>
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
>>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run
>>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
>>> STACK: Bio::Tools::Run::Alignment::Clustalw::align
>>> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
>>> STACK: TestClust:22
>>> -----------------------------------------------------------
>>>
>>> Here's my code:
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use Bio::Perl;
>>> use Bio::AlignIO;
>>> use Bio::Tools::Run::Alignment::Clustalw;
>>> use Bio::SimpleAlign;
>>> use Bio::Seq;
>>> use strict;
>>> use warnings;
>>>
>>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
>>> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>>>
>>> for (my $i = 0; $i < @seq_array; $i++){
>>>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>>>     $seq_array[$i]->seq($seq);
>>> }
>>>
>>> write_sequence(">test",'fasta',@seq_array);
>>>
>>> my $seq_array_ref = \@seq_array;
>>> my $aln = $factory->align($seq_array_ref);
>>>
>>> my @align_array = $aln->each_seq();
>>> write_sequence(">testfile",'fasta',@align_array);
>>>
>>>
>>> The loop is just there to take out some gaps that were placed in a
>>> blast previous to this. The write_sequence call confirms that
>>> @seq_array is a valid array of Bio::Seq objects at the time align
>>> calls it. Here's some output in "test":
>>>
>>> >A0220B0939one.1 FV584Q101DEWY9
>>> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC
>>> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT
>>> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT
>>> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG
>>> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG
>>> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA
>>> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA
>>> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT
>>> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG
>>> >A0220B0939one.2 FV584Q101A4DG7
>>> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG
>>> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC
>>> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG
>>> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG
>>> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA
>>> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT
>>> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT
>>> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT
>>> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG
>>> ...
>>> >>> Thanks, >>> Mike >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From bosborne11 at verizon.net Thu Aug 6 16:01:00 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 16:01:00 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Chris, Yes, I think so. By the way, this is related to an old bug: http://bugzilla.bioperl.org/show_bug.cgi?id=1508 Brian O. > This is a simple validation issue: should we throw an exception on > bad input (no '>') From bix at sendu.me.uk Thu Aug 6 17:18:02 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 06 Aug 2009 22:18:02 +0100 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Message-ID: <4A7B488A.2060600@sendu.me.uk> Michael Rosen wrote: > Hi Chris, > I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the > top of the module being called. I'm guessing your error is caused simply by not having clustalw installed. BioPerl run modules provide perl wrappers to external executables. They don't replace the need for those executables. 
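
A minimal sketch of how one might check for the external program before calling align(). It assumes the executable is named clustalw (clustalw.exe on Windows) and that it sits either in the directory named by the CLUSTALDIR environment variable, which I believe is the variable the wrapper's documentation mentions, or somewhere on PATH. find_executable is a hypothetical helper written for this example; it is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl function): return the full path of the
# first executable file called $name found in @dirs, or undef if none exists.
sub find_executable {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        # On Windows the file usually carries an .exe suffix, so try both.
        for my $candidate ($name, "$name.exe") {
            my $path = File::Spec->catfile($dir, $candidate);
            return $path if -f $path && -x _;
        }
    }
    return;
}

# Assumption: look in CLUSTALDIR first (if set), then in every PATH directory.
my @dirs = grep { defined($_) && length($_) }
           ($ENV{CLUSTALDIR}, File::Spec->path);
my $exe  = find_executable('clustalw', @dirs);

if (defined $exe) {
    print "clustalw found at $exe\n";
} else {
    print "clustalw not found; install it or set CLUSTALDIR before calling align()\n";
}
```

If this reports that clustalw is missing, an empty program path in the wrapper would be consistent with the uninitialized-value warnings and the Can't exec "align" error quoted in this thread.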
From cjfields at illinois.edu Thu Aug 6 20:47:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 19:47:47 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: I added the exception and tests to svn (r15895), so I closed that bug out. Almost forgot about that one, thanks for pointing it out! chris On Aug 6, 2009, at 3:01 PM, Brian Osborne wrote: > Chris, > > Yes, I think so. > > By the way, this is related to an old bug: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1508 > > > Brian O. > > >> This is a simple validation issue: should we throw an exception on >> bad input (no '>') > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 22:30:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 21:30:09 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A765A44.7030902@gmail.com> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> Message-ID: Jonathan, Just to make sure you aren't accidentally 'warnocked' by the core devs: Your code sounds quite nice! However, we will begin the process of massively restructuring bioperl pretty soon, so I don't think it's a good idea to gear your code towards fitting directly into core. The best alternative should be fairly obvious, which is to release it to CPAN listing BioPerl 1.6.0 as a dependency if it is required. 
Your modules may or may not need the Bio* namespace (that's up to you, actually); there are several non-bioperl modules that also share the Bio* namespace, and I believe there are modules that aren't Bio* that use BioPerl (Gbrowse comes to mind). If you're focusing on interaction with robotics, Robotics::Bio::X might be a better namespace for instance (b/c you could expand later into other possibly non-bio robotics interfaces). The cpan-discuss list is probably a good place to ask, or (after you register on PAUSE) you can register the module namespace and see if there are any objections to the request. chris On Aug 2, 2009, at 10:32 PM, Jonathan Cline wrote: > Smithies, Russell wrote: >> I "acquired" an old Biomek 1000 that I'm thinking of modernising. >> It was originally controlled by a monstrously large but slow pc >> (IBM Value Point Model 466DX2 computer with Microsoft Windows* >> Version 3.1) >> My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) >> and use software like mach3 www.machsupport.com along with G-code >> to control it. >> I come from an engineering background so it seemed like the easy >> way to me :-) >> >> Now I just need a bit of free time to get it working... >> >> --Russell >> >> >> > I agree, that's probably the best way to go. It's hard to know what > amount of s/w processing was done on the host PC vs. the embedded > controller. If you were able to connect directly to the robot > hardware > with serial port(s) or whatever it's using, it would be tough to find > out the comm protocol unless someone has already reverse engineered it > (which is doubtful). Also from what I have seen online, attempting > to > run the old software under virtual machine is unpredictable due to > timing differences in the serial port communication. So removal of > the > old electronics is probably the best bet. If it has one arm, then > it's > much easier. 
> > As for robots with working workstation software, it seems the > annoyance > factor is that while the scripting languages are powerful (for GUI > scripting that is), they are still relatively low level. Bio types > with > a bit of CS seem to immediately turn to visual basic, labview, or even > excel spreadsheets and macros, in order to provide a higher level > abstraction for the workstation software. To me, it seems natural > that > there should be a "protocol compiler" which takes biology protocols as > input, and gives robot instructions as output (google "protolexer"). > The huge bottleneck of course is that everyone's robotics work tables > and equipment are somewhat unique to their needs. > > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > > >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >>> Sent: Thursday, 30 July 2009 2:07 p.m. >>> To: bioperl-l at lists.open-bio.org >>> Cc: Jonathan Cline >>> Subject: [Bioperl-l] Bio::Robotics namespace discussion >>> >>> I am writing a module for communication with biology robotics, as >>> discussed recently on #bioperl, and I invite your comments. >>> >>> Currently this mode talks to a Tecan genesis workstation robot ( >>> http://images.google.com/images?q=tecan genesis ). Other vendors >>> are >>> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the >>> 'net with the exception of some visual basic and labview scripts >>> which I >>> have found. There are some computational biologists who program for >>> robots via high level s/w, but these scripts are not distributed >>> as OSS. >>> >>> With Tecan, there is a datapipe interface for hardware >>> communication, as >>> an added $$ option from the vendor. I haven't checked other >>> vendors to >>> see if they likewise have an open communication path for third party >>> software. 
By allowing third-party communication, then naturally the >>> next step is to create a socket client-server; especially as the >>> robot >>> vendor only support MS Win and using the local machine has typical >>> Microsoft issues (like losing real time communication with the >>> hardware >>> due to GUI animation, bad operating system stability, no unix except >>> cygwin, etc). >>> >>> >>> On Namespace: >>> >>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are >>> many >>> s/w modules already called 'robots' (web spider robots, chat bots, >>> www >>> automate, etc) so I chose the longer name "robotics" to >>> differentiate >>> this module as manipulating real hardware. Bio::Robotics is the >>> abstraction for generic robotics and Bio::Robotics::(vendor) is the >>> manufacturer-specific implementation. Robot control is made more >>> complex due to the very configurable nature of the work table >>> (placement >>> of equipment, type of equipment, type of attached arm, etc). The >>> abstraction has to be careful not to generalize or assume too >>> much. In >>> some cases, the Bio::Robotics modules may expand to arbitrary >>> equipment >>> such as thermocyclers, tray holders, imagers, etc - that could be a >>> future roadmap plan. >>> >>> Here is some theoretical example usage below, subject to change. At >>> this time I am deciding how much state to keep within the Perl >>> module. >>> By keeping state, some robot programming might be simplified >>> (avoiding >>> deadlock or tracking tip state). In general I am aiming for a more >>> "protocol friendly" method implementation. >>> >>> >>> To use this software with locally-connected robotics hardware: >>> >>> use Bio::Robotics; >>> >>> my $tecan = Bio::Robotics->new("Tecan") || die; >>> $tecan->attach() || die; >>> $tecan->home(); >>> $tecan->pipette(tips => "1", from => "rack1"); >>> $tecan->pipette(aspirate => "1", dispense => "1", from => >>> "sampleTray", to >>> => "DNATray"); >>> ... 
>>>
>>> To use this software with remote robotics hardware over the network:
>>>
>>> # On the local machine, run:
>>> use Bio::Robotics;
>>>
>>> my @connected_hardware = Bio::Robotics->query();
>>> my $tecan = Bio::Robotics->new("Tecan")
>>>     || die "no tecan found in @connected_hardware\n";
>>> $tecan->attach() || die;
>>> $tecan->configure("my work table configuration file") || die;
>>> # Run the server and process commands
>>> while (1) {
>>>     $error = $tecan->server(passwordplaintext => "0xd290");
>>>     if ($tecan->lastClientCommand() =~ /^shutdown/) {
>>>         last;
>>>     }
>>> }
>>> $tecan->detach();
>>> exit(0);
>>>
>>> # On the remote machine (the client), run:
>>> use Bio::Robotics;
>>>
>>> my $server = "heavybio.dyndns.org:8080";
>>> my $password = "0xd290";
>>> my $tecan = Bio::Robotics->new("Tecan");
>>> $tecan->connect($server, $password) || die;
>>> $tecan->home();
>>> $tecan->pipette(tips => "1", from => "rack200");
>>> $tecan->pipette(aspirate => "1", dispense => "1",
>>>     from => "sampleTray A1", to => "DNATray A2",
>>>     volume => "45", liquid => "Buffer");
>>> $tecan->pipette(drop => "1");
>>> ...
>>> $tecan->disconnect();
>>> exit(0);
>>>
>>>
>>>
>>> --
>>>
>>> ## Jonathan Cline
>>> ## jcline at ieee.org
>>> ## Mobile: +1-805-617-0223
>>> ########################
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From biopython at maubp.freeserve.co.uk Fri Aug 7 05:19:14 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Aug 2009 10:19:14 +0100
Subject: [Bioperl-l] Trouble with Clustalw
In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu>
References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu>
Message-ID: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com>

On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields wrote:
> Michael,
>
> Are you using ClustalW 2? I'm not sure but I don't think the wrapper has
> been updated for the latest version (I think parsing still works, though).
>
> chris

That shouldn't matter; according to Des Higgins, ClustalW 2 is intended
to be completely compatible with ClustalW 1.83, including the command
line options. They will be adding new stuff in ClustalW 3.

The only thing to worry about with ClustalW 2 is parsing the output, as
the header line of the alignments has changed very slightly.

I can tell you from personal experience that the Biopython command line
wrappers for ClustalW work fine on both 1.83 and 2.0.10, for example,
and I would expect the same to be true for BioPerl.

Peter

From paola.bisignano at gmail.com Fri Aug 7 08:11:58 2009
From: paola.bisignano at gmail.com (Paola Bisignano via Scour)
Date: Fri, 7 Aug 2009 05:11:58 -0700
Subject: [Bioperl-l] Scour Friend Invite
Message-ID: <4a7c1a0e5b82d@gmail.com>

From hlapp at gmx.net Fri Aug 7 09:21:51 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 7 Aug 2009 09:21:51 -0400
Subject: [Bioperl-l] Scour Friend Invite
In-Reply-To: <4a7c1a0e5b82d@gmail.com>
References: <4a7c1a0e5b82d@gmail.com>
Message-ID: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net>

Just FYI, I am addressing this offline. Note to everyone: we don't
tolerate this and it will get you removed from the list immediately
(and banned for the second offense). This is a large list. You better
spend the time and be very careful who you send this kind of stuff to
before you waste everyone else's.

-hilmar

From stefan.kirov at bms.com Fri Aug 7 10:25:52 2009
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 07 Aug 2009 10:25:52 -0400
Subject: [Bioperl-l] Scour Friend Invite
In-Reply-To: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net>
References: <4a7c1a0e5b82d@gmail.com> <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net>
Message-ID: <4A7C3970.10501@bms.com>

Hilmar Lapp wrote:
> Just FYI, I am addressing this offline. Note to everyone: we don't
> tolerate this and it will get you removed from the list immediately
> (and banned for the second offense). This is a large list. You better
> spend the time and be very careful who you send this kind of stuff to
> before you waste everyone else's.
>
> -hilmar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
It is quite possible this guy has no idea scour is spamming people on
his behalf. It seems to me there should be a spam filter trained to take
care of these guys. As a reference:
http://forums.digitalpoint.com/showthread.php?t=955786
http://markmail.org/message/fzlutwd3mkforbsu

From jdalzell03 at qub.ac.uk Mon Aug 3 19:18:24 2009
From: jdalzell03 at qub.ac.uk (Johnathan Dalzell)
Date: Tue, 4 Aug 2009 00:18:24 +0100
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
Message-ID: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>

Hi,

I've been trying to install BioPerl 1.6.0 onto Strawberry Perl 5.10 and
the ActivePerl equivalent. I'm working through Vista, and over multiple
attempts this is the furthest I can get through installation....

Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively?
[a] a
 - will install all scripts
Do you want to run tests that require connection to servers across the
internet (likely to cause some failures)? y/n [n] y
 - will run internet-requiring tests
Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/lib/Data/Dumper.pm line 190, line 9.
Creating new 'Build' script for 'BioPerl' version '1.006000'
---- Unsatisfied dependencies detected during ----
----       CJFIELDS/BioPerl-1.6.0.tar.gz      ----
    SOAP::Lite [requires]
    GraphViz [requires]
    Convert::Binary::C [requires]
    Algorithm::Munkres [requires]
    XML::Twig [requires]
    DB_File [requires]
    Set::Scalar [requires]
    XML::Parser::PerlSAX [requires]
    XML::Writer [requires]
    XML::SAX::Writer [requires]
    Clone [requires]
    XML::DOM::XPath [requires]
    PostScript::TextBlock [requires]
Running Build test
  Delayed until after prerequisites
Running Build install
  Delayed until after prerequisites
Running install for module 'SOAP::Lite'
Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz
Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP-Lite-0.710.08.tar.gz ok

  CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz

We are about to install SOAP::Lite and for your convenience will provide
you with list of modules and prerequisites, so you'll be able to choose
only modules you need for your configuration.

XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by default.
Installed transports can be used for both SOAP::Lite and XMLRPC::Lite.

Press to see the detailed list.

Feature                       Prerequisites                Install?
----------------------------- ---------------------------- --------
Core Package                  [*] Scalar::Util             always
                              [*] Test::More
                              [*] URI
                              [*] MIME::Base64
                              [*] version
                              [*] XML::Parser (v2.23)
Client HTTP support           [*] LWP::UserAgent           always
Client HTTPS support          [ ] Crypt::SSLeay            [ no ]
Client SMTP/sendmail support  [ ] MIME::Lite               [ no ]
Client FTP support            [*] IO::File                 [ yes ]
                              [*] Net::FTP
Standalone HTTP server        [*] HTTP::Daemon             [ yes ]
Apache/mod_perl server        [ ] Apache                   [ no ]
FastCGI server                [ ] FCGI                     [ no ]
POP3 server                   [ ] MIME::Parser             [ no ]
                              [*] Net::POP3
IO server                     [*] IO::File                 [ yes ]
MQ transport support          [ ] MQSeries                 [ no ]
JABBER transport support      [ ] Net::Jabber              [ no ]
MIME messages                 [ ] MIME::Parser             [ no ]
DIME messages                 [*] IO::Scalar (v2.105)      [ no ]
                              [ ] DIME::Tools (v0.03)
                              [ ] Data::UUID (v0.11)
SSL Support for TCP Transport [ ] IO::Socket::SSL          [ no ]
Compression support for HTTP  [*] Compress::Zlib           [ yes ]
MIME interoperability w/ Axis [ ] MIME::Parser (v6.106)    [ no ]
--- An asterix '[*]' indicates if the module is currently installed.

Do you want to proceed with this configuration? [yes] yes
Checking if your kit is complete...
Looks good Writing Makefile for SOAP::Lite cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport\TCP.pm cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport\POP3.pm cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema19 99.pm cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema20 
01.pm cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport\MQ.pm cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport\FTP.pm cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP\Transport\JABBER.pm cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite\Deserializer\XMLSchem aSOAP1_2.pm cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport\IO.pm cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite\Deserializer\XMLSchem aSOAP1_1.pm cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP\Transport\LOCAL.pm cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP\Transport\MAILTO.pm cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/SOAPsh.pl blib\script\S OAPsh.pl pl2bat.bat blib\script\SOAPsh.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/stubmaker.pl blib\scrip t\stubmaker.pl pl2bat.bat blib\script\stubmaker.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/XMLRPCsh.pl blib\script \XMLRPCsh.pl pl2bat.bat blib\script\XMLRPCsh.pl MKUTTER/SOAP-Lite-0.710.08.tar.gz C:\strawberry\c\bin\dmake.EXE -- OK Running make test C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib\lib' , 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/013-array-deserializati on.t 
t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03-server.t t/04-attach. t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08-schema.t t/096_characters.t t /097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t t/IO/SessionSet.t t/SO AP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/Deserializer/XMLSchema199 9.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t t /SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/SOAP/Transport/FTP.t t/S OAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t t/SOAP/Transport/MAILT O.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/HTTP/CGI.t t/XML/Parser /Lite.t t/XMLRPC/Lite.t t/01-core.t .................................. ok t/010-serializer.t ........................... ok t/012-cloneable.t ............................ ok t/013-array-deserialization.t ................ ok t/014_UNIVERSAL_use.t ........................ ok t/015_UNIVERSAL_can.t ........................ ok t/02-payload.t ............................... ok t/03-server.t ................................ ok t/04-attach.t ................................ skipped: Could not find MIME::Parser - is M IME::Tools installed? Aborting. t/05-customxml.t ............................. ok t/06-modules.t ............................... ok t/07-xmlrpc_payload.t ........................ ok t/08-schema.t ................................ ok t/096_characters.t ........................... skipped: (no reason given) t/097_kwalitee.t ............................. skipped: (no reason given) t/098_pod.t .................................. skipped: (no reason given) t/099_pod_coverage.t ......................... skipped: (no reason given) t/IO/SessionData.t ........................... ok t/IO/SessionSet.t ............................ ok t/SOAP/Data.t ................................ ok t/SOAP/Lite/Deserializer/XMLSchema1999.t ..... 
ok
t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok
t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok
t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok
t/SOAP/Lite/Packager.t ....................... ok
t/SOAP/Schema/WSDL.t ......................... ok
t/SOAP/Serializer.t .......................... 1/12 Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376.
Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376.
Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376.
Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376.
t/SOAP/Serializer.t .......................... ok
t/SOAP/Transport/FTP.t ....................... 1/7 Use of uninitialized value in split at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 55.
substr outside of string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 56.
Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/perl/lib/IO/Socket/INET.pm line 117.
Use of uninitialized value $server in concatenation (.) or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60.
t/SOAP/Transport/FTP.t ....................... ok
t/SOAP/Transport/HTTP.t ...................... ok
t/SOAP/Transport/HTTP/CGI.t ..................

Every time I get to CGI.t at the end here, the installation won't move! Any suggestions would be greatly appreciated; I've been trying to force it through for literally 5 hours now....
cheers, jonny

From ghiban at cshl.edu Thu Aug 6 12:04:38 2009
From: ghiban at cshl.edu (Ghiban, Cornel)
Date: Thu, 6 Aug 2009 12:04:38 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net>
Message-ID: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>

Hi,

It doesn't matter which sequence we use. As Chris Fields showed in his test, the problem is that ">" is not the first character of the first line. We always assumed the sequences were in FASTA format, and that assumption seems to be wrong.
I think the solution to our problem is to check whether the ">" symbol is present. If it is not present, it will be added.

Thank you,
Cornel Ghiban

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net]
Sent: Thursday, August 06, 2009 11:18 AM
To: Hilgert, Uwe
Cc: Chris Fields; BioPerl List; Ghiban, Cornel
Subject: Re: [Bioperl-l] Bio::SeqIO issue

Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible?

-hilmar

On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote:

> I'm not sure what version we have.
Cornel may have installed it a
> while ago from CVS:
>
> Module id = Bio::Root::Build
>     CPAN_USERID  CJFIELDS (Christopher Fields )
>     CPAN_VERSION 1.006000
>     INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm
>     INST_VERSION 1.006900
> cpan> m Bio::Root::Version
> Module id = Bio::Root::Version
>     CPAN_USERID  CJFIELDS (Christopher Fields )
>     CPAN_VERSION 1.006000
>     INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm
>     INST_VERSION 1.006900
> cpan> m Bio::SeqIO
> Module id = Bio::SeqIO
>     CPAN_USERID  CJFIELDS (Christopher Fields )
>     CPAN_VERSION 1.006000
>     INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm
>     INST_VERSION undef
>
> Cornel still has the checked-out "bioperl-live" directory and the last
> changes are from March this year.
>
> As for why he used 'Fasta' instead of 'fasta' as the format parameter
> in Bio::SeqIO: that is what it says in the module's manual.
> He has now tried 'fasta' instead and sees no change in behavior. With
> the format parameter omitted altogether, fasta-formatted sequence
> continues to be treated correctly, the first line being removed.
> However, raw sequence is treated differently in that the first line is
> no longer removed. Instead, the program returns the first line only,
> which, in the example I am going to forward in my next message,
> returns 60 amino acids out of a raw sequence of 300 aa. Can't win
> with raw sequence...
>
> The files may be created on different platforms; we didn't notice any
> difference between files created on Windows and files created on Linux.
>
> Thanks
> Uwe
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, August 05, 2009 6:54 PM
> To: Chris Fields
> Cc: Hilgert, Uwe; BioPerl List
> Subject: Re: [Bioperl-l] Bio::SeqIO issue
>
> I don't think that can be the problem. If anything, providing the
> format ought to be better in terms of result than not providing it?
> > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, etc? > What does your data look like?" I'd add to that, can you show us your > full script, or a smaller code snippet that reproduces the problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing the >> file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>> Hilgert, Ph.D. 
>>> Dolan DNA Learning Center
>>> Cold Spring Harbor Laboratory
>>>
>>> C: (516) 857-1693
>>> V: (516) 367-5185
>>> E: hilgert at cshl.edu
>>> F: (516) 367-5182
>>> W: http://www.dnalc.org
>>>
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Wednesday, August 05, 2009 5:04 PM
>>> To: Hilgert, Uwe
>>> Cc: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue
>>>
>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:
>>>
>>>> Is my impression correct that Bio::SeqIO just assumes that
>>>> sequences are being submitted in FASTA format?
>>>
>>> No. See:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>>
>>> SeqIO tries to guess at the format using the file extension, and if
>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's
>>> possible that the extension is causing the problem, or that
>>> GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced
>>> to guess). In any case, it's always advisable to explicitly
>>> indicate the format when possible.
>>>
>>> Relevant lines:
>>>
>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
>>> ...
>>> return 'raw' if /\.(txt)$/i;
>>>
>>>> In our experience, implementing
>>>> Bio::SeqIO led to the first line of files being cut off, regardless
>>>> of whether the files were indeed fasta files or files that only
>>>> contained sequence.
>>>
>>> Files that only contain sequence are 'raw'. Ones in FASTA are
>>> 'fasta'.
>>>
>>>> Which, in the latter, led to sequence submissions that had the
>>>> first line of nucleotides removed. Has anyone tried to write a fix
>>>> for this?
>>>
>>> This sounds like a bug, but we have very little to go on beyond your
>>> description. What version of bioperl are you using, OS, etc? What
>>> does your data look like? File extension?
>>>
>>> chris
>>>
>>>> Thanks,
>>>>
>>>> Uwe
>>>>
>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>>
>>>> Uwe Hilgert, Ph.D.
>>>> Dolan DNA Learning Center
>>>> Cold Spring Harbor Laboratory
>>>>
>>>> V: (516) 367-5185
>>>> E: hilgert at cshl.edu
>>>> F: (516) 367-5182
>>>> W: http://www.dnalc.org
>>>
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Sat Aug 8 08:38:46 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 8 Aug 2009 08:38:46 -0400
Subject: [Bioperl-l] Scour Friend Invite
In-Reply-To: <4A7C3970.10501@bms.com>
References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com>
Message-ID: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife>

Thanks Stefan--this makes a lot more sense to me than supposing a priori that a previous legitimate user of this list is spamming bioperl-l intentionally. I would prefer to initially give the benefit of the doubt to the intelligence of the users, rather than scare off people who are likely to be already mortified that their emails have been commandeered like this. I would definitely support a spam filter that works.
MAJ ----- Original Message ----- From: "Stefan Kirov" To: "Hilmar Lapp" Cc: "BioPerl List" Sent: Friday, August 07, 2009 10:25 AM Subject: Re: [Bioperl-l] Scour Friend Invite > Hilmar Lapp wrote: >> Just FYI, I am addressing this offline. Note to everyone: we don't >> tolerate this and it will get you removed from the list immediately >> (and banned for the second offense). This is a large list. You better >> spend the time and be very careful who you send this kind of stuff to >> before you waste everyone else's. >> >> -hilmar >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > It is quite possible this guy has no idea scour is spamming people on > his behalf. It seems to me there should be spam-filter trained to take > care of these guys. > As a reference: > http://forums.digitalpoint.com/showthread.php?t=955786 > http://markmail.org/message/fzlutwd3mkforbsu > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Sat Aug 8 10:18:59 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 08 Aug 2009 10:18:59 -0400 Subject: [Bioperl-l] SeqIO documentation Message-ID: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> Chris, Since we've been discussing formats I just wanted to mention that I've changed this documentation from SeqIO.pm: If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then Fasta format is assumed. 
To: If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then SeqIO will throw a fatal error. The code is clear, if SeqIO can't figure out what the format is then it dies, "fasta" is not the default format. Brian O. From cjfields at illinois.edu Sat Aug 8 12:23:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:23:44 -0500 Subject: [Bioperl-l] SeqIO documentation In-Reply-To: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> References: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> Message-ID: Brian, That fits current behavior, so yes that makes sense. chris On Aug 8, 2009, at 9:18 AM, Brian Osborne wrote: > Chris, > > Since we've been discussing formats I just wanted to mention that > I've changed this documentation from SeqIO.pm: > > If no format is specified and a filename is given then the module > will attempt to deduce the format from the filename suffix. If there > is no suffix that Bioperl understands then it will attempt to guess > the format based on file content. If this is unsuccessful then Fasta > format is assumed. > > To: > > If no format is specified and a filename is given then the module > will attempt to deduce the format from the filename suffix. If there > is no suffix that Bioperl understands then it will attempt to guess > the format based on file content. If this is unsuccessful then SeqIO > will throw a fatal error. > > The code is clear, if SeqIO can't figure out what the format is then > it dies, "fasta" is not the default format. > > > Brian O. 
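
A minimal sketch, in plain Perl, of the lookup order that the revised documentation describes: filename suffix first, then file content, then a fatal error. This is an illustration only and not the SeqIO source. The helper name choose_format and the two content checks are assumptions made for the example; the two suffix patterns are the ones quoted earlier in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): pick a format name by
# suffix first, then by content, and die if neither gives an answer.
sub choose_format {
    my ($filename, $first_line) = @_;

    # 1. Filename suffix. Only the two patterns quoted in this thread are
    #    used here; the real module recognises many more suffixes.
    return 'fasta' if $filename =~ /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    return 'raw'   if $filename =~ /\.(txt)$/i;

    # 2. Content. Assumption for this sketch: a leading ">" means FASTA,
    #    and a line made only of letters (plus "*" or "-") is raw sequence.
    if (defined $first_line) {
        return 'fasta' if $first_line =~ /^>/;
        return 'raw'   if $first_line =~ /^[A-Za-z*-]+\s*$/;
    }

    # 3. Nothing matched: fail loudly instead of silently assuming FASTA.
    die "Could not determine sequence format for '$filename'\n";
}

print choose_format('seqs.fa',    undef),          "\n";   # suffix match
print choose_format('upload_001', ">seq1 test\n"), "\n";   # header line
print choose_format('upload_002', "MKVLAAGIVG\n"), "\n";   # bare residues

# An unrecognisable input is a fatal error, so callers must trap it.
my $ok = eval { choose_format('upload_003', "12345\n"); 1 };
print "died: $@" unless $ok;
```

The returned name could then be passed as the -format argument to Bio::SeqIO->new, in the same way as the call shown earlier in the thread.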
> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 8 12:24:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:24:48 -0500 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> Message-ID: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu> I believe there are spam filters in place (Jason and Chris D. could probably indicate more on this). chris On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote: > Thanks Stefan--this makes a lot more sense to me than supposing > a priori that a previous legitimate user of this list is spamming > bioperl-l > intentionally. I would prefer to initially give the benefit of the > doubt > to the intelligence of the users, rather than scare people off who are > likely to be already mortified that their emails have been > commandeered > like this. I would definitely support an spam filter that works. > MAJ > ----- Original Message ----- From: "Stefan Kirov" > > To: "Hilmar Lapp" > Cc: "BioPerl List" > Sent: Friday, August 07, 2009 10:25 AM > Subject: Re: [Bioperl-l] Scour Friend Invite > > >> Hilmar Lapp wrote: >>> Just FYI, I am addressing this offline. Note to everyone: we don't >>> tolerate this and it will get you removed from the list immediately >>> (and banned for the second offense). This is a large list. You >>> better >>> spend the time and be very careful who you send this kind of stuff >>> to >>> before you waste everyone else's. 
>>> >>> -hilmar >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> It is quite possible this guy has no idea scour is spamming people on >> his behalf. It seems to me there should be spam-filter trained to >> take >> care of these guys. >> As a reference: >> http://forums.digitalpoint.com/showthread.php?t=955786 >> http://markmail.org/message/fzlutwd3mkforbsu >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 8 12:26:55 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:26:55 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> Message-ID: <0A43205F-828F-4CC9-ADC3-EBCE92690765@illinois.edu> On Aug 7, 2009, at 4:19 AM, Peter wrote: > On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields > wrote: >> Michael, >> >> Are you using ClustalW 2? I'm not sure but I don't think the >> wrapper has >> been updated for the latest version (I think parsing still works, >> though). >> >> chris > > That shouldn't matter, according to Des Higgins ClustalW 2 is intended > to be completely compatible with ClustalW 1.83, including the command > line options. They will be adding new stuff in ClustalW 3. 
The only
> thing to worry about with ClustalW 2 is parsing the output, as the
> header line of the alignments has changed very slightly.
>
> I can tell you from personal experience that the Biopython command
> line wrappers for ClustalW work fine on both 1.83 and 2.0.10 for
> example, and would expect the same to be true for BioPerl.
>
> Peter

I would think so as well, but I encountered some issues on my OS using ClustalW 2 with the last release:

http://bugzilla.open-bio.org/show_bug.cgi?id=2728

I think it's something small, like something hard-coded (the version, maybe), that's causing the problem; I just didn't have time to check.

chris

From cjfields at illinois.edu Sat Aug 8 12:26:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 Aug 2009 11:26:38 -0500
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
Message-ID: <0963ED84-359B-465B-9BA2-956A0AB23587@illinois.edu>

Have you tried installing SOAP::Lite directly? That seems to be the hanging point.

The funny thing is that this is somehow assigning everything as a requirement (SOAP::Lite is a 'recommends'). Worth investigating, but I don't have access to a Windows box (either for XP, Vista, or Win7).

Hopefully we'll get a PPM up soon; it's in the roadmap for 1.6.1. In the meantime (as a strictly temporary measure), have you tried setting PERL5LIB to point to a local copy of bioperl-1.6?

chris

On Aug 3, 2009, at 6:18 PM, Johnathan Dalzell wrote:

> Hi, I've been trying to install Bioperl 1.6.0 onto strawberry perl
> 5.10 and the activePerl equivalent. I'm working through vista, and
> over multiple times, this is the furthest I can get through
> installation....
>
>
> Install [a]ll Bioperl scripts, [n]one, or choose groups
> [i]nteractively?
[a] a > - will install all scripts > Do you want to run tests that require connection to servers across > the internet > (likely to cause some failures)? y/n [n] y > - will run internet-requiring tests > Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/ > lib/Data/Dumper.pm lin > e 190, line 9. > Creating new 'Build' script for 'BioPerl' version '1.006000' > ---- Unsatisfied dependencies detected during ---- > ---- CJFIELDS/BioPerl-1.6.0.tar.gz ---- > SOAP::Lite [requires] > GraphViz [requires] > Convert::Binary::C [requires] > Algorithm::Munkres [requires] > XML::Twig [requires] > DB_File [requires] > Set::Scalar [requires] > XML::Parser::PerlSAX [requires] > XML::Writer [requires] > XML::SAX::Writer [requires] > Clone [requires] > XML::DOM::XPath [requires] > PostScript::TextBlock [requires] > Running Build test > Delayed until after prerequisites > Running Build install > Delayed until after prerequisites > Running install for module 'SOAP::Lite' > Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP- > Lite-0.710.08.tar.gz > ok > CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > We are about to install SOAP::Lite and for your convenience will > provide > you with list of modules and prerequisites, so you'll be able to > choose > only modules you need for your configuration. > XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by > default. > Installed transports can be used for both SOAP::Lite and XMLRPC::Lite. > Press to see the detailed list. > Feature Prerequisites Install? 
> ----------------------------- ---------------------------- -------- > Core Package [*] Scalar::Util always > [*] Test::More > [*] URI > [*] MIME::Base64 > [*] version > [*] XML::Parser (v2.23) > Client HTTP support [*] LWP::UserAgent always > Client HTTPS support [ ] Crypt::SSLeay [ no ] > Client SMTP/sendmail support [ ] MIME::Lite [ no ] > Client FTP support [*] IO::File [ yes ] > [*] Net::FTP > Standalone HTTP server [*] HTTP::Daemon [ yes ] > Apache/mod_perl server [ ] Apache [ no ] > FastCGI server [ ] FCGI [ no ] > POP3 server [ ] MIME::Parser [ no ] > [*] Net::POP3 > IO server [*] IO::File [ yes ] > MQ transport support [ ] MQSeries [ no ] > JABBER transport support [ ] Net::Jabber [ no ] > MIME messages [ ] MIME::Parser [ no ] > DIME messages [*] IO::Scalar (v2.105) [ no ] > [ ] DIME::Tools (v0.03) > [ ] Data::UUID (v0.11) > SSL Support for TCP Transport [ ] IO::Socket::SSL [ no ] > Compression support for HTTP [*] Compress::Zlib [ yes ] > MIME interoperability w/ Axis [ ] MIME::Parser (v6.106) [ no ] > --- An asterix '[*]' indicates if the module is currently installed. > Do you want to proceed with this configuration? [yes] yes > Checking if your kit is complete... 
> Looks good > Writing Makefile for SOAP::Lite > cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod > cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm > cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm > cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm > cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm > cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm > cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm > cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport > \TCP.pm > cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm > cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport > \POP3.pm > cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm > cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod > cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm > cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm > cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm > cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm > cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm > cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod > cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod > cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod > cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm > cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm > cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod > cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm > cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm > cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod > cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm > cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema19 > 99.pm > cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm > cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm > cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod > cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport > \HTTP.pm > cp 
lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema20 > 01.pm > cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod > cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm > cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm > cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport > \MQ.pm > cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport > \FTP.pm > cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP > \Transport\JABBER.pm > cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm > cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod > cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm > cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_2.pm > cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport > \IO.pm > cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_1.pm > cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP > \Transport\LOCAL.pm > cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm > cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod > cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm > cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP > \Transport\MAILTO.pm > cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > SOAPsh.pl blib\script\S > OAPsh.pl > pl2bat.bat blib\script\SOAPsh.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > stubmaker.pl blib\scrip > t\stubmaker.pl > pl2bat.bat blib\script\stubmaker.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > XMLRPCsh.pl blib\script > \XMLRPCsh.pl > pl2bat.bat blib\script\XMLRPCsh.pl > MKUTTER/SOAP-Lite-0.710.08.tar.gz > C:\strawberry\c\bin\dmake.EXE -- OK > Running 
make test > C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib\lib' > , 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/ > 013-array-deserializati > on.t t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03- > server.t t/04-attach. > t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08- > schema.t t/096_characters.t t > /097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t > t/IO/SessionSet.t t/SO > AP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/ > Deserializer/XMLSchema199 > 9.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/ > Deserializer/XMLSchemaSOAP1_1.t t > /SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/ > SOAP/Transport/FTP.t t/S > OAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t > t/SOAP/Transport/MAILT > O.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/ > HTTP/CGI.t t/XML/Parser > /Lite.t t/XMLRPC/Lite.t > t/01-core.t .................................. ok > t/010-serializer.t ........................... ok > t/012-cloneable.t ............................ ok > t/013-array-deserialization.t ................ ok > t/014_UNIVERSAL_use.t ........................ ok > t/015_UNIVERSAL_can.t ........................ ok > t/02-payload.t ............................... ok > t/03-server.t ................................ ok > t/04-attach.t ................................ skipped: Could not > find MIME::Parser - is M > IME::Tools installed? Aborting. > t/05-customxml.t ............................. ok > t/06-modules.t ............................... ok > t/07-xmlrpc_payload.t ........................ ok > t/08-schema.t ................................ ok > t/096_characters.t ........................... skipped: (no reason > given) > t/097_kwalitee.t ............................. skipped: (no reason > given) > t/098_pod.t .................................. 
skipped: (no reason > given) > t/099_pod_coverage.t ......................... skipped: (no reason > given) > t/IO/SessionData.t ........................... ok > t/IO/SessionSet.t ............................ ok > t/SOAP/Data.t ................................ ok > t/SOAP/Lite/Deserializer/XMLSchema1999.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok > t/SOAP/Lite/Packager.t ....................... ok > t/SOAP/Schema/WSDL.t ......................... ok > t/SOAP/Serializer.t .......................... 1/12 Use of > uninitialized value $values[0] > in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08- > wfOzhM\blib\lib/SOAP/Lite > .pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > t/SOAP/Serializer.t .......................... ok > t/SOAP/Transport/FTP.t ....................... 1/7 Use of > uninitialized value in split at > C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/ > Transport/FTP.pm line 55. > substr outside of string at C:\strawberry\cpan\build\SOAP- > Lite-0.710.08-wfOzhM\blib\lib/SO > AP/Transport/FTP.pm line 56. > Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/ > perl/lib/IO/Socket/INET. > pm line 117. > Use of uninitialized value $server in concatenation (.) or string at > C:\strawberry\cpan\bu > ild\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60. > t/SOAP/Transport/FTP.t ....................... ok > t/SOAP/Transport/HTTP.t ...................... 
ok
> t/SOAP/Transport/HTTP/CGI.t ..................
>
> everytime I get to the CGI.t at the end here the installation won't
> move! Any suggestions would be greatly appreciated, I've been
> trying to force it through, literally for 5 hours now....
>
> cheers,
> jonny
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bosborne11 at verizon.net Sat Aug 8 12:42:12 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 08 Aug 2009 12:42:12 -0400
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
Message-ID: <979637B9-F2EC-47A0-9283-440AA2558481@verizon.net>

Jonathan,

It looks like you're not the only one having problems with SOAP::Lite on Windows. For a possible workaround:

http://objectmix.com/perl/638075-how-install-soap-lite-windows.html

Brian O.

On Aug 3, 2009, at 7:18 PM, Johnathan Dalzell wrote:

> SOAP/Transport/HTTP/CGI

From stefan.kirov at bms.com Sat Aug 8 16:45:32 2009
From: stefan.kirov at bms.com (Kirov, Stefan)
Date: Sat, 8 Aug 2009 16:45:32 -0400
Subject: [Bioperl-l] Scour Friend Invite
In-Reply-To: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu>
References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife>, <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu>
Message-ID:

There is indeed; actually, my mail with the same header was held for a while. In any case I think these pay-to-search/invite-colleagues/spam-whole-address-book sites should be banned even if they are formally not spam, since the user is at least partially aware of the effect. I am not sure if this is a good solution; I am just frustrated, because these companies are quite unethical.
Maybe not as unethical as others (a few come to mind, but I will not name them :-)), but still... On the other hand, they have not been a real problem before. As long as this is not a frequent thing, I guess the filter is doing a great job.

Stefan
________________________________________
From: Chris Fields [cjfields at illinois.edu]
Sent: Saturday, August 08, 2009 12:24 PM
To: Mark A. Jensen
Cc: Kirov, Stefan; Hilmar Lapp; BioPerl List
Subject: Re: [Bioperl-l] Scour Friend Invite

I believe there are spam filters in place (Jason and Chris D. could probably indicate more on this).

chris

On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote:

> Thanks Stefan--this makes a lot more sense to me than supposing
> a priori that a previous legitimate user of this list is spamming
> bioperl-l
> intentionally. I would prefer to initially give the benefit of the
> doubt
> to the intelligence of the users, rather than scare people off who are
> likely to be already mortified that their emails have been
> commandeered
> like this. I would definitely support an spam filter that works.
> MAJ
> ----- Original Message ----- From: "Stefan Kirov"
> To: "Hilmar Lapp"
> Cc: "BioPerl List"
> Sent: Friday, August 07, 2009 10:25 AM
> Subject: Re: [Bioperl-l] Scour Friend Invite

From j_martin at lbl.gov Sat Aug 8 22:41:53 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Sat, 8 Aug 2009 19:41:53 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>
Message-ID: <20090809024152.GA26943@eniac.jgi-psf.org>

Hello,

It sounds like you want a layer to figure out what they're giving your program before you open it; you could use Bio::Tools::GuessSeqFormat and spare your user the pain of knowledge. It seems reasonable that coddling happens only when requested.

  use IO::String;
  use Bio::SeqIO;
  use Bio::Tools::GuessSeqFormat;

  my @files = ( 'NC_000913.fasta', '.gb' );

  for my $file ( @files ) {
      my ( $string, $strio, $out );

      # write raw sequence into an in-memory string
      $strio = IO::String->new( $string );
      $out   = Bio::SeqIO->new( -fh => $strio, -format => 'raw' );

      # guess the input format from the file contents
      my $guesser = Bio::Tools::GuessSeqFormat->new( -file => $file );
      my $in      = Bio::SeqIO->new( -format => $guesser->guess, -file => $file );

      while ( my $seq = $in->next_seq() ) {
          $out->write_seq( $seq );
          print substr($string, 0, 30), "\n";
      }
  }

Joel

On Thu, Aug 06, 2009 at 03:36:36PM -0400, Hilgert, Uwe wrote:
> Hmmm, I fail to see how supplying raw sequence could be a called "bad"
> input or a "problem". In our case, for example, not every user is a
> bioinformatics expert and Cornel was suggesting to account for that
> instead of trying to "train" the user to adhere to requirements that
> have not much to do with what s/he tries to accomplish. I don't really
> see data being modified, rather that the data format is being adopted to
> the needs of the software; which I would argue should be something the
> software is being able to take care of.
> > Uwe > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, August 06, 2009 12:50 PM > To: Ghiban, Cornel > Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Cornel, > > I'm failing to see how adding '>' would solve the problem. > > This is a simple validation issue: should we throw an exception on bad > input (no '>'), or just argue GIGO based on user error (the assumption > that the SeqIO parser will read raw sequence correctly when set to > 'fasta' is wrong)? > > I think, in this circumstance, the former applies. It is easy to add, > and the use of an exception in this case is violently user-friendly, > e.g. it will stop cold and immediately point out the problem. > Otherwise data is (silently) being modified, which is always a bad > thing. > > chris > > On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > > > Hi, > > > > It doesn't matter what sequence we use. As Chris Fields's showed in > > his test, not having > > ">" as the 1st character on the first line is the problem. > > We always assumed the sequence is in FASTA format and this seems to > > be wrong. > > > > I think, the solution to our problem is to check whether the ">" > > symbol is present or not. > > If not present then it will be added. > > > > Thank you, > > Cornel Ghiban > > > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > Sent: Thursday, August 06, 2009 11:18 AM > > To: Hilgert, Uwe > > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > > > Uwe - could you send an actual data file (as an attachment) that > > reproduces the problem, or is that not possible? > > > > -hilmar > > > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > > > >> I'm not sure what version we have. 
Cornel may have installed it a > >> while ago from CVS: > >> > >> Module id = Bio::Root::Build > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::Root::Version > >> Module id = Bio::Root::Version > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::SeqIO > >> Module id = Bio::SeqIO > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > >> INST_VERSION undef > >> > >> Cornel still has the checked-out "bioperl-live" directory and the > >> last > >> changes are from March this year. > >> > >> As per why he used "Fasta" instead of 'fasta" as the format parameter > >> in Bio::SeqIO, it's because that what it says in the modules manual. > >> He now tried 'fasta' instead and see no changes in behavior. Omitting > >> the format parameter altogether, fasta-formatted sequence continues > >> to > >> be treated correctly, the first line being removed. However, raw > >> sequence is being treated differently in that the first line is not > >> being removed any more. Instead, the program returns the first line > >> only. Which, in the example I am going to forward in my next message, > >> will return 60 amino acids out of raw sequence of 300 aa. Can't win > >> with raw sequence... > >> > >> > >> The files may be created on different platforms, we didn't notice any > >> difference between using files created on Windows or Linux. 
> >> > >> Thanks > >> Uwe > >> > >> > >> > >> > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Wednesday, August 05, 2009 6:54 PM > >> To: Chris Fields > >> Cc: Hilgert, Uwe; BioPerl List > >> Subject: Re: [Bioperl-l] Bio::SeqIO issue > >> > >> I don't think that can be the problem. If anything, providing the > >> format ought to be better in terms of result than not providing it? > >> > >> Uwe - I'd like you to go back to Chris' initial questions that you > >> haven't answered yet: "What version of bioperl are you using, OS, > >> etc? > >> What does your data look like?" I'd add to that, can you show us your > >> full script, or a smaller code snippet that reproduces the problem. > >> > >> I suspect that either something in your script is swallowing the > >> line, > >> or that the line endings in your data file are from a different OS > >> than the one you're running the script on. (Or that you are running a > >> very old version of BioPerl, which is entirely possible if you > >> installed through CPAN.) > >> > >> -hilmar > >> > >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> > >>> Uwe, > >>> > >>> Please keep replies on the list. > >>> > >>> It's very possible that's the issue; IIRC the fasta parser pulls out > >>> the full sequence in chunks (based on local $/ = "\n>") and splits > >>> the header off as the first line in that chunk. You could probably > >>> try leaving the format out and letting SeqIO guess it, or passing > >>> the > >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably > >>> better to go through the files and add a file extension that > >>> corresponds to the format. > >>> > >>> chris > >>> > >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >>> > >>>> Thanks, Chris. 
The files have no extension, but we indicate what > >>>> format to use, like in the manual: > >>>> > >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > >>>> > >>>> I wonder now whether this could exactly cause the problem: as we > >>>> are > >>>> telling that input files are in fasta format they are being treated > >>>> as such (=remove first line) - regardless of whether they really > >>>> are > >>>> fasta? > >>>> > >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe > >>>> Hilgert, Ph.D. > >>>> Dolan DNA Learning Center > >>>> Cold Spring Harbor Laboratory > >>>> > >>>> C: (516) 857-1693 > >>>> V: (516) 367-5185 > >>>> E: hilgert at cshl.edu > >>>> F: (516) 367-5182 > >>>> W: http://www.dnalc.org > >>>> > >>>> -----Original Message----- > >>>> From: Chris Fields [mailto:cjfields at illinois.edu] > >>>> Sent: Wednesday, August 05, 2009 5:04 PM > >>>> To: Hilgert, Uwe > >>>> Cc: bioperl-l at lists.open-bio.org > >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue > >>>> > >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > >>>> > >>>>> Is my impression correct that Bio::SeqIO just assumes that > >>>>> sequences are being submitted in FASTA format? > >>>> > >>>> No. See: > >>>> > >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO > >>>> > >>>> SeqIO tries to guess at the format using the file extension, and if > >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > >>>> possible that the extension is causing the problem, or that > >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced > >>>> to guessing). In any case, it's always advisable to explicitly > >>>> indicate the format when possible. > >>>> > >>>> Relevant lines: > >>>> > >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > >>>> i; > >>>> ... 
> >>>> return 'raw' if /\.(txt)$/i; > >>>> > >>>>> In our experience, implementing > >>>>> Bio::SeqIO led to the first line of files being cut off, > >>>>> regardless > >>>>> of whether the files were indeed fasta files or files that only > >>>>> contained sequence. > >>>> > >>>> Files that only contain sequence are 'raw'. Ones in FASTA are > >>>> 'fasta'. > >>>> > >>>>> Which, in the latter, led to sequence submissions that had the > >>>>> first line of nucleotides removed. Has anyone tried to write a fix > >>>>> for this? > >>>> > >>>> This sounds like a bug, but we have very little to go on beyond > >>>> your > >>>> description. What version of bioperl are you using, OS, etc? What > >>>> does your data look like? File extension? > >>>> > >>>> chris > >>>> > >>>>> Thanks, > >>>>> > >>>>> Uwe > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > >>>>> > >>>>> Uwe Hilgert, Ph.D. > >>>>> > >>>>> Dolan DNA Learning Center > >>>>> > >>>>> Cold Spring Harbor Laboratory > >>>>> > >>>>> > >>>>> > >>>>> V: (516) 367-5185 > >>>>> > >>>>> E: hilgert at cshl.edu > >>>>> > >>>>> F: (516) 367-5182 > >>>>> > >>>>> W: http://www.dnalc.org > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > 
=========================================================== > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Sun Aug 9 06:38:30 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 11:38:30 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EA726.60303@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > OK, I propose to look into these. Almost certainly I'll be doing "convert > run/db/network to Module::Build". I'll try to resolve the bugs you've > mentioned. 
> > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. Chris already started on "convert run/db/network to Module::Build" for some reason, but his attempt doesn't actually result in any modules getting installed (setting pm_files() like that isn't enough). The easiest, cleanest and most standard solution is to create a lib directory and svn move Bio into it. Does anyone have an objection to me doing this for the network, db and run packages? It will only affect developers currently working on code in those packages, and they just need to be aware that an svn update will be rather dramatic after my change. From cjfields at illinois.edu Sun Aug 9 09:05:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:05:17 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7EA726.60303@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> Message-ID: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> ... > > Chris already started on "convert run/db/network to Module::Build" > for some reason, but his attempt doesn't actually result in any > modules getting installed (setting pm_files() like that isn't enough). > > The easiest, cleanest and most standard solution is to create a lib > directory and svn move Bio into it. Does anyone have an objection to > me doing this for the network, db and run packages? 
It will only > affect developers currently working on code in those packages, and > they just need to be aware that an svn update will be rather > dramatic after my change. If it stimulates you into doing this then I'm all for it, but I've waited on getting this fixed long enough I decided to take it on myself to work on it, using the simplest ones. You had mentioned several times you would do this and I hadn't seen any progress. The point: I would really like to get another point release out before we work on splitting things up. Simple as that. From what I have seen (with my few tests) everything (modules, scripts) gets copied into blib just fine and the temp folder for script generation gets cleaned up; I haven't progressed beyond to the installation step, but there isn't anything to me that indicates it wouldn't work. I won't be available until Wed. at the earliest for additional comment (out of town, no internet connection). chris From bix at sendu.me.uk Sun Aug 9 09:15:07 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 14:15:07 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> Message-ID: <4A7ECBDB.9030505@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> The easiest, cleanest and most standard solution is to create a lib >> directory and svn move Bio into it. Does anyone have an objection to >> me doing this for the network, db and run packages? 
It will only >> affect developers currently working on code in those packages, and >> they just need to be aware that an svn update will be rather dramatic >> after my change. > > From what I have seen (with my few tests) everything (modules, scripts) > gets copied into blib just fine and the temp folder for script > generation gets cleaned up; I haven't progressed beyond to the > installation step, but there isn't anything to me that indicates it > wouldn't work. ./Build testinstall will show you it doesn't work as-is. If you're in a rush I'll just do the svn moves and we can revert later if anyone complains. From cjfields at illinois.edu Sun Aug 9 09:19:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:19:30 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <2790F9A5-43E8-47E5-B5AA-98239B95EF04@illinois.edu> On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. > > If you're in a rush I'll just do the svn moves and we can revert > later if anyone complains. Works for me. The sooner it gets done the better (next week, would be nice, but two is fine so we don't rush it too much). I'll be working on several other bits, including FASTQ, when I get back Wed, then I'll merge over and work on the next point release. chris From cjfields at illinois.edu Sun Aug 9 09:34:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:34:07 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. Sorry, I'll be leaving in the next hour, but for the above, did you mean './Build fakeinstall'? As long as you're moving everything into /lib (which I fully support), we should consider hard_coding scripts into bp_foo.PLS syntax seeing as we're going through additional trouble of converting them over. That is, unless there is a specific purpose to keeping them without the 'bp_'. chris From bix at sendu.me.uk Sun Aug 9 10:00:18 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 15:00:18 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <4A7ED672.20701@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>>> The easiest, cleanest and most standard solution is to create a lib >>>> directory and svn move Bio into it. Does anyone have an objection to >>>> me doing this for the network, db and run packages? 
It will only >>>> affect developers currently working on code in those packages, and >>>> they just need to be aware that an svn update will be rather >>>> dramatic after my change. >>> >>> From what I have seen (with my few tests) everything (modules, >>> scripts) gets copied into blib just fine and the temp folder for >>> script generation gets cleaned up; I haven't progressed beyond to the >>> installation step, but there isn't anything to me that indicates it >>> wouldn't work. >> >> ./Build testinstall will show you it doesn't work as-is. > > Sorry, I'll be leaving in the next hour, but for the above, did you mean > './Build fakeinstall'? Yes, sorry. > As long as you're moving everything into /lib (which I fully support), > we should consider hard_coding scripts into bp_foo.PLS syntax seeing as > we're going through additional trouble of converting them over. That > is, unless there is a specific purpose to keeping them without the 'bp_'. (The final suffix is supposed to be .pl - we convert from PLS to pl in core, no conversion needed in db) Yes, for only a handful of scripts, it actually makes sense to flatten them all into a new bin directory, which is the default script location for Module::Build. So for example I'd do: svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl etc. 
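
The flattening described above could be scripted rather than typed by hand. What follows is only a minimal sketch, assuming the scripts sit under scripts/<subdir>/ and should end up as bin/bp_<name>.pl. The helper name flattened_name and the second example path are hypothetical (only the bioentry2flat.pl path comes from the thread), and the script just prints the svn mv commands for review instead of running them:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: map a script path to its flattened bin/ name,
# adding the 'bp_' prefix unless it is already there and normalising
# a .PLS or .pl suffix to .pl.
sub flattened_name {
    my ($path) = @_;
    my ($name) = fileparse( $path, qr/\.(?:PLS|pl)/i );
    $name = "bp_$name" unless $name =~ /^bp_/;
    return "bin/$name.pl";
}

# Example inputs; only the first path appears in the thread.
my @scripts = (
    'scripts/biosql/bioentry2flat.pl',
    'scripts/example/bp_foo.PLS',    # hypothetical
);

# Print the commands for review; nothing is executed.
print "svn mv $_ ", flattened_name($_), "\n" for @scripts;
```

Printing the commands first keeps the rename reviewable before anything touches the repository.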
From bix at sendu.me.uk Sun Aug 9 12:13:03 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 17:13:03 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EF58F.9000909@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. These issues should now be resolved. I'll note that for future cases similar to 3), if a user chooses to install an optional dependency using CPAN/CPANPLUS and the installation of that external module causes an infinite loop, it's an issue of that module or CPAN/CPANPLUS, not BioPerl. 
The solution from our end is to tell the user to choose not to install that dependency or ask on the CPAN mailing list if they really need it. (I've often got stuck in infinite loops just trying to install Bundle::CPAN! CPAN itself will detect infinite loops after a while and kill itself.)

From jdalzell03 at qub.ac.uk Sun Aug 9 05:06:26 2009
From: jdalzell03 at qub.ac.uk (Jonny Dalzell)
Date: Sun, 9 Aug 2009 02:06:26 -0700 (PDT)
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
Message-ID: <24885345.post@talk.nabble.com>

Thanks for the replies. I emailed Chris and Brian individually, but I guess it would be helpful if I threw my solution to "the dogs".

In the end I found that by downloading subversion (you need to sign up to collabnet for a user account first) and following the installation instructions on the relevant subversion pages of the bioperl site (http://www.bioperl.org/wiki/Using_Subversion), it downloaded fine first time. No need for CPAN or a PPM: just copy and paste 'svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live' into your command line, and it auto installs in under 30 seconds... definitely the way to go for anyone else out there trying to bust-a-move on a Win machine.

At time of writing, I have also installed BioPerl-db (same as above, copy and paste 'svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db' into the command line) and BioPerl-run (I typed in 'svn co svn://code.open-bio.org/bioperl/bioperl-run/trunk bio', I THINK, and it worked fine). The relevant installation instructions don't give an explicit command for BP-run installation, but I think that matches the branches and trunk in the subversion repository (if not, sorry, but you can cross-reference its position in there easily by following the links).
Both have worked without problem on Strawberry Perl 5.10 through WinVista, so far. Jonny -- View this message in context: http://www.nabble.com/bioperl-1.6-installation-on-vista-with-perl-5.10-tp24875623p24885345.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From mwhagen85 at gmail.com Mon Aug 10 14:54:53 2009 From: mwhagen85 at gmail.com (OjoLoco) Date: Mon, 10 Aug 2009 11:54:53 -0700 (PDT) Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits Message-ID: <24905417.post@talk.nabble.com> Hello all, I have found matching sequences between two genomes and I would now like to create a graphic that contains a heat map-like track that will show areas of the genome that were found more often than others. For every nt I have the number of times it was found, so if it was found very often it would be a darker color than say a nt that wasn't found at all. Is there any way to achieve this using built in BioPerl graphics? Thank you for your time. -- View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Mon Aug 10 15:22:36 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Mon, 10 Aug 2009 15:22:36 -0400 Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits In-Reply-To: <24905417.post@talk.nabble.com> References: <24905417.post@talk.nabble.com> Message-ID: Hi, You should be able to do that with wiggle_density and wiggle_xyplot glyphs. See http://gmod.org/wiki/GBrowse/Uploading_Wiggle_Tracks for instructions on constructing wiggle plots. 
After you have a wiggle plot, you'll need the wiggle2gff3.pl script (which is part of GBrowse, but it should run fine on its own), which you can get from GMOD's CVS:

http://gmod.cvs.sourceforge.net/viewvc/*checkout*/gmod/Generic-Genome-Browser/bin/wiggle2gff3.pl

The script converts the wig file to a binary file. Then you can create Bio::SeqFeatureI objects that will work with Bio::Graphics to draw the density or xyplot.

Note as well that Bio::Graphics is no longer part of the main BioPerl distribution, so you'll need to get the most recent version from CPAN. Also, fair warning: I've never actually done this; I've only used wiggle plots in the context of GBrowse, but it should work pretty much as described.

Scott

On Aug 10, 2009, at 2:54 PM, OjoLoco wrote:

>
> Hello all,
> I have found matching sequences between two genomes and I would now like
> to create a graphic that contains a heat map-like track that will show areas
> of the genome that were found more often than others. For every nt I have
> the number of times it was found, so if it was found very often it would be
> a darker color than say a nt that wasn't found at all. Is there any way to
> achieve this using built in BioPerl graphics? Thank you for your time.
> --
> View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-----------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From jdalzell03 at qub.ac.uk Tue Aug 11 11:07:52 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:07:52 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? Message-ID: <24919498.post@talk.nabble.com> Hi, trying to run the example given for Bio::Tools::HMM on the Bioperl site, and when I try to run it, I get this in the command line... "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package BEGIN failed--compilation aborted at C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140. Compilation failed in require at HMM.txt line 4. BEGIN failed--compilation aborted at HMM.txt line 4." I have installed the entire bioperl-ext package through subversion, and it looks like all the relevant folders are in perl/site/lib/Bio/Tools, but it won't work. Am I missing something? I'm under the impression that the C-compiler comes with bioperl-ext (which installed with no reported problems)? I concede that I am extrememly new to both Perl in general and Bioperl more specifically, but I have followed the instructions which I can find. I have the bioperl core installed in addition to bioperl-db and bioperl-run. I'm using Strawberry Perl on WinVista. I appreciate that most work through Linux systems...I am at times sorely tempted myself. Any suggestions would be welcomed gratefully, cheers, Jonny ps. this is the partial script I was trying to run... 
#!/usr/bin/perl -w

use strict;
use Bio::Tools::HMM;
use Bio::SeqIO;
use Bio::Matrix::Scoring;

# Create a HMM object
# ACGT are the bases; N and C mean non-coding and coding
my $hmm = Bio::Tools::HMM->new('-symbols' => "ACGT", '-states' => "NC");

# Initialise some training observation sequences
my $seq1 = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
my $seq2 = Bio::SeqIO->new(-file => $ARGV[1], -format => 'fasta');
my @seqs = ($seq1, $seq2);

# Train the HMM with the observation sequences
$hmm->baum_welch_training(\@seqs);

# Get parameters
my $init    = $hmm->init_prob;        # Returns an array reference
my $matrix1 = $hmm->transition_prob;  # Returns Bio::Matrix::Scoring
my $matrix2 = $hmm->emission_prob;    # Returns Bio::Matrix::Scoring

I realise that this is incomplete.
--
View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919498.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From shameer at ncbs.res.in Tue Aug 11 13:07:20 2009
From: shameer at ncbs.res.in (K. Shameer)
Date: Tue, 11 Aug 2009 22:37:20 +0530 (IST)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24919498.post@talk.nabble.com>
References: <24919498.post@talk.nabble.com>
Message-ID: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in>

Hello Jonny,

Are you sure that you have a compiled version of HMMER installed on your machine?

--
K. Shameer

> Hi,
>
> trying to run the example given for Bio::Tools::HMM on the Bioperl site,
> and
> when I try to run it, I get this in the command line...
>
> "The C-compiled engine for Hidden Markov Model (HMM) has not been
> installed.
> Please read the install the bioperl-ext package
>
> BEGIN failed--compilation aborted at
> C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140.
> Compilation failed in require at HMM.txt line 4.
> BEGIN failed--compilation aborted at HMM.txt line 4."
> > I have installed the entire bioperl-ext package through subversion, and it > looks like all the relevant folders are in perl/site/lib/Bio/Tools, but it > won't work. Am I missing something? I'm under the impression that the > C-compiler comes with bioperl-ext (which installed with no reported > problems)? I concede that I am extrememly new to both Perl in general and > Bioperl more specifically, but I have followed the instructions which I > can > find. I have the bioperl core installed in addition to bioperl-db and > bioperl-run. I'm using Strawberry Perl on WinVista. I appreciate that > most > work through Linux systems...I am at times sorely tempted myself. > > Any suggestions would be welcomed gratefully, > cheers, > Jonny > > ps. this is the partial script I was trying to run... > > #!/usr/bin/perl -w > > usr strict; > use Bio::Tools::HMM; > use Bio::SeqIO; > use Bio::Matrix::Scoring; > > #Create a HMM object > #ACGT are the bases NC mean non-coding and coding > $hmm = new Bio::Tools::HMM ('-symbols' => "ACGT", '-states' => "NC"); > > #Initialise some training observation sequences > $Seq1 = new Bio::SeqIO(-file => $ARGV[0], -format => 'fasta'); > $seq2 = new Bio::SeqIO(-file => $ARGV[1], -format => 'fasta'); > @seqs = ($seq1, $seq2); > > #Train the HMM with the observation sequences > $hmm ->baum_welch_training(\@seqs); > > #Get parameters > $init = $hmm->init_prob; #Returns an array reference > $matrix1 = $hmm->transition_prob; #Returns Bio::Matrix::Scoring > $matrix2 = $hmm->emission_prob; #Returns Bio::Matrix::Scoring > > I realise that this is incomplete. > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919498.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jdalzell03 at qub.ac.uk Tue Aug 11 11:14:59 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:14:59 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24919603.post@talk.nabble.com> I should point out perhaps that CPAN is not an option on a Win setup...it has never worked for anything I have tried to install. Although I'm using Strawberry Perl now, I had no success getting bioperl or any of its components through the activestate PPM either (One of the reasons I ended up going to Strawberry). The only option I have for installation is the subversion server. Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919603.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 11:42:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:42:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24920117.post@talk.nabble.com> I realise that this looks like there is a problem with Bio::Tools::HMM when looking at the source code, but I've even tried replacing the HMM.pm file I had with the HMM.pm script at http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, and now I'm getting... "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: C:/strawberry/perl/lib C:/strawberry/perl/site/ lib .) at HMM.txt line 5. BEGIN failed--compilation aborted at HMM.txt line 5." ?? 
jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24920117.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 14:52:21 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 11:52:21 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> Message-ID: <24923606.post@talk.nabble.com> Hi, I'm as sure as I can be. I look in the HHMER folder and it contains "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something to do with @INC, but I put "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at the top of my script, which definately encompasses the directory it should be in, and I still get... "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib C:/strawberry/perl/site/lib/ Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt line 5. BEGIN failed--compilation aborted at HMM.txt line 5." I'm out of ideas. Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From rmb32 at cornell.edu Tue Aug 11 15:23:56 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 11 Aug 2009 12:23:56 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24920117.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> Message-ID: <4A81C54C.5020905@cornell.edu> Jonny, For quicker help you might want to try #bioperl on freenode. 
That said, the problem here is that when you get code from subversion, you are not really 'installing' it, you are just copying it to your machine. Part of the installation process is compiling these things, and for that you need a working C compiler. I don't know anything about using BioPerl on Windows, but as a general recommendation I would say go back to the CPAN and/or ppm directions and getting those working. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu Jonny Dalzell wrote: > I realise that this looks like there is a problem with Bio::Tools::HMM when > looking at the source code, but I've even tried replacing the HMM.pm file I > had with the HMM.pm script at > http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, > and now I'm getting... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: > C:/strawberry/perl/lib C:/strawberry/perl/site/ > lib .) at HMM.txt line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > ?? > > jonny From maj at fortinbras.us Tue Aug 11 15:22:42 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 15:22:42 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: <7C7654A8A64E49158F6761EE09C9F297@NewLife> Jonny, You need the HMMER application, which is not part of BioPerl. See http://hmmer.janelia.org/ for download options. MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 2:52 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > I'm as sure as I can be. 
I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > the top of my script, which definately encompasses the directory it should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Tue Aug 11 15:48:11 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 11 Aug 2009 12:48:11 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81C54C.5020905@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> Message-ID: <4A81CAFB.5050903@cornell.edu> Elaborating more, the 'C-compiled engine' error comes because Bio::Ext::HMM is not installed, because bioperl-ext is not installed (correctly), because Bio::Ext::HMM is an XS extension written in C. Which needs to be compiled. With a C compiler. As part of some kind of installation process, not just copying the files to a machine with subversion. Rob Robert Buels wrote: > Jonny, > > For quicker help you might want to try #bioperl on freenode. > > That said, the problem here is that when you get code from subversion, > you are not really 'installing' it, you are just copying it to your > machine. 
Part of the installation process is compiling these things,
> and for that you need a working C compiler.
>
> I don't know anything about using BioPerl on Windows, but as a general
> recommendation I would say go back to the CPAN and/or ppm directions and
> getting those working.
>
> Rob
>
>

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From bix at sendu.me.uk Tue Aug 11 16:11:43 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 11 Aug 2009 21:11:43 +0100
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24923606.post@talk.nabble.com>
References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com>
Message-ID: <4A81D07F.6000703@sendu.me.uk>

Jonny Dalzell wrote:
> Hi,
>
> I'm as sure as I can be. I look in the HHMER folder and it contains
> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something
> to do with @INC, but I put
> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at
> the top of my script, which definately encompasses the directory it should
> be in, and I still get...
>
> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib
> C:/strawberry/perl/site/lib/
> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt
> line 5.
> BEGIN failed--compilation aborted at HMM.txt line 5."
>
> I'm out of ideas.

lib (or at least one entry in your PERL5LIB) needs to point to the directory that contains the Bio directory, and each directory has to be a separate list element rather than part of one space-separated string. So:

use lib "C:/strawberry/perl/lib", "C:/strawberry/perl/site/lib";

Now it will be able to locate Bio::Tools::HMM. You'll still get your original error because you don't have HMMER installed. See Mark's reply.
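To make the lookup rule above concrete: Perl turns Bio::Tools::HMM into the relative path Bio/Tools/HMM.pm and appends it to each @INC entry in turn, so an entry that already ends in Bio/Tools/ can never match. The sketch below only imitates that rule in plain shell so it can be tried without Perl; find_module is a hypothetical helper, not a Perl or BioPerl command, and it ignores details of the real mechanism such as hooks in @INC.

```shell
#!/bin/sh
# Sketch of the @INC lookup rule: module name -> relative path, then try
# that path under every include directory given on the command line.
# find_module is a hypothetical helper used for illustration only.
find_module() {
    module=$1
    shift
    # Bio::Tools::HMM -> Bio/Tools/HMM.pm
    rel=$(printf '%s\n' "$module" | sed 's|::|/|g').pm
    for dir in "$@"; do
        if [ -f "$dir/$rel" ]; then
            printf '%s\n' "$dir/$rel"
            return 0
        fi
    done
    printf 'not found: %s\n' "$rel" >&2
    return 1
}
```

With the layout from this thread, `find_module Bio::Tools::HMM C:/strawberry/perl/site/lib` would look for C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm, whereas passing C:/strawberry/perl/site/lib/Bio/Tools/ makes it look for .../Bio/Tools/Bio/Tools/HMM.pm, which is why the "Can't locate" error persisted.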
From jdalzell03 at qub.ac.uk Tue Aug 11 16:29:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:29:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81D07F.6000703@sendu.me.uk> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> Message-ID: <24925178.post@talk.nabble.com> Hi, thanks. I did install HHMER from the site Mark suggested, and it is within the directories that perl recognizes when reading the script...still I get "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package" Is it possible that this module simply won't run through windows? jonny Sendu Bala-2 wrote: > > Jonny Dalzell wrote: >> Hi, >> >> I'm as sure as I can be. I look in the HHMER folder and it contains >> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >> something >> to do with @INC, but I put >> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >> the top of my script, which definately encompasses the directory it >> should >> be in, and I still get... >> >> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >> C:/strawberry/perl/site/lib/ >> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >> HMM.txt >> line 5. >> BEGIN failed--compilation aborted at HMM.txt line 5." >> >> I'm out of ideas. > > lib (or at least one entry in your PERL5LIB) needs to point to the > directory that contains the Bio directory. So: > > use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > > Now it will be able to locate Bio::Tools::Hmm. You'll still get your > original error because you don't have Hmmer installed. See Mark's reply. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 16:31:36 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:31:36 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81CAFB.5050903@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> Message-ID: <24925211.post@talk.nabble.com> OK, so is there any particular C-compiler which I should use? Thanks, jonny Robert Buels wrote: > > Elaborating more, the 'C-compiled engine' error comes because > Bio::Ext::HMM is not installed, because bioperl-ext is not installed > (correctly), because Bio::Ext::HMM is an XS extension written in C. > Which needs to be compiled. With a C compiler. As part of some kind of > installation process, not just copying the files to a machine with > subversion. > > Rob > > Robert Buels wrote: >> Jonny, >> >> For quicker help you might want to try #bioperl on freenode. >> >> That said, the problem here is that when you get code from subversion, >> you are not really 'installing' it, you are just copying it to your >> machine. Part of the installation process is compiling these things, >> and for that you need a working C compiler. >> >> I don't know anything about using BioPerl on Windows, but as a general >> recommendation I would say go back to the CPAN and/or ppm directions and >> getting those working. 
>> >> Rob >> >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Tue Aug 11 17:05:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 17:05:10 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925178.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: Jonny, It will run in Win/Vis but there are some caveats. The BioPerl package has some plain C components, as Rob pointed out. These need to be compiled, and the objects/libraries put in the right place. CPAN will cause this to happen when you have a compiler available; ActiveState .ppm will download the binaries directly from the repository (my understanding, anyway). CPAN is always available by doing > perl -MCPAN -e shell but you may not have a C compiler around. This is a little tricky. You can either explore Visual C/C++ options from MS here http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, and install Cygwin (www.cygwin.com), which creates a linux-like environment with GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful as the real thing, I grant. 
Which bring me to a third possibility, that I haven't tried, which is an Ubuntu box running in a VM under Windows, or as a dual-boot system (https://help.ubuntu.com/community/WindowsDualBoot). MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 4:29 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > thanks. I did install HHMER from the site Mark suggested, and it is within > the directories that perl recognizes when reading the script...still I get > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > Please read the install the bioperl-ext package" > > Is it possible that this module simply won't run through windows? > > jonny > > > > Sendu Bala-2 wrote: >> >> Jonny Dalzell wrote: >>> Hi, >>> >>> I'm as sure as I can be. I look in the HHMER folder and it contains >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >>> something >>> to do with @INC, but I put >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >>> the top of my script, which definately encompasses the directory it >>> should >>> be in, and I still get... >>> >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >>> C:/strawberry/perl/site/lib/ >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >>> HMM.txt >>> line 5. >>> BEGIN failed--compilation aborted at HMM.txt line 5." >>> >>> I'm out of ideas. >> >> lib (or at least one entry in your PERL5LIB) needs to point to the >> directory that contains the Bio directory. So: >> >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; >> >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your >> original error because you don't have Hmmer installed. See Mark's reply. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Tue Aug 11 17:39:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 12 Aug 2009 09:39:30 +1200 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB6F93AA@exchsth.agresearch.co.nz> Dev-C++ http://www.bloodshed.net/devcpp.html is a good (i.e. free under GPL) Windows compiler I've used before. Might save having to install Cygwin. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Wednesday, 12 August 2009 9:05 a.m. > To: Jonny Dalzell; Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Jonny, > It will run in Win/Vis but there are some caveats. The BioPerl package has > some > plain C components, as Rob pointed out. These need to be compiled, and the > objects/libraries put in the right place. CPAN will cause this to happen when > you have a compiler available; ActiveState .ppm will download the binaries > directly from the repository (my understanding, anyway). 
CPAN is always > available by doing > > > perl -MCPAN -e shell > > but you may not have a C compiler around. This is a little tricky. You can > either explore Visual C/C++ options from MS here > http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, > and install Cygwin (www.cygwin.com), which creates a linux-like environment > with > GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful > as > the real thing, I grant. Which bring me to a third possibility, that I haven't > tried, which is an Ubuntu box running in a VM under Windows, or as a dual-boot > system (https://help.ubuntu.com/community/WindowsDualBoot). > MAJ > ----- Original Message ----- > From: "Jonny Dalzell" > To: > Sent: Tuesday, August 11, 2009 4:29 PM > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > > > > > Hi, > > > > thanks. I did install HHMER from the site Mark suggested, and it is within > > the directories that perl recognizes when reading the script...still I get > > > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > > Please read the install the bioperl-ext package" > > > > Is it possible that this module simply won't run through windows? > > > > jonny > > > > > > > > Sendu Bala-2 wrote: > >> > >> Jonny Dalzell wrote: > >>> Hi, > >>> > >>> I'm as sure as I can be. I look in the HHMER folder and it contains > >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > >>> something > >>> to do with @INC, but I put > >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > >>> the top of my script, which definately encompasses the directory it > >>> should > >>> be in, and I still get... > >>> > >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > >>> C:/strawberry/perl/site/lib/ > >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > >>> HMM.txt > >>> line 5. 
> >>> BEGIN failed--compilation aborted at HMM.txt line 5." > >>> > >>> I'm out of ideas. > >> > >> lib (or at least one entry in your PERL5LIB) needs to point to the > >> directory that contains the Bio directory. So: > >> > >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > >> > >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your > >> original error because you don't have Hmmer installed. See Mark's reply. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > -- > > View this message in context: > > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista-- > tp24919498p24925178.html > > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From cjfields at illinois.edu Tue Aug 11 19:44:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:44:23 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: Bio::Tools::Hmm doesn't use HMMER, it uses a C-based extension in bioperl-ext that generates HMM's (XS-based bindings I think). I have managed to compile it successfully on Ubuntu and Mac OS X, but WinVista is a whole different bag-o-worms altogether (untested AFAIK). For the record, I do not recommend using it; I'm unsure about it's maintenance status, so it may be released separately. It would be best to use something better supported, such as the HMMER wrapper in bioperl-run and the hmmer parsers in bioperl-core. We may also have wrappers for similar code available in biolib at some future point. chris On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote: > > Hi, > > I'm as sure as I can be. I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/ > Tools/";" at > the top of my script, which definately encompasses the directory it > should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/ > per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 11 19:48:08 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:48:08 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925211.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> <24925211.post@talk.nabble.com> Message-ID: <3A5CA958-3B03-4252-B78F-07BBFF1FA355@illinois.edu> Any C-based code should use the same compiler used from whatever perl version you are running. ActiveState supports both VC/C++ (as Mark indicates) or mingw/gcc. I think Strawberry supports mainly the latter. Though you can use CygWin, I think a native Win module is the best way to go if possible. It will likely be a tricky road, so keep us updated and we'll attempt to help out the best we can. chris On Aug 11, 2009, at 3:31 PM, Jonny Dalzell wrote: > > OK, > > so is there any particular C-compiler which I should use? > > Thanks, > jonny > > > > Robert Buels wrote: >> >> Elaborating more, the 'C-compiled engine' error comes because >> Bio::Ext::HMM is not installed, because bioperl-ext is not installed >> (correctly), because Bio::Ext::HMM is an XS extension written in C. >> Which needs to be compiled. With a C compiler. As part of some >> kind of >> installation process, not just copying the files to a machine with >> subversion. >> >> Rob >> >> Robert Buels wrote: >>> Jonny, >>> >>> For quicker help you might want to try #bioperl on freenode. >>> >>> That said, the problem here is that when you get code from >>> subversion, >>> you are not really 'installing' it, you are just copying it to your >>> machine. Part of the installation process is compiling these >>> things, >>> and for that you need a working C compiler. 
>>> >>> I don't know anything about using BioPerl on Windows, but as a >>> general >>> recommendation I would say go back to the CPAN and/or ppm >>> directions and >>> getting those working. >>> >>> Rob >>> >>> >> >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue Aug 11 20:09:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 20:09:01 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> Message-ID: <69BDE54FD5C943669BCD41A9A607634A@NewLife> [OOps. Sorry about that. The compiler ideas still apply however.] ----- Original Message ----- From: "Chris Fields" To: "Jonny Dalzell" Cc: Sent: Tuesday, August 11, 2009 7:44 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > Bio::Tools::Hmm doesn't use HMMER, it uses a C-based extension in bioperl-ext > that generates HMM's (XS-based bindings I think). I have managed to compile > it successfully on Ubuntu and Mac OS X, but WinVista is a whole different > bag-o-worms altogether (untested AFAIK). 
> > For the record, I do not recommend using it; I'm unsure about it's > maintenance status, so it may be released separately. It would be best to > use something better supported, such as the HMMER wrapper in bioperl-run and > the hmmer parsers in bioperl-core. We may also have wrappers for similar > code available in biolib at some future point. > > chris > > On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote: > >> >> Hi, >> >> I'm as sure as I can be. I look in the HHMER folder and it contains >> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something >> to do with @INC, but I put >> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/ Tools/";" at >> the top of my script, which definately encompasses the directory it should >> be in, and I still get... >> >> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/ per/lib >> C:/strawberry/perl/site/lib/ >> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt >> line 5. >> BEGIN failed--compilation aborted at HMM.txt line 5." >> >> I'm out of ideas. >> >> Jonny >> -- >> View this message in context: >> http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed Aug 12 12:44:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 Aug 2009 11:44:37 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ED672.20701@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> <4A7ED672.20701@sendu.me.uk> Message-ID: <1F099DCC-073E-470E-873A-608E674375C1@illinois.edu> On Aug 9, 2009, at 9:00 AM, Sendu Bala wrote: > Chris Fields wrote: > ... >> As long as you're moving everything into /lib (which I fully >> support), we should consider hard_coding scripts into bp_foo.PLS >> syntax seeing as we're going through additional trouble of >> converting them over. That is, unless there is a specific purpose >> to keeping them without the 'bp_'. > > (The final suffix is supposed to be .pl - we convert from PLS to pl > in core, no conversion needed in db) Yes, had that reversed in my commit. Thanks. > Yes, for only a handful of scripts, it actually makes sense to > flatten them all into a new bin directory, which is the default > script location for Module::Build. > > So for example I'd do: > svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl > etc. Yes, exactly. 
It seems we're going out of our way to keep things as they were previously when using ExtUtils::MakeMaker/Makefile.PL. I'm not quite sure why we've bent over backwards to work around these issues when it is much easier to stick to simple standards that 99% of CPAN uses: scripts in bin (or whatever dir is passed to script_files), modules in lib. I'm not complaining, just haven't heard an explanation about that one way or the other.

chris

From rmb32 at cornell.edu Thu Aug 13 14:59:00 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 13 Aug 2009 11:59:00 -0700
Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank
In-Reply-To: <4A79A52E.7000104@cornell.edu>
References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> <4A79A52E.7000104@cornell.edu>
Message-ID: <4A846274.4000600@cornell.edu>

OK, commit 15927 adds some more info about -db options for Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide, nuccore, nucgss, nucest, and unigene, and including a link to an (XML) page from NCBI that lists inputs that NCBI accepts.

Could somebody who knows more about eUtils than me also review this patch and make corrections if necessary?

Rob

Robert Buels wrote:
> I think you're looking for the -db => 'nucgss' option.
>
> I'll add a better listing of these (undocumented) options to the
> Bio::DB::Query::GenBank docs.
>
> Rob
>

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From jdalzell03 at qub.ac.uk Thu Aug 13 15:27:14 2009
From: jdalzell03 at qub.ac.uk (Jonny Dalzell)
Date: Thu, 13 Aug 2009 12:27:14 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24919498.post@talk.nabble.com>
References: <24919498.post@talk.nabble.com>
Message-ID: <24957222.post@talk.nabble.com>

Fellows,

thanks very much for the input.
However, today I saw fit to dual-boot with ubuntu. I've installed everything, but I still get the same

"The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package "

message! Is it ridiculous of me to expect ubuntu to take care of this for me? How do I go about compiling the HMM?

Thanks in advance,

Jonny
--
View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24957222.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jonathanmflowers at gmail.com Thu Aug 13 15:41:21 2009
From: jonathanmflowers at gmail.com (Jonathan Flowers)
Date: Thu, 13 Aug 2009 12:41:21 -0700
Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO
Message-ID:

Hi,

I am trying to parse BLAST reports written in XML using Bio::SearchIO. When running the following code on a set of reports (multiple query results in a single file), I only get one ResultI object. I tried running the same code on a file in 'blast' format and obtained the expected results (i.e. one ResultI object for each query), suggesting that the issue is with blastxml. I found an old thread on this listserv where someone had had a similar problem, but could not find how it was resolved.

I am using Bioperl 1.5.2 and the XML reports were generated using blastall with the -m7 option.

use strict;
use warnings;
use Bio::SearchIO;

my $in = Bio::SearchIO->new(-format => 'blastxml',
                            -file   => 'blastreport.xml');
while ( my $result = $in->next_result ) {
    print $result->query_name, "\n";
    while ( my $hit = $result->next_hit ) {
        while ( my $hsp = $hit->next_hsp ) {
            # do something with hsp
        }
    }
}

Thanks

Jonathan

From rmb32 at cornell.edu Thu Aug 13 17:37:21 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 13 Aug 2009 14:37:21 -0700
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24957222.post@talk.nabble.com>
References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com>
Message-ID: <4A848791.4010402@cornell.edu>

Jonny Dalzell wrote:
> Is it ridiculous of me to expect ubuntu to take care of this for me? How do
> I go about compiling the HMM?

Yes. This is a very specialized thing that you're doing, and Ubuntu does not have the resources to package every single thing.

Unfortunately, it looks like the bioperl-ext package is not installable under Ubuntu 9.04 anyway, which is what I'm running. For others on this list, if somebody is interested in maintaining it, I'd be happy to help out by testing on Debian-based Linux platforms. We need to clarify this package's maintenance status: if there is nobody interested in maintaining it, I would recommend that bioperl-ext be removed from distribution. It's not in anybody's interest to have unmaintained software out there causing confusion.

So Jonny, in short, I would say "do not use bioperl-ext".

Step back. What are you trying to accomplish? Chris already recommended some alternative methods in his email of 8/11 on this subject. Perhaps we can guide you to some software that is actively maintained and will meet your needs.

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From cjfields at illinois.edu Thu Aug 13 18:06:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 13 Aug 2009 17:06:29 -0500
Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank
In-Reply-To: <4A846274.4000600@cornell.edu>
References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> <4A79A52E.7000104@cornell.edu> <4A846274.4000600@cornell.edu>
Message-ID: <916D0E26-EBB5-4E28-99AD-F689639BB93A@illinois.edu>

It looks fine.
As for the databases, you can always get the current list using a script from bioperl-live, which uses Bio::DB::EUtilities to access them directly (scripts/DB_EUtilities/einfo.PLS, which should install as bp_einfo.pl).

(looking at the below, what is blastdbinfo?)

cjfields4:DB_EUtilities cjfields$ perl einfo.PLS
pubmed
protein
nucleotide
nuccore
nucgss
nucest
structure
genome
biosystems
blastdbinfo
books
cancerchromosomes
cdd
gap
domains
gene
genomeprj
gensat
geo
gds
homologene
journals
mesh
ncbisearch
nlmcatalog
omia
omim
pepdome
pmc
popset
probe
proteinclusters
pcassay
pccompound
pcsubstance
snp
sra
taxonomy
toolkit
unigene

chris

On Aug 13, 2009, at 1:59 PM, Robert Buels wrote:

> OK, commit 15927 adds some more info about -db options for
> Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide,
> nuccore, nucgss, nucest, and unigene, and including a link to an
> (XML) page from NCBI that lists inputs that NCBI accepts.
>
> Could somebody who knows more about eUtils than me also review this
> patch and make corrections if necessary?
>
> Rob
>
> Robert Buels wrote:
>> I think you're looking for the -db => 'nucgss' option.
>> I'll add a better listing of these (undocumented) options to the
>> Bio::DB::Query::GenBank docs.
>> Rob > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 13 18:08:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:08:37 -0500 Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO In-Reply-To: References: Message-ID: <65CC2787-7F0A-43C1-A840-554A2E4FD76A@illinois.edu> You should update to bioperl 1.6; I believe I fixed this issue after the 1.5.2 release. chris On Aug 13, 2009, at 2:41 PM, Jonathan Flowers wrote: > Hi, > > I am trying to parse BLAST reports written in XML using > Bio::SearchIO. When > running the following code on a set of reports (multiple query > results in a > single file), I only get one ResultI object. I tried running the > same code > on a file in 'blast' format and obtained the expected results (ie one > ResultI object for each query), suggesting that the issue is with > blastxml. > I found an old thread on this listserv where someone had had a similar > problem, but could not find how it was resolved. > > I am using Bioperl 1.5.2 and the XML reports were generated using > blastall > with the -m7 option. 
> > my $in = new Bio::SearchIO(-format => 'blastxml', -file => > 'blastreport.xml' ); > while( my $result = $in->next_result ) { > print $result->query_name,"\n"; > while( my $hit = $result->next_hit ) { > while( my $hsp = $hit->next_hsp ) { > #do something with hsp > } > } > } > > Thanks > > Jonathan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 13 18:18:57 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:18:57 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A848791.4010402@cornell.edu> References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com> <4A848791.4010402@cornell.edu> Message-ID: On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > Jonny Dalzell wrote: >> Is it ridiculous of me to expect ubuntu to take care of this for >> me? How do >> I go about compiling the HMM? > Yes. This is a very specialized thing that you're doing, and Ubuntu > does not have the resources to package every single thing. > > Unfortunately, it looks like bioperl-ext package is not installable > under Ubuntu 9.04 anyway, which is what I'm running. For others on > this list, if somebody is interested in doing maintaining it, I'd be > happy to help out by testing on Debian-based Linux platforms. We > need to clarify this package's maintenance status: if there is > nobody interested in maintaining it, I would recommend that bioperl- > ext be removed from distribution. It's not in anybody's interest to > have unmaintained software out there causing confusion. I have cc'd Yee Man Chan for this. If there isn't a response or the message bounces, we do one of two things: 1) consider it deprecated (probably safest). 2) spin it out into a separate module. 
Just tried to compile it myself and am getting errors (using 64-bit perl 5.10), so I think, unless someone wants to take this on, option #1 is best.

> So Jonny, in short, I would say "do not use bioperl-ext".

In general, that's a safe bet. We're moving most of our C/C++ bindings to BioLib.

> Step back. What are you trying to accomplish? Chris already
> recommended some alternative methods in his email of 8/11 on this
> subject. Perhaps we can guide you to some software that is actively
> maintained and will meet your needs.
>
> Rob

Exactly. Lots of other (better supported!) options out there. HMMER, SeqAn, and others.

chris

From cjfields at illinois.edu Thu Aug 13 20:31:49 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 13 Aug 2009 19:31:49 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <650586.94518.qm@web30407.mail.mud.yahoo.com>
References: <650586.94518.qm@web30407.mail.mud.yahoo.com>
Message-ID: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu>

(just to point out to everyone, Yee Man's contact information was in the POD)

Yee Man,

I have the output in the below link:

http://gist.github.com/167542

There are similar problems popping up on 32- and 64-bit perl 5.10.0, Mac OS X 10.5. Haven't had time to debug it unfortunately.

I think we should seriously consider spinning this code off into its own distribution for CPAN. It's unfortunately bit-rotting away in bioperl-ext. If you want to continue supporting it I can help set that up.

chris

On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote:

> Hi
>
> So is this an HMM only problem? Or does it apply to other
> bioperl-ext modules?
>
> What exactly are the compilation errors for HMM? I believe my
> implementation is just a simple one based on Rabiner's paper.
> > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F > ~murphyk%2FBayes > %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner > +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > > I don't think I did anything fancy that makes it machine > dependent or non-ANSI C. > > Yee Man > > --- On Thu, 8/13/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "Jonny Dalzell" , "BioPerl List" > >, "Yee Man Chan" >> Date: Thursday, August 13, 2009, 3:18 PM >> >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: >> >>> Jonny Dalzell wrote: >>>> Is it ridiculous of me to expect ubuntu to take >> care of this for me? How do >>>> I go about compiling the HMM? >>> Yes. This is a very specialized thing that >> you're doing, and Ubuntu does not have the resources to >> package every single thing. >>> >>> Unfortunately, it looks like bioperl-ext package is >> not installable under Ubuntu 9.04 anyway, which is what I'm >> running. For others on this list, if somebody is >> interested in doing maintaining it, I'd be happy to help out >> by testing on Debian-based Linux platforms. We need to >> clarify this package's maintenance status: if there is >> nobody interested in maintaining it, I would recommend that >> bioperl-ext be removed from distribution. It's not in >> anybody's interest to have unmaintained software out there >> causing confusion. >> >> I have cc'd Yee Man Chan for this. If there isn't a >> response or the message bounces, we do one of two things: >> >> 1) consider it deprecated (probably safest). >> 2) spin it out into a separate module. >> >> Just tried to comile it myself and am getting errors (using >> 64bit perl 5.10), so I think, unless someone wants to take >> this on, option #1 is best. >> >>> So Jonny, in short, I would say "do not use >> bioperl-ext". >> >> In general, that's a safe bet. 
We're moving most of >> our C/C++ bindings to BioLib. >> >>> Step back. What are you trying to >> accomplish? Chris already recommended some alternative >> methods in his email of 8/11 on this subject. Perhaps >> we can guide you to some software that is actively >> maintained and will meet your needs. >>> >>> Rob >> >> Exactly. Lots of other (better supported!) options >> out there. HMMER, SeqAn, and others. >> >> chris >> > > > From ymc at yahoo.com Thu Aug 13 19:58:28 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Thu, 13 Aug 2009 16:58:28 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <650586.94518.qm@web30407.mail.mud.yahoo.com> Hi So is this an HMM only problem? Or does it apply to other bioperl-ext modules? What exactly are the compilation errors for HMM? I believe my implementation is just a simple one based on Rabiner's paper. http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg I don't think I did anything fancy that makes it machine dependent or non-ANSI C. Yee Man --- On Thu, 8/13/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "Jonny Dalzell" , "BioPerl List" , "Yee Man Chan" > Date: Thursday, August 13, 2009, 3:18 PM > > On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > > > Jonny Dalzell wrote: > >> Is it ridiculous of me to expect ubuntu to take > care of this for me?? How do > >> I go about compiling the HMM? > > Yes.? This is a very specialized thing that > you're doing, and Ubuntu does not have the resources to > package every single thing. > > > > Unfortunately, it looks like bioperl-ext package is > not installable under Ubuntu 9.04 anyway, which is what I'm > running.? 
> For others on this list, if somebody is
> interested in maintaining it, I'd be happy to help out
> by testing on Debian-based Linux platforms. We need to
> clarify this package's maintenance status: if there is
> nobody interested in maintaining it, I would recommend that
> bioperl-ext be removed from distribution. It's not in
> anybody's interest to have unmaintained software out there
> causing confusion.
>
> I have cc'd Yee Man Chan for this. If there isn't a
> response or the message bounces, we do one of two things:
>
> 1) consider it deprecated (probably safest).
> 2) spin it out into a separate module.
>
> Just tried to compile it myself and am getting errors (using
> 64-bit perl 5.10), so I think, unless someone wants to take
> this on, option #1 is best.
>
> > So Jonny, in short, I would say "do not use
> bioperl-ext".
>
> In general, that's a safe bet. We're moving most of
> our C/C++ bindings to BioLib.
>
> > Step back. What are you trying to
> accomplish? Chris already recommended some alternative
> methods in his email of 8/11 on this subject. Perhaps
> we can guide you to some software that is actively
> maintained and will meet your needs.
> >
> > Rob
>
> Exactly. Lots of other (better supported!) options
> out there. HMMER, SeqAn, and others.
>
> chris
>

From agulyaskov at mail.rockefeller.edu Thu Aug 13 20:40:22 2009
From: agulyaskov at mail.rockefeller.edu (Attila Gulyas-Kovacs)
Date: Thu, 13 Aug 2009 20:40:22 -0400
Subject: [Bioperl-l] bus error when indexing large file
Message-ID: <4A84B276.2040706@mail.rockefeller.edu>

Dear all,

I can index the SwissProt database without problem but I get a bus error when I try to index the much larger TrEMBL database. Indexing failed with both the swissprot and fasta format (using Bio::Index::Swissprot or Bio::Index::Fasta, respectively).

I broke up TrEMBL into multiple files ('chunks'), each about the size of the SwissProt database. Then I could create separate indices for each chunk.
But I got a bus error when I passed all chunks simultaneously to my script (below) to create a single index.

Perl v5.10.0; Bioperl 1.6.0; Mac OS X 10.5.8; MacPro 10 GB RAM.

What do you suggest?

Attila

#!/usr/bin/perl
use warnings;
use strict;
use Bio::Index::Swissprot;

# first argument: name of the index file to create;
# remaining arguments: the flat files (chunks) to index
my $index_file_name = shift;
my $inx = Bio::Index::Swissprot->new(
    -filename   => $index_file_name,
    -write_flag => 1,
);
$inx->make_index(@ARGV);

--
Attila Gulyas-Kovacs
Postdoctoral Associate
Rockefeller University
Gadsby Lab (Cardiac/Membrane Physiology)
D.W. Bronk Building, Room 307
1230 York Avenue
New York, NY, 10065
Tel: (212)327-8617
Fax: (212)327-7589

From ymc at yahoo.com Fri Aug 14 00:15:41 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Thu, 13 Aug 2009 21:15:41 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu>
Message-ID: <528790.13637.qm@web30404.mail.mud.yahoo.com>

Hi all

Based on my understanding of the warning messages, the problem seems to come from the "typemap" file when I cast the return from SvIV from an integer to a pointer. I suppose this might cause problems on 64-bit machines.

But when I look at perlguts and perlxs, it does seem to me that the way I did it in typemap is the suggested way to do it because the IV type is "guaranteed to be big enough to hold a pointer". Nevertheless, I modified my typemap file to look exactly like what's in perlxs. (See PS)

Does anyone know how to deal with this problem? Or can any of you give me access to a 64-bit machine to sort this out?

Thank you!
Yee Man

PS This is a typemap file using exactly the same lines suggested by perlxs. It works on my 32-bit machine. Can someone try it on a 64-bit machine?
Thanks

================================================
TYPEMAP
HMM *	T_HMM

INPUT
T_HMM
	if (sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG))
		$var = ($type)SvIV((SV*)SvRV( $arg ));
	else{
		warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
		XSRETURN_UNDEF;
	}

OUTPUT
T_HMM
	sv_setref_pv($arg, "Bio::Ext::HMM::HMM", (void*) $var);
========================================================

--- On Thu, 8/13/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List"
> Date: Thursday, August 13, 2009, 5:31 PM
> (just to point out to everyone, Yee
> Man's contact information was in the POD)
>
> Yee Man,
>
> I have the output in the below link:
>
> http://gist.github.com/167542
>
> There are similar problems popping up on 32- and 64-bit
> perl 5.10.0, Mac OS X 10.5. Haven't had time to debug
> it unfortunately.
>
> I think we should seriously consider spinning this code off
> into its own distribution for CPAN. It's
> unfortunately bit-rotting away in bioperl-ext. If you
> want to continue supporting it I can help set that up.
>
> chris
>
> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote:
>
> > Hi
> >
> >    So is this an HMM only problem? Or does
> it apply to other bioperl-ext modules?
> >
> >    What exactly are the compilation errors
> for HMM? I believe my implementation is just a simple one
> based on Rabiner's paper.
> >
> > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg
> >
> >    I don't think I did anything fancy that
> makes it machine dependent or non-ANSI C.
> >
> > Yee Man
> >
> > --- On Thu, 8/13/09, Chris Fields
> wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext
> package on WinVista?
> >> To: "Robert Buels" > >> Cc: "Jonny Dalzell" , > "BioPerl List" , > "Yee Man Chan" > >> Date: Thursday, August 13, 2009, 3:18 PM > >> > >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > >> > >>> Jonny Dalzell wrote: > >>>> Is it ridiculous of me to expect ubuntu to > take > >> care of this for me?? How do > >>>> I go about compiling the HMM? > >>> Yes.? This is a very specialized thing > that > >> you're doing, and Ubuntu does not have the > resources to > >> package every single thing. > >>> > >>> Unfortunately, it looks like bioperl-ext > package is > >> not installable under Ubuntu 9.04 anyway, which is > what I'm > >> running.? For others on this list, if > somebody is > >> interested in doing maintaining it, I'd be happy > to help out > >> by testing on Debian-based Linux platforms.? > We need to > >> clarify this package's maintenance status: if > there is > >> nobody interested in maintaining it, I would > recommend that > >> bioperl-ext be removed from distribution.? > It's not in > >> anybody's interest to have unmaintained software > out there > >> causing confusion. > >> > >> I have cc'd Yee Man Chan for this.? If there > isn't a > >> response or the message bounces, we do one of two > things: > >> > >> 1) consider it deprecated (probably safest). > >> 2) spin it out into a separate module. > >> > >> Just tried to comile it myself and am getting > errors (using > >> 64bit perl 5.10), so I think, unless someone wants > to take > >> this on, option #1 is best. > >> > >>> So Jonny, in short, I would say "do not use > >> bioperl-ext". > >> > >> In general, that's a safe bet.? We're moving > most of > >> our C/C++ bindings to BioLib. > >> > >>> Step back.? What are you trying to > >> accomplish?? Chris already recommended some > alternative > >> methods in his email of 8/11 on this > subject.? Perhaps > >> we can guide you to some software that is > actively > >> maintained and will meet your needs. > >>> > >>> Rob > >> > >> Exactly.? 
Lots of other (better supported!) > options > >> out there. HMMER, SeqAn, and others. > >> > >> chris > >> > > > > > > > >

From ymc at yahoo.com Fri Aug 14 04:27:11 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Fri, 14 Aug 2009 01:27:11 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
Message-ID: <168012.97676.qm@web30405.mail.mud.yahoo.com>

Ah.. I find that the typemap can become as simple as this
=====================
TYPEMAP
HMM *    T_PTROBJ
=====================
Then the generated HMM.c will use the INT2PTR macro to do the pointer conversion. I believe this should solve the warnings. Attached are the updated HMM.xs and typemap. Can someone with a 64-bit machine give it a try?

Thank you
Yee Man

--- On Thu, 8/13/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" > Date: Thursday, August 13, 2009, 5:31 PM > (just to point out to everyone, Yee > Man's contact information was in the POD) > > Yee Man, > > I have the output in the below link: > > http://gist.github.com/167542 > > There are similar problems popping up on 32- and 64-bit > perl 5.10.0, Mac OS X 10.5. Haven't had time to debug > it unfortunately. > > I think we should seriously consider spinning this code off > into its own distribution for CPAN. It's > unfortunately bit-rotting away in bioperl-ext. If you > want to continue supporting it I can help set that up. > > chris > > On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > > > Hi > > > > So is this an HMM only problem? Or does > it apply to other bioperl-ext modules? > > > > What exactly are the compilation errors > for HMM? I believe my implementation is just a simple one > based on Rabiner's paper. 
> > > > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > > > > I don't think I did anything fancy that > makes it machine dependent or non-ANSI C. > > > > Yee Man > > > > --- On Thu, 8/13/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista? > >> To: "Robert Buels" > >> Cc: "Jonny Dalzell" , > "BioPerl List" , > "Yee Man Chan" > >> Date: Thursday, August 13, 2009, 3:18 PM > >> > >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > >> > >>> Jonny Dalzell wrote: > >>>> Is it ridiculous of me to expect ubuntu to > take > >> care of this for me? How do > >>>> I go about compiling the HMM? > >>> Yes. This is a very specialized thing > that > >> you're doing, and Ubuntu does not have the > resources to > >> package every single thing. > >>> > >>> Unfortunately, it looks like bioperl-ext > package is > >> not installable under Ubuntu 9.04 anyway, which is > what I'm > >> running. For others on this list, if > somebody is > >> interested in maintaining it, I'd be happy > to help out > >> by testing on Debian-based Linux platforms. > We need to > >> clarify this package's maintenance status: if > there is > >> nobody interested in maintaining it, I would > recommend that > >> bioperl-ext be removed from distribution. > It's not in > >> anybody's interest to have unmaintained software > out there > >> causing confusion. > >> > >> I have cc'd Yee Man Chan for this. If there > isn't a > >> response or the message bounces, we do one of two > things: > >> > >> 1) consider it deprecated (probably safest). > >> 2) spin it out into a separate module. > >> > >> Just tried to compile it myself and am getting > errors (using > >> 64bit perl 5.10), so I think, unless someone wants > to take > >> this on, option #1 is best. 
> >> > >>> So Jonny, in short, I would say "do not use > >> bioperl-ext". > >> > >> In general, that's a safe bet. We're moving > most of > >> our C/C++ bindings to BioLib. > >> > >>> Step back. What are you trying to > >> accomplish? Chris already recommended some > alternative > >> methods in his email of 8/11 on this > subject. Perhaps > >> we can guide you to some software that is > actively > >> maintained and will meet your needs. > >>> > >>> Rob > >> > >> Exactly. Lots of other (better supported!) > options > >> out there. HMMER, SeqAn, and others. > >> > >> chris > >> > > > > > > > >
-------------- next part --------------
A non-text attachment was scrubbed... Name: HMM.xs Type: application/octet-stream Size: 5588 bytes Desc: not available URL:
-------------- next part --------------
A non-text attachment was scrubbed... Name: typemap Type: application/octet-stream Size: 26 bytes Desc: not available URL:

From cjfields at illinois.edu Fri Aug 14 10:20:21 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 14 Aug 2009 09:20:21 -0500
Subject: [Bioperl-l] bus error when indexing large file
In-Reply-To: <4A84B276.2040706@mail.rockefeller.edu>
References: <4A84B276.2040706@mail.rockefeller.edu>
Message-ID:

I can attempt to reproduce this (I have very similar specs). I'm wondering if it has something to do with large file support. Have you tried the perl packaged with Mac OS X? I think it's perl 5.8.8.

chris

On Aug 13, 2009, at 7:40 PM, Attila Gulyas-Kovacs wrote: > Dear all, > > I can index the SwissProt database without problem but I get bus > error when I try to index the much larger TrEMBL database. Indexing > failed with both the swissprot and fasta format (using > Bio::Index::Swissprot or Bio::Index::Fasta, respectively). 
I broke > up TrEMBL into multiple files ('chunks'), about the size of the > SwissProt database. Then I could create separate indices for > each chunk. But I got bus error when I passed all chunks > simultaneously to my script (below) to create a single index. > Perl v5.10.0; Bioperl 1.6.0; Mac OS X 10.5.8; MacPro 10 GB RAM. > > What do you suggest? > > Attila
>
> #! /usr/bin/perl
> use warnings;
> use strict;
> use Bio::Index::Swissprot;
> my $index_file_name = shift;
> my $inx = Bio::Index::Swissprot->new(
>     -filename   => $index_file_name,
>     -write_flag => 1);
> $inx->make_index(@ARGV);
>
> -- > Attila Gulyas-Kovacs > Postdoctoral Associate > > Rockefeller University > Gadsby Lab (Cardiac/Membrane Physiology) > D.W. Bronk Building, Room 307 1230 York Avenue > New York, NY, 10065 > Tel: (212)327-8617 > Fax: (212)327-7589 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From David.Messina at sbc.su.se Fri Aug 14 10:10:33 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 14 Aug 2009 16:10:33 +0200
Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence
Message-ID: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com>

Hi everyone, I'm using Bio::AlignIO to read in a series of multiple alignments. Occasionally, an alignment will have a sequence which consists entirely of gaps (these are actually trimmed sub-alignments; that's why). Each time I read in such an alignment, an error will be raised when the Bio::LocatableSeq object is created for the all-gap sequence (actually, the error comes from the superclass Bio::PrimarySeq). To my way of thinking, an alignment is not invalid if it contains such all-gap sequences, so there shouldn't be an error. This could be done by having Bio::AlignIO::* passing the -nowarnonempty flag when creating the sequence objects. Any thoughts on this? 
Is there a better way to suppress the warning than changing the behavior of all the AlignIO modules? Dave From cjfields at illinois.edu Fri Aug 14 10:42:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 09:42:51 -0500 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Dave, Is this using bioperl-live? I recall this being a problem but I thought it was addressed in svn (and soon in the next point release). chris On Aug 14, 2009, at 9:10 AM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists > entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when > the > Bio::LocatableSeq object is created for the all-gap sequence > (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be > done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating > the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning > than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Fri Aug 14 10:44:42 2009 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 14 Aug 2009 16:44:42 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <716af09c0908140744i4447dffg205ec07daeaaa571@mail.gmail.com> Hi Dave, I have observed the same (with bioperl 1.52) for the same reason. It would be nice not to have these errors as also in my view an all-gaps sequence is a sequence. I also found that sometimes parsing such alignments fails when the all-gaps sequence is the last in the alignment (bug 2744, in Bio::LocatableSeq). Regards, Bernd On Fri, Aug 14, 2009 at 4:10 PM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when the > Bio::LocatableSeq object is created for the all-gap sequence (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Aug 14 11:12:35 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 14 Aug 2009 17:12:35 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Message-ID: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> > > Is this using bioperl-live? Sorry, should've said before. Yes, it's bioperl-live (r15927). I recall this being a problem but I thought it was addressed in svn (and > soon in the next point release). Hmm, the only recent somewhat related change I see (in Bio::AlignIO::*, anyway) is: ------------------------------------------------------------------------ r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 lines deprecate no_sequences/no_residues in main trunk (we can switch the version to 1.7 if deemed necessary) ------------------------------------------------------------------------ Perhaps this is what you were thinking of? Dave From cjfields at illinois.edu Fri Aug 14 11:31:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 10:31:49 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <168012.97676.qm@web30405.mail.mud.yahoo.com> References: <168012.97676.qm@web30405.mail.mud.yahoo.com> Message-ID: Yee Man, I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 64-bit) and on dev.open-bio.org (which is perl 5.8.8, appears to be 32-bit). The patch results in cleaning up warnings for 5.10.0 but results in similar warnings for 5.8.8 (linux or OS X). 
On OS X perl 5.8.8, this sometimes passes (note the first attempt fails, the second succeeds), so it's not entirely a 32-bit issue: http://gist.github.com/167860 OS X and perl 5.10.0, this always fails as the previous gist shows, but demonstrates similar behavior (multiple attempts to test get different responses): http://gist.github.com/167542 On linux, everything passes with or w/o the patched files (patched files have warnings as indicated above): Specs for all three perl executables (they vary a bit): http://gist.github.com/167883 chris On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: > Ah.. I find that the typemap can become as simple as this > ===================== > TYPEMAP > HMM * T_PTROBJ > ===================== > > Then the generated HMM.c will have a function called INT2PTR to do > the pointer conversion. I believe this should solve the warnings. > > Attached are the updated HMM.xs and typemap. Can someone with a 64- > bit machine give it a try? > > Thank you > Yee Man > --- On Thu, 8/13/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" > >, "BioPerl List" >> Date: Thursday, August 13, 2009, 5:31 PM >> (just to point out to everyone, Yee >> Man's contact information was in the POD) >> >> Yee Man, >> >> I have the output in the below link: >> >> http://gist.github.com/167542 >> >> There are similar problems popping up on 32- and 64-bit >> perl 5.10.0, Mac OS X 10.5. Haven't had time to debug >> it unfortunately. >> >> I think we should seriously consider spinning this code off >> into it's own distribution for CPAN. It's >> unfortunately bit-rotting away in bioperl-ext. If you >> want to continue supporting it I can help set that up. >> >> chris >> >> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >> >>> Hi >>> >>> So is this an HMM only problem? Or does >> it apply to other bioperl-ext modules? 
>>> >>> What exactly are the compilation errors >> for HMM? I believe my implementation is just a simple one >> based on Rabiner's paper. >>> >>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>> ~murphyk%2FBayes >>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>> >>> I don't think I did anything fancy that >> makes it machine dependent or non-ANSI C. >>> >>> Yee Man >>> >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Robert Buels" >>>> Cc: "Jonny Dalzell" , >> "BioPerl List" , >> "Yee Man Chan" >>>> Date: Thursday, August 13, 2009, 3:18 PM >>>> >>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: >>>> >>>>> Jonny Dalzell wrote: >>>>>> Is it ridiculous of me to expect ubuntu to >> take >>>> care of this for me? How do >>>>>> I go about compiling the HMM? >>>>> Yes. This is a very specialized thing >> that >>>> you're doing, and Ubuntu does not have the >> resources to >>>> package every single thing. >>>>> >>>>> Unfortunately, it looks like bioperl-ext >> package is >>>> not installable under Ubuntu 9.04 anyway, which is >> what I'm >>>> running. For others on this list, if >> somebody is >>>> interested in doing maintaining it, I'd be happy >> to help out >>>> by testing on Debian-based Linux platforms. >> We need to >>>> clarify this package's maintenance status: if >> there is >>>> nobody interested in maintaining it, I would >> recommend that >>>> bioperl-ext be removed from distribution. >> It's not in >>>> anybody's interest to have unmaintained software >> out there >>>> causing confusion. >>>> >>>> I have cc'd Yee Man Chan for this. If there >> isn't a >>>> response or the message bounces, we do one of two >> things: >>>> >>>> 1) consider it deprecated (probably safest). >>>> 2) spin it out into a separate module. 
>>>> >>>> Just tried to compile it myself and am getting >> errors (using >>>> 64bit perl 5.10), so I think, unless someone wants >> to take >>>> this on, option #1 is best. >>>> >>>>> So Jonny, in short, I would say "do not use >>>> bioperl-ext". >>>> >>>> In general, that's a safe bet. We're moving >> most of >>>> our C/C++ bindings to BioLib. >>>> >>>>> Step back. What are you trying to >>>> accomplish? Chris already recommended some >> alternative >>>> methods in his email of 8/11 on this >> subject. Perhaps >>>> we can guide you to some software that is >> actively >>>> maintained and will meet your needs. >>>>> >>>>> Rob >>>> >>>> Exactly. Lots of other (better supported!) >> options >>>> out there. HMMER, SeqAn, and others. >>>> >>>> chris >>>> >>> >>> >>> >> >> > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Fri Aug 14 11:53:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 14 Aug 2009 10:53:51 -0500
Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence
In-Reply-To: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com>
References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com>
Message-ID: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu>

On Aug 14, 2009, at 10:12 AM, Dave Messina wrote: > Is this using bioperl-live? > > Sorry, should've said before. Yes, it's bioperl-live (r15927). > > > I recall this being a problem but I thought it was addressed in svn > (and soon in the next point release). 
> > Hmm, the only recent somewhat related change I see (in > Bio::AlignIO::*, anyway) is: > > ------------------------------------------------------------------------ > r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 > lines > > deprecate no_sequences/no_residues in main trunk (we can switch the > version to 1.7 if deemed necessary) > ------------------------------------------------------------------------ > > > Perhaps this is what you were thinking of? > > Dave

Maybe not, then (for some reason I thought this was fixed within LocatableSeq). I know that it is possible to have an all-gap LocatableSeq; this works, but the default start/end/length aren't correct, which is part of Bernd's bug:

use Modern::Perl;
use Bio::LocatableSeq;

my $seq = Bio::LocatableSeq->new(
    -seq      => '-------------',
    -alphabet => 'dna',
);
say $seq->start;   # 1
say $seq->end;     # undef (?)
say $seq->length;  # 13, counts the gaps

The problem is that fixing all this relies on a whole slew of refactors for LocatableSeq and SimpleAlign. Some of this touches root components as well, so it'll need to be tried on a branch and will very likely result in some API changes (and thus may not be included in 1.6). I'll start a branch to get the process started.

chris

From jncline at gmail.com Fri Aug 14 15:41:21 2009
From: jncline at gmail.com (Jonathan Cline)
Date: Fri, 14 Aug 2009 14:41:21 -0500
Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl
In-Reply-To: <99E27D08408340B9B0611751A17DF266@NewLife>
References: <99E27D08408340B9B0611751A17DF266@NewLife>
Message-ID: <4A85BDE1.5020002@gmail.com>

Mark A. Jensen wrote:
> Sorry, I cut off the last script. The entire thing follows:
>

This is exactly what I was looking for - thanks. A method to modify Makefile.PL, install in Activestate, etc is great. Perhaps your method could also be improved for portability by using `cygpath` although few cygwin installs modify this beyond the default (to get rid of hardcoded "/cygdrive/x/"). 
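A rough sketch of the cygpath idea mentioned above, written in Perl: ask `cygpath -u` for the POSIX form of a Windows path when the tool is on the PATH, and fall back to a plain substitution otherwise. The helper names (`win_to_cygdrive`, `find_cygpath`, `to_cygwin_path`) and the `/cygdrive` default are assumptions made for illustration; none of this is part of Mark's sed script, and the list-form pipe open may not behave the same under every Windows perl.

```perl
use strict;
use warnings;
use File::Spec;

# Fallback: rewrite "C:\foo\bar" as "<prefix>/c/foo/bar" by plain substitution.
# The "/cygdrive" default is an assumption; a cygwin install can change it.
sub win_to_cygdrive {
    my ($path, $prefix) = @_;
    $prefix = '/cygdrive' unless defined $prefix;
    (my $out = $path) =~ tr{\\}{/};                    # backslashes -> slashes
    $out =~ s{^([A-Za-z]):}{$prefix . '/' . lc $1}e;   # drive letter -> prefix
    return $out;
}

# Look for a cygpath executable on PATH; returns nothing if none is found.
sub find_cygpath {
    for my $dir (File::Spec->path) {
        for my $name ('cygpath', 'cygpath.exe') {
            my $candidate = File::Spec->catfile($dir, $name);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

# Prefer cygpath's answer; fall back to the substitution above.
sub to_cygwin_path {
    my ($path) = @_;
    if (my $cygpath = find_cygpath()) {
        if (open my $fh, '-|', $cygpath, '-u', $path) {
            my $converted = <$fh>;
            close $fh;
            if (defined $converted) {
                chomp $converted;
                return $converted if length $converted;
            }
        }
    }
    return win_to_cygdrive($path);
}

print to_cygwin_path('C:\\Perl\\bin\\perl'), "\n";
```

The same lookup could feed the conversion script in place of the literal "/cygdrive/c" strings, so that a non-default mount prefix is picked up automatically.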
I will definitely save your code for later. I've implemented another workaround, which is to use Win32::Pipe and other Win32:: methods. This has problems of its own (support is not 100%), and an error-free implementation is not as easy as requiring Activestate Perl; however, it should work with both Activestate and cygwin-perl (and Unix). ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## > ----- Original Message ----- From: "Jonathan Cline" > To: > Cc: > Sent: Friday, July 31, 2009 11:24 PM > Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl > > >> I recently mentioned working on Bio::Robotics for Tecan. Vendors >> being MS-Win specific, the vendor software allows third-party software >> communication through a named pipe (the literal filename is >> "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific >> and this pseudo-pipe is opened with sysopen() ). This is broken under >> cygwin-perl due to cygwin's method of handling paths -- the sysopen >> fails. However it works under ActiveState Perl and communication >> through the named pipe (to the robot hardware) is OK. The standard >> workaround is usually to use cygwin bash, and force the PATH to use >> ActiveState perl. (Typical MS Windows incompatibility problem.) The >> issue is: Perl module libraries for CPAN work under cygwin-perl >> (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN >> module use, or "make test", result in a bad list of incompatibility >> problems. Yet ActiveState Perl is required for communicating to the >> vendor application (unless there is some workaround to raw filesystem >> access in cygwin-perl that I haven't found in 2 days of working this). >> The stand-alone scripts I have work fine to access the named pipe >> (using ActiveState Perl) since the standalone scripts have no module >> INC dependencies, no CPAN module test harness, etc etc. 
>> >> This isn't specifically a Bio:: issue, though if anyone has >> suggestions please email. I could try msys and see if it handles the >> named-pipe-special-file better, if msys has an msys-perl distribution. >> >> -- >> ## Jonathan Cline >> ## jcline at ieee.org >> ## Mobile: +1-805-617-0223 >> ######################## >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From cjfields at illinois.edu Fri Aug 14 19:29:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 18:29:43 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring Message-ID: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> As we have pretty much everything in place for another point release (which I will start merging over this weekend into the 1.6 branch), I have gone ahead and made two branches for refactoring some of the more important pieces of bioperl code. Both refactors may require API changes; if so these will be part of a 1.7 release. 1) GFF - entail refactoring bioperl code to better handle GFF2/3. This is a large section of code, so small incremental changes may be merged to trunk over time (and thus may involve several branches). Included is refactoring of feature typing to be more consistent and lightweight, and will initially involve Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be deprecated in the process). See the following for additional details: http://www.bioperl.org/wiki/GFF_Refactor 2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI (SimpleAlign) and LocatableSeq. This is primarily to address significant bugs but will also entail cleaning up SimpleAlign methods (factoring out more utility-like methods into Bio::Align::AlignUtils or similar). This also may involve several branches. 
See the following for additional details: http://www.bioperl.org/wiki/Align_Refactor Any help/suggestions for the above two would be greatly appreciated! Robert Buels may be heading up the initial FeatureIO work; I will likely start on LocatableSeq/Align (Mark, wanna help?). chris From maj at fortinbras.us Fri Aug 14 19:45:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 19:45:01 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Hey Chris et al, I'm there on LocatableSeq, definitely. I do have one project to finish this weekend before I move to that: I'm planning to move Chase Miller's excellent NeXML read/write implementation into the trunk, complete with tests. If we can get it to pass the test suite, is there room in the point release for it? MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Friday, August 14, 2009 7:29 PM Subject: [Bioperl-l] GFF and LocatableSeq refactoring > As we have pretty much everything in place for another point release > (which I will start merging over this weekend into the 1.6 branch), I > have gone ahead and made two branches for refactoring some of the more > important pieces of bioperl code. Both refactors may require API > changes; if so these will be part of a 1.7 release. > > 1) GFF - entail refactoring bioperl code to better handle GFF2/3. > > This is a large section of code, so small incremental changes may be > merged to trunk over time (and thus may involve several branches). > Included is refactoring of feature typing to be more consistent and > lightweight, and will initially involve Bio::FeatureIO and > Bio::SeqFeature::Annotated (which may be deprecated in the process). 
> See the following for additional details: > > http://www.bioperl.org/wiki/GFF_Refactor > > 2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI > (SimpleAlign) and LocatableSeq. This is primarily to address > significant bugs but will also entail cleaning up SimpleAlign methods > (factoring out more utility-like methods into Bio::Align::AlignUtils > or similar). This also may involve several branches. See the > following for additional details: > > http://www.bioperl.org/wiki/Align_Refactor > > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will > likely start on LocatableSeq/Align (Mark, wanna help?). > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Fri Aug 14 19:50:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 Aug 2009 16:50:18 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: <4A85F83A.30800@cornell.edu> Chris Fields wrote: > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will likely > start on LocatableSeq/Align (Mark, wanna help?). Sure, I'll head up the gff_refactor branch work. If you're interested in what changes are being planned for Bio::SeqFeature::*, Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the implementation plan Chris and I developed just now on IRC, which is at http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan Now soliciting suggestions, comments, and assistance. 
Rob From cjfields at illinois.edu Fri Aug 14 21:03:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 20:03:41 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Mark, re: NeXML, yes, of course. There'll be an alpha release or two prior to core 1.6.1 (I need to test the Build.PL/Bio::Root::Build changes Sendu added in). chris On Aug 14, 2009, at 6:45 PM, Mark A. Jensen wrote: > Hey Chris et al, I'm there on LocatableSeq, definitely. I do have > one project to finish this weekend before I move to that: I'm > planning to move Chase Miller's > excellent NeXML read/write implementation into the trunk, complete > with tests. If we can get it to pass the test suite, is there room > in the point release for it? > MAJ > ----- Original Message ----- From: "Chris Fields" > > To: "BioPerl List" > Sent: Friday, August 14, 2009 7:29 PM > Subject: [Bioperl-l] GFF and LocatableSeq refactoring > > >> As we have pretty much everything in place for another point >> release (which I will start merging over this weekend into the 1.6 >> branch), I have gone ahead and made two branches for refactoring >> some of the more important pieces of bioperl code. Both refactors >> may require API changes; if so these will be part of a 1.7 release. >> 1) GFF - entail refactoring bioperl code to better handle GFF2/3. >> This is a large section of code, so small incremental changes may >> be merged to trunk over time (and thus may involve several >> branches). Included is refactoring of feature typing to be more >> consistent and lightweight, and will initially involve >> Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be >> deprecated in the process). See the following for additional >> details: >> http://www.bioperl.org/wiki/GFF_Refactor >> 2) Align/LocatableSeq - dealing with inconsistencies in >> Bio::AlignI (SimpleAlign) and LocatableSeq. 
This is primarily to >> address significant bugs but will also entail cleaning up >> SimpleAlign methods (factoring out more utility-like methods into >> Bio::Align::AlignUtils or similar). This also may involve several >> branches. See the following for additional details: >> http://www.bioperl.org/wiki/Align_Refactor >> Any help/suggestions for the above two would be greatly >> appreciated! Robert Buels may be heading up the initial FeatureIO >> work; I will likely start on LocatableSeq/Align (Mark, wanna help?). >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From maj at fortinbras.us Fri Aug 14 22:32:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 22:32:01 -0400 Subject: [Bioperl-l] on BP documentation Message-ID: <1F899AA92F94415186CB0B25306F1114@NewLife> Hi All -- Off-list, an old colleague of mine had this insightful, if damning, comment: >I guess that from my perspective, after doing this stuff for >about 10 years, I personally would prefer to see a "summer of >documentation" for the bio* languages (or at least bioperl, as that is >the only one I ever look at). From my own experiences, and from those >of many colleagues, the documentation for bioperl has gone from >mediocre to quite poor in the last few years. I largely think the >wikification of the docs are to blame for this. Even SeqIO is hard >to figure out now--it took me an hour the other day to figure out that >"desc" returns the full Fasta header, and I had to get that from the >module code + trial-and-error, instead of the online docs. There is >far too much inside baseball going on in the documentation scheme. >So I worry more about the constant adding of features at the expense >of documenting what is already there. This is just my 2 cents, and it >is disappointing to see a downward trend for bioperl in this regard. 
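For readers who hit the same confusion over "desc", here is a minimal plain-Perl sketch of the usual FASTA header convention: the first whitespace-delimited token after ">" is the identifier and the remainder is the description. As far as I know this is the split that Bio::SeqIO's fasta parser exposes as display_id and desc, so desc would be the header minus the ID rather than the whole header line; treat that mapping as an assumption to check against the POD. The helper name `split_fasta_header` and the sample header are made up for illustration.

```perl
use strict;
use warnings;

# Split a FASTA header line into (id, description).
# Returns an empty list if the line does not look like a header.
sub split_fasta_header {
    my ($line) = @_;
    my ($id, $desc) = $line =~ /^>\s*(\S+)\s*(.*?)\s*$/
        or return;
    return ($id, $desc);
}

my ($id, $desc) = split_fasta_header('>seq1 hypothetical protein [Example organism]');
print "id:   $id\n";     # the part one would expect from display_id
print "desc: $desc\n";   # the part one would expect from desc
```

To rebuild the full header from a sequence object, one would join the two pieces again with a single space.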
I would be really interested in all responses from the list users. I must agree that BP docs are rather a rat's nest and of varying quality, but taken in toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount of useful and sophisticated information available. I think there are approaches we can take to reorganize and standardize the accession of it to make it more useful and inviting. I disagree with my pal about the wikification, but I wager that the power of the wiki could be leveraged to greater advantage (right, Dan?). I think that what we all as developers love is to code, and detest is to document. Since BP is all-volunteer, and volunteers tend to do what they like -- the beauty of open source, btw -- documentation reorg and cleanup probably must devolve to the Core. I am willing to lead such an effort, which will take some time, and more time the fewer volunteers there are. First let's hear some thoughts, and 'let it all hang out', as they said in my mom's era. cheers Mark From cjfields at illinois.edu Fri Aug 14 23:41:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 22:41:10 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> On Aug 14, 2009, at 9:32 PM, Mark A. Jensen wrote: > Hi All -- > > Off-list, an old colleague of mine had this insightful, if damning, > comment: > >> I guess that from my perspective, after doing this stuff for >> about 10 years, I personally would prefer to see a "summer of >> documentation" for the bio* languages (or at least bioperl, as that >> is >> the only one I ever look at). From my own experiences, and from >> those >> of many colleagues, the documentation for bioperl has gone from >> mediocre to quite poor in the last few years. I largely think the >> wikification of the docs are to blame for this. 
Even SeqIO is hard >> to figure out now--it took me an hour the other day to figure out >> that >> "desc" returns the full Fasta header, and I had to get that from the >> module code + trial-and-error, instead of the online docs. There is >> far too much inside baseball going on in the documentation scheme. > >> So I worry more about the constant adding of features at the expense >> of documenting what is already there. This is just my 2 cents, and >> it >> is disappointing to see a downward trend for bioperl in this regard. > > I would be really interested in all responses from the list users. I > must agree > that BP docs are rather a rat's nest and of varying quality, but > taken in > toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount > of useful and sophisticated information available. I think there are > approaches we can take to reorganize and standardize the accession > of it to make it more useful and inviting. I disagree with my pal > about the > wikification, but I wager that the power of the wiki could be > leveraged > to greater advantage (right, Dan?). To me good documentation should be a combination of both wiki docs (HOWTOs, scraps, cookbook-y code) and inline POD. We can't forsake one for the other. If I had a preference, I would take more up-to- date POD over wiki (maybe adding a Status: for the methods), but a good HOWTO goes a long way in helping. It's just too hard to cover every use case. It's unfortunate that documentation is very poor for many modules, but at the same time it's also exceptionally hard to write documentation for modules one has had no part in developing. I think this is the main reason the docs are in the state they are in (not to point the finger of blame at anyone, I'm just as much to blame). > I think that what we all as developers love is to code, and detest > is to > document. 
Since BP is all-volunteer, and volunteers tend to do what > they like -- the beauty of open source, btw -- documentation reorg > and cleanup probably must devolve to the Core. I am willing to lead > such an effort, which will take some time, and more time the fewer > volunteers there are. First let's hear some thoughts, and 'let it > all hang out', > as they said in my mom's era. > > cheers > Mark Two things: 1) Take advantage of the proposed restructuring effort (as well as some of the refactoring we are doing) to add decent documentation where possible. This means updating method docs and updating the HOWTO's as needed, or adding new HOWTO's (Jason has indicated this in the past). 2) Pinpoint areas where docs are desperately needed first. Other wiki docs could also use updating. As an example, the above author's question on FASTA and desc() is actually answered in the FAQ, but the question doesn't make it easy to find: http://www.bioperl.org/wiki/FAQ#I_would_like_to_make_my_own_custom_fasta_header_-_how_do_I_do_this.3F chris From David.Messina at sbc.su.se Sat Aug 15 03:49:59 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 Aug 2009 09:49:59 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: <628aabb70908150049h64f83b8ewb30d916f0534e40d@mail.gmail.com> > > To me good documentation should be a combination of both wiki docs (HOWTOs, > scraps, cookbook-y code) and inline POD. We can't forsake one for the > other. > I think this notion is already kinda there de facto (inside baseball? :)), but perhaps we should make clear the idea that: - POD is the reference manual, with each method's capabilities described comprehensively and in detail.
- The wiki is tutorials (bptutorial, Jason's slides), use cases (HOWTOs and Scrapbook), and FAQ. And actually all the POD is accessible online from the wiki at doc.bioperl.org, too (although maybe a little hard to find -- it's under Developer--API Docs). > If I had a preference, I would take more up-to-date POD over wiki (maybe > adding a Status: for the methods), but a good HOWTO goes a long way in > helping. It's just too hard to cover every use case. > I'd agree with this, too, partly because I think the HOWTOs are in pretty good shape, covering the most common stuff pretty well, and partly because I think the reference manual has to be complete, both for a user coming to find out how to use it and for authors ensuring that their internal model of how the code works actually hangs together. Mark, one attack point for a documentation improvement effort would be to take a survey of the PODs and see how well they are fulfilling the role of a reference manual. But part of a good reference manual is knowing how to find what you're looking for, and indeed I think that's maybe the main overall problem with trying to document anything as big and complicated as BioPerl. So for me, the organization of our copious docs might benefit from some attention. The goal of providing a way to find information is better handled by the wiki, which does searching and crossreferencing much better than POD. To take your friend's FASTA header example, I might expect to be able to search for 'FASTA' or 'FASTA header' on the wiki and find something which guides me to the answer. A search for 'FASTA' gives a list of pointers, including the 'FASTA sequence format' page. That page almost gives the right answer (see the Note section), but perhaps it might be a nice place to say that in BioPerl, a FASTA sequence is a Bio::Seq, and that the header is $seq->desc and the seq is $seq->seq.
And there could be an equivalent page for the other common formats, breaking down how the format maps to an object. [...] it's also exceptionally hard to write documentation for modules one > has had no part in developing. I think this is the main reason the docs are > in the state they are in (not to point the finger of blame at anyone, I'm > just as much to blame). Absolutely, and maybe a first step would be to contact the authors of a module with out-of-date docs and ask for them to fix it, in the same way one would go to the author with a bug in their code. Core+volunteers will certainly be needed for organizing the effort and assessing the state of BioPerl documentation as a whole, but give authors the opportunity to take care of their code, too. Two things: > > 1) Take advantage of the proposed restructuring effort (as well as some of > the refactoring are doing) to add decent documentation where possible. This > means updating method docs and updating the HOWTO's as needed, or adding new > HOWTO's (Jason has indicated this in the past). > This is a great idea. > 2) Pinpoint areas where docs are desperately needed first. > > Other wiki docs could also use updating. As an example, the above author's > question on FASTA and desc() is actually answered in the FAQ, Absolutely. Maybe some of the FAQs could actually be added back to the relevant PODs? 
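To make the "FAQ answer folded back into POD" idea concrete, here is a rough sketch of what such an entry might look like. It follows roughly the Title/Usage/Function/Returns/Args layout used in BioPerl POD; the package name, the fasta_header helper, and the "Status" field are all hypothetical and exist only for illustration, they are not part of BioPerl.

```perl
package My::FastaDocExample;    # hypothetical package, not part of BioPerl
use strict;
use warnings;

=head2 fasta_header

 Title   : fasta_header
 Usage   : my $header = My::FastaDocExample::fasta_header($seq);
 Function: Rebuild a FASTA header line (without the leading '>')
           from a sequence object's id and description.
 Returns : String "<id> <desc>", or just "<id>" if there is no description
 Args    : An object providing id() and desc() methods, e.g. a Bio::Seq
 Status  : Example only (hypothetical field, as floated in this thread)

=cut

sub fasta_header {
    my ($seq) = @_;
    my $id   = $seq->id;
    my $desc = $seq->desc;

    # Avoid a trailing space when the record has no description.
    return $id unless defined $desc && length $desc;
    return $id . ' ' . $desc;
}

1;
```

The point is only that the answer to a frequently asked question can sit next to the method it concerns, where perldoc will show it.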
Dave From David.Messina at sbc.su.se Sat Aug 15 04:00:50 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 Aug 2009 10:00:50 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> Message-ID: <628aabb70908150100ka8c21aahe2bf7d636fa94112@mail.gmail.com> > > I know that it is possible to have an all-gap LocatableSeq You can, but to avoid the "can't guess alphabet" error I'm getting you have to set the alphabet manually (which AlignIO does not). I'll start a branch to get the process started. Terrific! In the meantime, then, I'll just use the -nowarnonempty workaround in my local copy of AlignIO. Dave From bernd.web at gmail.com Sat Aug 15 07:17:44 2009 From: bernd.web at gmail.com (Bernd Web) Date: Sat, 15 Aug 2009 13:17:44 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> Hi >> Even SeqIO is hard >>to figure out now--it took me an hour the other day to figure out that >>"desc" returns the full Fasta header, and I had to get that from the >>module code + trial-and-error, instead of the online docs. I was a bit surprised about $seq->desc retrieving the entire FASTA header line. Actually, in Bioperl 1.52 at least $seq->desc returns the description only, so without the ID. Thus, to get the entire FASTA header line $seq->id . " " . $seq->desc would be needed. For the modules I use (mainly related to sequences, such as SeqIO, SimpleAlign), I'd be happy to contribute on docs, checking docs, or examples.
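A minimal sketch of the id/desc split described above, assuming BioPerl (Bio::SeqIO) is installed. The FASTA record and the variable names are made up for illustration, and the behaviour shown is the one reported in this message (desc without the ID), so it is worth checking against the installed version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;    # assumes BioPerl is installed

# A made-up FASTA record, used only for illustration.
my $fasta = <<'END_FASTA';
>seq1 hypothetical protein [Example organism]
ACGTACGT
TTGA
END_FASTA

# Read from an in-memory filehandle so the sketch needs no input file.
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my $in  = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');
my $seq = $in->next_seq;

my $id       = $seq->id;      # first whitespace-delimited token after '>'
my $desc     = $seq->desc;    # remainder of the header line
my $header   = $id . ' ' . $desc;    # both '.' operators are required
my $residues = $seq->seq;     # sequence string with line breaks removed

print "id:     $id\n";
print "desc:   $desc\n";
print "header: $header\n";
print "seq:    $residues\n";
```

If a record has no description, $desc may be empty or undefined, so real code would probably want to guard the concatenation.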
Regards, Bernd From sanjaysingh765 at gmail.com Sat Aug 15 09:38:18 2009 From: sanjaysingh765 at gmail.com (sanjay singh) Date: Sat, 15 Aug 2009 19:08:18 +0530 Subject: [Bioperl-l] BLINK PARSER Message-ID: Hi, I want to submit query to NCBI'S BLINK and parsed the result for the best hit. is there anyone have script to do so.i would be very grateful if someone would like to share it with me. regards sanjay -- Happy moments , praise God. Difficult moments, seek God. Quiet moments, worship God. Painful moments, trust God. Every moment, thank God Sanjay Kumar Singh Bose Institute 93\1,A.P.C.Road Kolkata-700 009 West Bengal India From jimhu at tamu.edu Sat Aug 15 11:01:15 2009 From: jimhu at tamu.edu (Jim Hu) Date: Sat, 15 Aug 2009 10:01:15 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? Message-ID: Over on the Gbrowse list, Don Gilbert explained to me why genbank2gff3.pl is having problems with prokaryotic genomes. Has anyone written an alternative? Jim Hu ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From cjfields at illinois.edu Sat Aug 15 11:27:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 10:27:01 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? In-Reply-To: References: Message-ID: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> We (bioperl devs and users) would be very interested to have something like this included. I ran into a similar problem with genbank2gff3 a year ago with some of our work here on Archaea. I managed to get enough data out to get gbrowse up-and-running, but it required quite a bit of hand-editing. In fact, seeing as we're refactoring GFF and other aspects of Features in bioperl, this may be the best time to add something in. 
chris On Aug 15, 2009, at 10:01 AM, Jim Hu wrote: > Over on the Gbrowse list, Don Gilbert explained to me why > genbank2gff3.pl is having problems with prokaryotic genomes. Has > anyone written an alternative? > > Jim Hu > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 From cjfields at illinois.edu Sat Aug 15 11:55:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 10:55:44 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> Message-ID: On Aug 15, 2009, at 6:17 AM, Bernd Web wrote: > Hi > >>> Even SeqIO is hard >>> to figure out now--it took me an hour the other day to figure out >>> that >>> "desc" returns the full Fasta header, and I had to get that from the >>> module code + trial-and-error, instead of the online docs. > I was a bit surprised about $seq->desc retrieving the entire FASTA > header line > Actually, in Bioperl 1.52 at least $seq->desc returns the description > only, so without the ID. Thus, to get the entire FASTA header line > $seq->id . " " . $seq->desc would be needed. Odd, not seeing where a change was made that would cause this behavior. Can you post an example? > For the modules I use (mainly related to sequences, such as SeqIO, > SimpleAlign), I'd be happy to contribute on docs, checking docs, or > examples. > > Regards, > Bernd Would be nice to have an Align/SimpleAlign HOWTO, but seeing as we want to refactor large chunks of that code, it might be slightly premature.
That is, unless we want to document what behavior we expect to see as a sort of ROADMAP (maybe as part of the refactoring page). That could then be converted over to a HOWTO. Feel free to chip in on this in any way possible. The more documentation the better. chris From rmb32 at cornell.edu Sat Aug 15 12:44:03 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 09:44:03 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <85143.35343.qm@web30404.mail.mud.yahoo.com> References: <85143.35343.qm@web30404.mail.mud.yahoo.com> Message-ID: <4A86E5D3.3030906@cornell.edu> The usual procedure for developing code is to exchange code via commits to a version control system. Yee, do you know how to use Subversion? Does Yee need a commit bit? Rob Yee Man Chan wrote: > Hi Chris > > I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..) > > Please let me know if it works for you. > > Sorry for the bug... > Yee Man > > --- On Fri, 8/14/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" >> Date: Friday, August 14, 2009, 8:31 AM >> Yee Man, >> >> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >> appears to be 32-bit). The patch results in cleaning >> up warnings for 5.10.0 but results in similar warnings for >> 5.8.8 (linux or OS X). 
>> >> On OS X perl 5.8.8, this sometimes passes (note the first >> attempt fails, the second succeeds), so it's not entirely a >> 32-bit issue: >> >> http://gist.github.com/167860 >> >> OS X and perl 5.10.0, this always fails as the previous >> gist shows, but demonstrates similar behavior (multiple >> attempts to test get different responses): >> >> http://gist.github.com/167542 >> >> On linux, everything passes with or w/o the patched files >> (patched files have warnings as indicated above): >> >> Specs for all three perl executables (they vary a bit): >> >> http://gist.github.com/167883 >> >> chris >> >> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >> >>> Ah.. I find that the typemap can become as simple as >> this >>> ===================== >>> TYPEMAP >>> HMM * T_PTROBJ >>> ===================== >>> >>> Then the generated HMM.c will have a function called >> INT2PTR to do the pointer conversion. I believe this should >> solve the warnings. >>> Attached are the updated HMM.xs and typemap. Can >> someone with a 64-bit machine give it a try? >>> Thank you >>> Yee Man >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Yee Man Chan" >>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>> Date: Thursday, August 13, 2009, 5:31 PM >>>> (just to point out to everyone, Yee >>>> Man's contact information was in the POD) >>>> >>>> Yee Man, >>>> >>>> I have the output in the below link: >>>> >>>> http://gist.github.com/167542 >>>> >>>> There are similar problems popping up on 32- and >> 64-bit >>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >> to debug >>>> it unfortunately. >>>> >>>> I think we should seriously consider spinning this >> code off >>>> into it's own distribution for CPAN. It's >>>> unfortunately bit-rotting away in >> bioperl-ext. If you >>>> want to continue supporting it I can help set that >> up. 
>>>> chris >>>> >>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>> >>>>> Hi >>>>> >>>>> So is this an HMM only >> problem? Or does >>>> it apply to other bioperl-ext modules? >>>>> What exactly are the >> compilation errors >>>> for HMM? I believe my implementation is just a >> simple one >>>> based on Rabiner's paper. >>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>> >>>>> I don't think I did >> anything fancy that >>>> makes it machine dependent or non-ANSI C. >>>>> Yee Man >>>>> >>>>> --- On Thu, 8/13/09, Chris Fields >>>> wrote: >>>>>> From: Chris Fields >>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>> package on WinVista? >>>>>> To: "Robert Buels" >>>>>> Cc: "Jonny Dalzell" , >>>> "BioPerl List" , >>>> "Yee Man Chan" >>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>> >>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >> wrote: >>>>>>> Jonny Dalzell wrote: >>>>>>>> Is it ridiculous of me to expect >> ubuntu to >>>> take >>>>>> care of this for me? How do >>>>>>>> I go about compiling the HMM? >>>>>>> Yes. This is a very specialized >> thing >>>> that >>>>>> you're doing, and Ubuntu does not have >> the >>>> resources to >>>>>> package every single thing. >>>>>>> Unfortunately, it looks like >> bioperl-ext >>>> package is >>>>>> not installable under Ubuntu 9.04 anyway, >> which is >>>> what I'm >>>>>> running. For others on this list, >> if >>>> somebody is >>>>>> interested in doing maintaining it, I'd be >> happy >>>> to help out >>>>>> by testing on Debian-based Linux >> platforms. >>>> We need to >>>>>> clarify this package's maintenance status: >> if >>>> there is >>>>>> nobody interested in maintaining it, I >> would >>>> recommend that >>>>>> bioperl-ext be removed from distribution. 
>>>> It's not in >>>>>> anybody's interest to have unmaintained >> software >>>> out there >>>>>> causing confusion. >>>>>> >>>>>> I have cc'd Yee Man Chan for this. >> If there >>>> isn't a >>>>>> response or the message bounces, we do one >> of two >>>> things: >>>>>> 1) consider it deprecated (probably >> safest). >>>>>> 2) spin it out into a separate module. >>>>>> >>>>>> Just tried to comile it myself and am >> getting >>>> errors (using >>>>>> 64bit perl 5.10), so I think, unless >> someone wants >>>> to take >>>>>> this on, option #1 is best. >>>>>> >>>>>>> So Jonny, in short, I would say "do >> not use >>>>>> bioperl-ext". >>>>>> >>>>>> In general, that's a safe bet. We're >> moving >>>> most of >>>>>> our C/C++ bindings to BioLib. >>>>>> >>>>>>> Step back. What are you trying >> to >>>>>> accomplish? Chris already >> recommended some >>>> alternative >>>>>> methods in his email of 8/11 on this >>>> subject. Perhaps >>>>>> we can guide you to some software that is >>>> actively >>>>>> maintained and will meet your needs. >>>>>>> Rob >>>>>> Exactly. Lots of other (better >> supported!) >>>> options >>>>>> out there. HMMER, SeqAn, and >> others. >>>>>> chris >>>>>> >>>>> >>>>> >>>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From maj at fortinbras.us Sat Aug 15 13:40:26 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 15 Aug 2009 13:40:26 -0400 Subject: [Bioperl-l] BLINK PARSER In-Reply-To: References: Message-ID: <34DBCBEA5E2D49A892E5077AA780BA4E@NewLife> Hi Sanjay- I'm not sure BioPerl has an interface specifically for BLINK (I will be corrected if I'm wrong, so stay tuned). If you can obtain the "raw" blast output for the protein you're interested in ( doing [BLINK] then [Other Views: BLAST] then [Format:Show: Alignment as Plain text] ) that text can be parsed using the Bio::SearchIO tools, and you can use Bio::Search::Tiling to obtain the 'best' hsps. This may not be too helpful, I'm afraid, but it is where I would start. Mark ----- Original Message ----- From: "sanjay singh" To: Sent: Saturday, August 15, 2009 9:38 AM Subject: [Bioperl-l] BLINK PARSER > Hi, > I want to submit query to NCBI'S BLINK and parsed the result for the best > hit. is there anyone have script to do so.i would be very grateful if > someone would like to share it with me. > regards > sanjay > > -- > Happy moments , praise God. > Difficult moments, seek God. > Quiet moments, worship God. > Painful moments, trust God. > Every moment, thank God > > Sanjay Kumar Singh > Bose Institute > 93\1,A.P.C.Road > Kolkata-700 009 > West Bengal > India From cjfields at illinois.edu Sat Aug 15 15:11:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 14:11:48 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A86E5D3.3030906@cornell.edu> References: <85143.35343.qm@web30404.mail.mud.yahoo.com> <4A86E5D3.3030906@cornell.edu> Message-ID: <8B7B3664-A0E2-4E66-82D6-982096F4C75E@illinois.edu> I'm not sure, but it makes more sense to commit these changes directly. Yee, need us to set you up with a commit bit?
If so, fill out the information on this page: http://www.bioperl.org/wiki/SVN_Account_Request and forward it to support at open-bio.org. I'll sponsor you. chris On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: > The usual procedure for developing code is to exchange code via > commits to a version control system. Yee, do you know how to use > Subversion? Does Yee need a commit bit? > > Rob > > Yee Man Chan wrote: >> Hi Chris >> I find that there is a memory access bug in my code. Attached is >> the fixed HMM.xs. This file together with the simpler typemap >> should fix all problems. (I hope..) >> Please let me know if it works for you. >> Sorry for the bug... >> Yee Man >> --- On Fri, 8/14/09, Chris Fields wrote: >>> From: Chris Fields >>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >>> WinVista? >>> To: "Yee Man Chan" >>> Cc: "Robert Buels" , "Jonny Dalzell" >> >, "BioPerl List" >>> Date: Friday, August 14, 2009, 8:31 AM >>> Yee Man, >>> >>> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >>> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >>> appears to be 32-bit). The patch results in cleaning >>> up warnings for 5.10.0 but results in similar warnings for >>> 5.8.8 (linux or OS X). >>> >>> On OS X perl 5.8.8, this sometimes passes (note the first >>> attempt fails, the second succeeds), so it's not entirely a >>> 32-bit issue: >>> >>> http://gist.github.com/167860 >>> >>> OS X and perl 5.10.0, this always fails as the previous >>> gist shows, but demonstrates similar behavior (multiple >>> attempts to test get different responses): >>> >>> http://gist.github.com/167542 >>> >>> On linux, everything passes with or w/o the patched files >>> (patched files have warnings as indicated above): >>> >>> Specs for all three perl executables (they vary a bit): >>> >>> http://gist.github.com/167883 >>> >>> chris >>> >>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >>> >>>> Ah.. 
I find that the typemap can become as simple as >>> this >>>> ===================== >>>> TYPEMAP >>>> HMM * T_PTROBJ >>>> ===================== >>>> >>>> Then the generated HMM.c will have a function called >>> INT2PTR to do the pointer conversion. I believe this should >>> solve the warnings. >>>> Attached are the updated HMM.xs and typemap. Can >>> someone with a 64-bit machine give it a try? >>>> Thank you >>>> Yee Man >>>> --- On Thu, 8/13/09, Chris Fields >>> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >>> package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >>> "Jonny Dalzell" , >>> "BioPerl List" >>>>> Date: Thursday, August 13, 2009, 5:31 PM >>>>> (just to point out to everyone, Yee >>>>> Man's contact information was in the POD) >>>>> >>>>> Yee Man, >>>>> >>>>> I have the output in the below link: >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> There are similar problems popping up on 32- and >>> 64-bit >>>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >>> to debug >>>>> it unfortunately. >>>>> >>>>> I think we should seriously consider spinning this >>> code off >>>>> into it's own distribution for CPAN. It's >>>>> unfortunately bit-rotting away in >>> bioperl-ext. If you >>>>> want to continue supporting it I can help set that >>> up. >>>>> chris >>>>> >>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>>> >>>>>> Hi >>>>>> >>>>>> So is this an HMM only >>> problem? Or does >>>>> it apply to other bioperl-ext modules? >>>>>> What exactly are the >>> compilation errors >>>>> for HMM? I believe my implementation is just a >>> simple one >>>>> based on Rabiner's paper. 
>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>> ~murphyk%2FBayes >>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>> >>>>>> I don't think I did >>> anything fancy that >>>>> makes it machine dependent or non-ANSI C. >>>>>> Yee Man >>>>>> >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >>> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Robert Buels" >>>>>>> Cc: "Jonny Dalzell" , >>>>> "BioPerl List" , >>>>> "Yee Man Chan" >>>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>>> >>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >>> wrote: >>>>>>>> Jonny Dalzell wrote: >>>>>>>>> Is it ridiculous of me to expect >>> ubuntu to >>>>> take >>>>>>> care of this for me? How do >>>>>>>>> I go about compiling the HMM? >>>>>>>> Yes. This is a very specialized >>> thing >>>>> that >>>>>>> you're doing, and Ubuntu does not have >>> the >>>>> resources to >>>>>>> package every single thing. >>>>>>>> Unfortunately, it looks like >>> bioperl-ext >>>>> package is >>>>>>> not installable under Ubuntu 9.04 anyway, >>> which is >>>>> what I'm >>>>>>> running. For others on this list, >>> if >>>>> somebody is >>>>>>> interested in doing maintaining it, I'd be >>> happy >>>>> to help out >>>>>>> by testing on Debian-based Linux >>> platforms. >>>>> We need to >>>>>>> clarify this package's maintenance status: >>> if >>>>> there is >>>>>>> nobody interested in maintaining it, I >>> would >>>>> recommend that >>>>>>> bioperl-ext be removed from distribution. >>>>> It's not in >>>>>>> anybody's interest to have unmaintained >>> software >>>>> out there >>>>>>> causing confusion. >>>>>>> >>>>>>> I have cc'd Yee Man Chan for this. >>> If there >>>>> isn't a >>>>>>> response or the message bounces, we do one >>> of two >>>>> things: >>>>>>> 1) consider it deprecated (probably >>> safest). 
>>>>>>> 2) spin it out into a separate module. >>>>>>> >>>>>>> Just tried to comile it myself and am >>> getting >>>>> errors (using >>>>>>> 64bit perl 5.10), so I think, unless >>> someone wants >>>>> to take >>>>>>> this on, option #1 is best. >>>>>>> >>>>>>>> So Jonny, in short, I would say "do >>> not use >>>>>>> bioperl-ext". >>>>>>> >>>>>>> In general, that's a safe bet. We're >>> moving >>>>> most of >>>>>>> our C/C++ bindings to BioLib. >>>>>>> >>>>>>>> Step back. What are you trying >>> to >>>>>>> accomplish? Chris already >>> recommended some >>>>> alternative >>>>>>> methods in his email of 8/11 on this >>>>> subject. Perhaps >>>>>>> we can guide you to some software that is >>>>> actively >>>>>>> maintained and will meet your needs. >>>>>>>> Rob >>>>>>> Exactly. Lots of other (better >>> supported!) >>>>> options >>>>>>> out there. HMMER, SeqAn, and >>> others. >>>>>>> chris >>>>>>> >>>>>> >>>>>> >>>>> >>>> __________________________________________________ >>>> Do You Yahoo!? >>>> Tired of spam? Yahoo! 
Mail has the best spam
>>> protection around
>>>> http://mail.yahoo.com
>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
>
> --
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY 14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu

From hlapp at gmx.net Sat Aug 15 15:41:56 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 15:41:56 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
References: <1F899AA92F94415186CB0B25306F1114@NewLife>
	<6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
Message-ID:

On Aug 14, 2009, at 11:41 PM, Chris Fields wrote:

> I would take more up-to-date POD over wiki (maybe adding a Status:
> for the methods), but a good HOWTO goes a long way in helping. It's
> just too hard to cover every use case.

I'd very much second this. API documentation should arguably be
written by the developer(s), and hence I would expect to find it in the
PODs. Use cases, however, and how to solve them in BioPerl can and
should be contributed by everyone, and the wiki is just way better at
facilitating this.

As for the FASTA example, I can understand - I've heard repeatedly
from people that one of the things they are missing is
documentation for every SeqIO format we support (such as GenBank,
UniProt, FASTA, etc.) about where to find a particular piece of the
format in the object model.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Sat Aug 15 15:53:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 15 Aug 2009 15:53:31 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To:
References: <1F899AA92F94415186CB0B25306F1114@NewLife>
	<6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
Message-ID:

----- Original Message -----
From: "Hilmar Lapp"
...
> As for the FASTA example, I can understand - I've heard repeatedly
> from people that one of the things that they are missing is
> documentation for every SeqIO format we support (such as GenBank,
> UniProt, FASTA, etc) about where to find a particular piece of the
> format in the object model.
....
This is the right thread for list lurkers to contribute their
betes noires such as this one. I encourage ALL to post
these issues and help create our list of action items.
MAJ

From hlapp at gmx.net Sat Aug 15 16:09:14 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 16:09:14 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
Message-ID: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>

On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote:

> I'm planning to move Chase Miller's excellent NeXML read/write
> implementation into the trunk, complete with tests. If we can get it
> to pass the test suite, is there room in the point release for it?

In the past we've stayed away from adding new features to stable
branches, with the exception of new methods in existing classes that
didn't do anything complicated.

I'm not sure I remember everything but I think the NeXML support does
exceed that level, doesn't it? Can it be rolled into its own
pre-release that is a drop-in to an existing 1.6.x installation for
those who want to go there?
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Aug 15 16:12:35 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 16:12:35 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <4A85F83A.30800@cornell.edu>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<4A85F83A.30800@cornell.edu>
Message-ID:

Great! Two suggestions:

> * deprecate the get_Annotations(Str) method in favor of
> get_annotation(Str), which adheres better to standard perl method
> naming

Yes, but it is then also inconsistent with existing BioPerl naming,
where the method name indicates what type of object you get back
(Bio::AnnotationI in this case; see also, e.g., get_SeqFeatures() in
Bio::SeqI).

> * finally, split Bio::FeatureIO modules off into their own CPAN
> distribution

Wouldn't one start with this?

-hilmar

On Aug 14, 2009, at 7:50 PM, Robert Buels wrote:

> Chris Fields wrote:
>> Any help/suggestions for the above two would be greatly
>> appreciated! Robert Buels may be heading up the initial FeatureIO
>> work; I will likely start on LocatableSeq/Align (Mark, wanna help?).
>
> Sure, I'll head up the gff_refactor branch work. If you're
> interested in what changes are being planned for Bio::SeqFeature::*,
> Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the
> implementation plan Chris and I developed just now on IRC, which is at
>
> http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan
>
> Now soliciting suggestions, comments, and assistance.
>
> Rob
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From rmb32 at cornell.edu Sat Aug 15 16:24:35 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 15 Aug 2009 13:24:35 -0700
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
Message-ID: <4A871983.4010702@cornell.edu>

Hilmar Lapp wrote:
> I'm not sure I remember everything but I think the NeXML support does
> exceed that level, doesn't it? Can it be rolled into its own pre-release
> that is a drop-in to an existing 1.6.x installation for those who want
> to go there?

So split it out into its own CPAN dist.

Rob

From maj at fortinbras.us Sat Aug 15 16:36:47 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 15 Aug 2009 16:36:47 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
Message-ID: <307089ED92AD46539EEF45EE2D8F5A81@NewLife>

Yes, I'd say the Nexml support exceeds the 'complicated' test. There are no
modifications to existing modules (except for the addition of annotation
attributes to members of the Bio::PopGen model, which are don't-cares to
anything out there currently).
The manifest of a NeXML drop-in would look like

Bio/NexmlIO.pm
Bio/Nexml/Factory.pm
Bio/SeqIO/nexml.pm
Bio/AlignIO/nexml.pm
Bio/TreeIO/nexml.pm

and, if I get it completed, support for arbitrary characters via Bio::PopGen

Bio/PopGen/IO/nexml.pm

(all based on hacks of Chase's code, btw; we thought it would round out the
package nicely...)

Of course, the big dependency that not everyone will need or want is Rutger's
Bio::Phylo, so the Nexml support will have to be optional even in 1.7, I
think. I am adding run-time checks for Bio::Phylo in the modules so they die
relatively gracefully and informatively, rather than just barf. Also, the
tests will have appropriate skip blocks.

I do want to get the code into bioperl-live, however, unless there's a gotcha
there I'm not seeing--

cheers MAJ
----- Original Message -----
From: "Hilmar Lapp"
To: "Mark A. Jensen"
Cc: "Chris Fields" ; "BioPerl List"
Sent: Saturday, August 15, 2009 4:09 PM
Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring

>
> On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote:
>
>> I'm planning to move Chase Miller's excellent NeXML read/write
>> implementation into the trunk, complete with tests. If we can get it to pass
>> the test suite, is there room in the point release for it?
>
>
> We've in the past stayed away from adding new features to stable branches
> with the exception of new methods in existing classes and that didn't do
> anything complicated.
>
> I'm not sure I remember everything but I think the NeXML support does exceed
> that level, doesn't it? Can it be rolled into its own pre-release that is a
> drop-in to an existing 1.6.x installation for those who want to go there?
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

From hlapp at gmx.net Sat Aug 15 16:49:22 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 16:49:22 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <307089ED92AD46539EEF45EE2D8F5A81@NewLife>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
	<307089ED92AD46539EEF45EE2D8F5A81@NewLife>
Message-ID: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>

On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote:

> I do want to get the code into bioperl-live, however, unless there's
> a gotcha there I'm not seeing--

That sounds great to me, though it may make some of Chris' hair stand
on end if he wants this to go into a separate module from the start :)
Maybe a phylogenetics module can be carved out that this would become
part of? Though I recall someone saying recently that Bio::Species and
by extension Bio::SeqIO is dependent on Bio::Tree::Node, so maybe
that's not realistic to split out.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Sat Aug 15 17:07:30 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 15 Aug 2009 17:07:30 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
	<307089ED92AD46539EEF45EE2D8F5A81@NewLife>
	<63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
Message-ID: <659CA35CE3AD464AA516D18B313311BE@NewLife>

I'm all for an attempt to split out phylogenetic stuff, it
seems natural, and think in terms of a phylo package
dependent upon a sequence package, and if necessary
vice versa -- although if the Bio::Species - Bio::Tree::Node
connection is relatively loose, perhaps we can refactor to
make some attributes/methods optional features that carp
when the phylo package is not installed. (Roles, anyone?)

However, probably 1.6.x doesn't sound like the place to
do that! I myself wouldn't have any problem waiting till
1.7 for 'official' Nexml support--but I hope Chase will chime
in on that. What does Chris think?
MAJ
----- Original Message -----
From: "Hilmar Lapp"
To: "Mark A. Jensen"
Cc: "Chris Fields" ; "BioPerl List"
Sent: Saturday, August 15, 2009 4:49 PM
Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring

>
> On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote:
>
>> I do want to get the code into bioperl-live, however, unless there's a
>> gotcha there I'm not seeing--
>
>
> That sounds great to me, though it may make some of Chris' hair stand on end
> if he wants this to go into a separate module from the start :) Maybe a
> phylogenetics module can be carved out that this would become part of? Though
> I recall someone saying recently that Bio::Species and by extension
> Bio::SeqIO is dependent on Bio::Tree::Node, so maybe that's not realistic to
> split out.
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

From rmb32 at cornell.edu Sat Aug 15 17:23:40 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 15 Aug 2009 14:23:40 -0700
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<4A85F83A.30800@cornell.edu>
Message-ID: <4A87275C.5040300@cornell.edu>

Hilmar Lapp wrote:
>> * deprecate the get_Annotations(Str) method in favor of
>> get_annotation(Str), which adheres better to standard perl method naming
>
> Yes, but also is then inconsistent with existing BioPerl naming, with
> the method name indicating what type of object you get back
> (Bio::AnnotationI in this case; see also e.g., get_SeqFeatures() in
> Bio::SeqI).

Blech. OK never mind about the method rename then.

>
>> * finally, split Bio::FeatureIO modules off into their own CPAN
>> distribution
>
> Wouldn't one start with this?

Yeah....I've kind of been vacillating back and forth about whether it
would be best to *start* with this, or to end with this. Probably makes
more sense to start with it, since it gives more freedom to add
dependencies on more CPAN stuff without worrying too much. Like...oh...I
don't know...Moose?

Thoughts on this?

Rob

From rmb32 at cornell.edu Sat Aug 15 17:25:51 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 15 Aug 2009 14:25:51 -0700
Subject: [Bioperl-l] genbank2gff3 for prokaryotes?
In-Reply-To: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu>
References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu>
Message-ID: <4A8727DF.7000204@cornell.edu>

Chris Fields wrote:
> In fact, seeing as we're refactoring GFF and other aspects of Features
> in bioperl, this may be the best time to add something in.

Reading that thread, it sounds like most of the issues revolve around
when and how to use the unflattener. Perhaps just adding another
command line switch or two to the script would be appropriate?

Editorializing a bit, it's really disheartening that Genbank stores
features in such a lossy way.

Rob

From cjfields at illinois.edu Sat Aug 15 22:05:41 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 15 Aug 2009 21:05:41 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <241652.96493.qm@web30404.mail.mud.yahoo.com>
References: <241652.96493.qm@web30404.mail.mud.yahoo.com>
Message-ID:

I'm still seeing the same errors on Mac OS X for 64-bit perl 5.10.0.
Mac OS X, native perl (v5.8.8) passes fine now (as well as perl 5.8.8
on dev.open-bio.org).

I'm wondering if this is a problem with my local perl build. I'm very
tempted to push the HMM-related code into a separate distribution
(bioperl-hmm) and make a CPAN release out of it so it gets wider
testing via CPAN testers; it would just require a minimum bioperl 1.6
installation for Bio::Tools::HMM and any related modules. Yee, would
that be okay with you?

chris

On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:

>
> I just committed HMM.xs and typemap to SVN. Can you test it to
> confirm it works in 64-bit machines?
>
> Thanks
> Yee Man
>
> --- On Sat, 8/15/09, Chris Fields wrote:
>
>> From: Chris Fields
>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on
>> WinVista?
>> To: "Robert Buels"
>> Cc: "Yee Man Chan" , "BioPerl List"
>> Date: Saturday, August 15, 2009, 12:11 PM
>> I'm not sure, but it makes more sense
>> to commit these changes directly. Yee, need us to set
>> you up with a commit bit? If so, fill out the
>> information on this page:
>>
>> http://www.bioperl.org/wiki/SVN_Account_Request
>>
>> and forward it to support at open-bio.org.
>> I'll sponsor you.
>> >> chris >> >> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: >> >>> The usual procedure for developing code is to exchange >> code via commits to a version control system. Yee, do >> you know how to use Subversion? Does Yee need a commit bit? >>> >>> Rob >>> >>> Yee Man Chan wrote: >>>> Hi Chris >>>> I find that there is a memory >> access bug in my code. Attached is the fixed HMM.xs. This >> file together with the simpler typemap should fix all >> problems. (I hope..) >>>> Please let me know if it works >> for you. >>>> Sorry for the bug... >>>> Yee Man >>>> --- On Fri, 8/14/09, Chris Fields >> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>>> Date: Friday, August 14, 2009, 8:31 AM >>>>> Yee Man, >>>>> >>>>> I tested this out locally (perl 5.8.8 32-bit, >> perl 5.10.0 >>>>> 64-bit) and on dev.open-bio.org (which is perl >> 5.8.8, >>>>> appears to be 32-bit). The patch results >> in cleaning >>>>> up warnings for 5.10.0 but results in similar >> warnings for >>>>> 5.8.8 (linux or OS X). >>>>> >>>>> On OS X perl 5.8.8, this sometimes passes >> (note the first >>>>> attempt fails, the second succeeds), so it's >> not entirely a >>>>> 32-bit issue: >>>>> >>>>> http://gist.github.com/167860 >>>>> >>>>> OS X and perl 5.10.0, this always fails as the >> previous >>>>> gist shows, but demonstrates similar behavior >> (multiple >>>>> attempts to test get different responses): >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> On linux, everything passes with or w/o the >> patched files >>>>> (patched files have warnings as indicated >> above): >>>>> >>>>> Specs for all three perl executables (they >> vary a bit): >>>>> >>>>> http://gist.github.com/167883 >>>>> >>>>> chris >>>>> >>>>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan >> wrote: >>>>> >>>>>> Ah.. 
I find that the typemap can become as >> simple as >>>>> this >>>>>> ===================== >>>>>> TYPEMAP >>>>>> HMM * T_PTROBJ >>>>>> ===================== >>>>>> >>>>>> Then the generated HMM.c will have a >> function called >>>>> INT2PTR to do the pointer conversion. I >> believe this should >>>>> solve the warnings. >>>>>> Attached are the updated HMM.xs and >> typemap. Can >>>>> someone with a 64-bit machine give it a try? >>>>>> Thank you >>>>>> Yee Man >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Yee Man Chan" >>>>>>> Cc: "Robert Buels" , >>>>> "Jonny Dalzell" , >>>>> "BioPerl List" >>>>>>> Date: Thursday, August 13, 2009, 5:31 >> PM >>>>>>> (just to point out to everyone, Yee >>>>>>> Man's contact information was in the >> POD) >>>>>>> >>>>>>> Yee Man, >>>>>>> >>>>>>> I have the output in the below link: >>>>>>> >>>>>>> http://gist.github.com/167542 >>>>>>> >>>>>>> There are similar problems popping up >> on 32- and >>>>> 64-bit >>>>>>> perl 5.10.0, Mac OS X 10.5. >> Haven't had time >>>>> to debug >>>>>>> it unfortunately. >>>>>>> >>>>>>> I think we should seriously consider >> spinning this >>>>> code off >>>>>>> into it's own distribution for >> CPAN. It's >>>>>>> unfortunately bit-rotting away in >>>>> bioperl-ext. If you >>>>>>> want to continue supporting it I can >> help set that >>>>> up. >>>>>>> chris >>>>>>> >>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man >> Chan wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> >>>>>>>> So is this >> an HMM only >>>>> problem? Or does >>>>>>> it apply to other bioperl-ext >> modules? >>>>>>>> What >> exactly are the >>>>> compilation errors >>>>>>> for HMM? I believe my implementation >> is just a >>>>> simple one >>>>>>> based on Rabiner's paper. 
>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>>>> ~murphyk%2FBayes >>>>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>>>> >>>>>>>> I don't >> think I did >>>>> anything fancy that >>>>>>> makes it machine dependent or non-ANSI >> C. >>>>>>>> Yee Man >>>>>>>> >>>>>>>> --- On Thu, 8/13/09, Chris Fields >> >>>>>>> wrote: >>>>>>>>> From: Chris Fields >>>>>>>>> Subject: Re: [Bioperl-l] >> Problems with >>>>> Bioperl-ext >>>>>>> package on WinVista? >>>>>>>>> To: "Robert Buels" >>>>>>>>> Cc: "Jonny Dalzell" , >>>>>>> "BioPerl List" , >>>>>>> "Yee Man Chan" >>>>>>>>> Date: Thursday, August 13, >> 2009, 3:18 PM >>>>>>>>> >>>>>>>>> On Aug 13, 2009, at 4:37 PM, >> Robert Buels >>>>> wrote: >>>>>>>>>> Jonny Dalzell wrote: >>>>>>>>>>> Is it ridiculous of me >> to expect >>>>> ubuntu to >>>>>>> take >>>>>>>>> care of this for me? How >> do >>>>>>>>>>> I go about compiling >> the HMM? >>>>>>>>>> Yes. This is a very >> specialized >>>>> thing >>>>>>> that >>>>>>>>> you're doing, and Ubuntu does >> not have >>>>> the >>>>>>> resources to >>>>>>>>> package every single thing. >>>>>>>>>> Unfortunately, it looks >> like >>>>> bioperl-ext >>>>>>> package is >>>>>>>>> not installable under Ubuntu >> 9.04 anyway, >>>>> which is >>>>>>> what I'm >>>>>>>>> running. For others on >> this list, >>>>> if >>>>>>> somebody is >>>>>>>>> interested in doing >> maintaining it, I'd be >>>>> happy >>>>>>> to help out >>>>>>>>> by testing on Debian-based >> Linux >>>>> platforms. >>>>>>> We need to >>>>>>>>> clarify this package's >> maintenance status: >>>>> if >>>>>>> there is >>>>>>>>> nobody interested in >> maintaining it, I >>>>> would >>>>>>> recommend that >>>>>>>>> bioperl-ext be removed from >> distribution. >>>>>>> It's not in >>>>>>>>> anybody's interest to have >> unmaintained >>>>> software >>>>>>> out there >>>>>>>>> causing confusion. 
>>>>>>>>> >>>>>>>>> I have cc'd Yee Man Chan for >> this. >>>>> If there >>>>>>> isn't a >>>>>>>>> response or the message >> bounces, we do one >>>>> of two >>>>>>> things: >>>>>>>>> 1) consider it deprecated >> (probably >>>>> safest). >>>>>>>>> 2) spin it out into a separate >> module. >>>>>>>>> >>>>>>>>> Just tried to comile it myself >> and am >>>>> getting >>>>>>> errors (using >>>>>>>>> 64bit perl 5.10), so I think, >> unless >>>>> someone wants >>>>>>> to take >>>>>>>>> this on, option #1 is best. >>>>>>>>> >>>>>>>>>> So Jonny, in short, I >> would say "do >>>>> not use >>>>>>>>> bioperl-ext". >>>>>>>>> >>>>>>>>> In general, that's a safe >> bet. We're >>>>> moving >>>>>>> most of >>>>>>>>> our C/C++ bindings to BioLib. >>>>>>>>> >>>>>>>>>> Step back. What are >> you trying >>>>> to >>>>>>>>> accomplish? Chris >> already >>>>> recommended some >>>>>>> alternative >>>>>>>>> methods in his email of 8/11 >> on this >>>>>>> subject. Perhaps >>>>>>>>> we can guide you to some >> software that is >>>>>>> actively >>>>>>>>> maintained and will meet your >> needs. >>>>>>>>>> Rob >>>>>>>>> Exactly. Lots of other >> (better >>>>> supported!) >>>>>>> options >>>>>>>>> out there. HMMER, SeqAn, >> and >>>>> others. >>>>>>>>> chris >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >> __________________________________________________ >>>>>> Do You Yahoo!? >>>>>> Tired of spam? Yahoo! 
Mail has the
>> best spam
>>>>> protection around
>>>>>> http://mail.yahoo.com
>>>>>
>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>
>>>
>>> --Robert Buels
>>> Bioinformatics Analyst, Sol Genomics Network
>>> Boyce Thompson Institute for Plant Research
>>> Tower Rd
>>> Ithaca, NY 14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>>
>>
>
>
>

From cjfields at illinois.edu Sat Aug 15 22:49:25 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 15 Aug 2009 21:49:25 -0500
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <659CA35CE3AD464AA516D18B313311BE@NewLife>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
	<307089ED92AD46539EEF45EE2D8F5A81@NewLife>
	<63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
	<659CA35CE3AD464AA516D18B313311BE@NewLife>
Message-ID: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>

On Aug 15, 2009, at 4:07 PM, Mark A. Jensen wrote:

> I'm all for an attempt to split out phylogenetic stuff, it
> seems natural, and think in terms of a phylo package
> dependent upon a sequence package, and if necessary
> vice versa -- although if the Bio::Species - Bio::Tree::Node
> connection is relatively loose, perhaps we can refactor to
> make some attributes/methods optional features that carp
> when the phylo package is not installed. (Roles, anyone?)

I'm pretty sure they're linked very tightly (Species is-a Bio::Taxon
is-a Bio::Tree::Node). This may be something Sendu needs to chime in
on; he refactored much of that code prior to 1.5.2.

As a suggestion, maybe we can use a combined strategy: fall back to a
very simple Bio::Species container class if a bioperl-phylo isn't
installed, but utilize Bio::Taxon when it is.

> However, probably 1.6.x doesn't sound like the place to
> do that! I myself wouldn't have any problem waiting till
> 1.7 for 'official' Nexml support--but I hope Chase will chime
> in on that. What does Chris think?
> MAJ

Robert's suggestion of a separate distribution makes sense; it may be
one avenue of slowly migrating out phylo-specific code into its own
distribution. Not sure about calling it bioperl-phylo (which might be
confused with Rutger's Bio::Phylo).

chris

From cjfields at illinois.edu Sat Aug 15 22:47:36 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 15 Aug 2009 21:47:36 -0500
Subject: [Bioperl-l] genbank2gff3 for prokaryotes?
In-Reply-To: <4A8727DF.7000204@cornell.edu>
References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu>
	<4A8727DF.7000204@cornell.edu>
Message-ID: <81C3E545-4F0E-4B1F-9F06-398D1EE7A3CF@illinois.edu>

On Aug 15, 2009, at 4:25 PM, Robert Buels wrote:

> Chris Fields wrote:
> > In fact, seeing as we're refactoring GFF and other aspects of Features
> > in bioperl, this may be the best time to add something in.
>
> Reading that thread, it sounds like most of the issues revolve
> around when and how to use the unflattener. Perhaps just adding
> another command line switch or two to the script would be appropriate?
>
> Editorializing a bit, it's really disheartening that Genbank stores
> features in such a lossy way.
>
> Rob

Just remembered: NCBI does supply GFF3 files for bacterial genomes,
but I'm not sure how well they correspond to the GFF3 specification.
For example:

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Aquifex_aeolicus/NC_000918.gff

A quick glance looks okay, but they don't include FASTA sequence.

I think much of the problem with NCBI/GenBank has to do with lack of
curation on how submissions are made (lots of inconsistencies). I'm
not sure how easy they will be to deal with, but the only way we can
deal with that is looking at examples of problematic data (IIRC the
Sulfolobus solfataricus genome GB file was a mess, so maybe that's
worth a look).

chris

From cjfields at illinois.edu Sun Aug 16 01:38:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 00:38:46 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <846546.73578.qm@web30404.mail.mud.yahoo.com>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com>
Message-ID: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>

Yee,

I took the liberty of making a few simple changes to Bio::Tools::HMM
in svn to point out the problem and possible solutions. Feel free to
revert these as needed.

I'm seeing two errors, which appear randomly when running 'make
test'. The first is easily fixable; the second, I'm not so sure. I'll
let you make the decisions on both.

1) There is an assumption in the module that, when adding
floating-point numbers, you will always get exactly 1.0. You may run
into problems: see 'perldoc -q long decimals'. Lines like this (two
places in the module):

    ...
    if ($sum != 1.0) {
        $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
    }
    ...

won't work as expected (note I added a simple diagnostic, just print
out the 'bad' sum).
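
A minimal sketch of the failure mode described above, assuming nothing about the actual Bio::Tools::HMM internals: a sum of floating-point probabilities can land a hair away from 1.0, so an exact comparison may fire spuriously. The helper name `sums_to_one` and the default tolerance of `1e-6` are illustrative choices, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::HMM): returns true when
# the values in the array ref sum to 1.0 within an absolute tolerance.
sub sums_to_one {
    my ($probs, $tol) = @_;
    $tol = 1e-6 unless defined $tol;    # illustrative default
    my $sum = 0;
    $sum += $_ for @$probs;
    return abs($sum - 1.0) <= $tol;
}

# Ten additions of 0.1 are a classic case where the accumulated sum
# need not be exactly 1.0 in binary floating point.
my @row = (0.1) x 10;
my $sum = 0;
$sum += $_ for @row;

printf "sum to 20 digits: %.20f\n", $sum;
printf "exact test:       %s\n", ($sum == 1.0 ? "equal" : "not equal");
printf "tolerance test:   %s\n", (sums_to_one(\@row) ? "ok" : "bad");
```

The tolerance would need to be chosen to suit the data; the value here is only a placeholder.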

With perl 5.8.8, this appears to work fine, but this is what I get
with perl 5.10 (64-bit):

pyrimidine1:HMM cjfields$ make test
PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
Baum-Welch Training
===================
Initial Probability Array:
0.499978 0.500022
Transition Probability Matrix:
0.499978 0.500022
0.499978 0.500022
Emission Probability Matrix:
0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
Log Probability of sequence 1: -521.808
Log Probability of sequence 2: -426.057
Statistical Training
====================
Initial Probability Array:
1 0
Transition Probability Matrix:

------------- EXCEPTION -------------
MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
STACK toplevel test.pl:82
-------------------------------------

make: *** [test_dynamic] Error 255

I'm assuming this needs to simply be rounded up to 1.0. That could be
accomplished with something like 'if (sprintf("%.2f", $sum) != 1.0)
{...}'

2) The second error is a little stranger. I have been randomly getting
this:

pyrimidine1:HMM cjfields$ make test
PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
Baum-Welch Training
===================
S should be monotonic increasing!
make: *** [test_dynamic] Error 255

When I add strict and warnings pragmas to Bio::Tools::HMM (with a
little additional cleanup to get things running), I get an additional
warning (arrow):

pyrimidine1:HMM cjfields$ make test
PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
Baum-Welch Training
===================
S should be monotonic increasing!
make: *** [test_dynamic] Error 255

So something is not being converted as expected.
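
The "isn't numeric" warning above is what Perl emits under the warnings pragma when a non-numeric string reaches a numeric comparison. One way to surface such a value early might be a check with `looks_like_number` from the core Scalar::Util module. This is only a sketch: the helper name `check_numeric` and the sample values are hypothetical, and it does not reflect what line 188 of HMM.pm actually does.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: die with a clear message instead of letting a
# non-numeric string slip into a numeric comparison.
sub check_numeric {
    my ($label, $value) = @_;
    my $shown = defined $value ? $value : 'undef';
    die "$label must be numeric, got '$shown'\n"
        unless defined $value && looks_like_number($value);
    return $value + 0;    # force numeric context
}

# Try a few candidate values; "FL" mirrors the string in the warning.
for my $candidate (2, "3.5", "FL") {
    my $n = eval { check_numeric("value", $candidate) };
    if (defined $n) {
        print "accepted: $n\n";
    }
    else {
        print "rejected: $@";
    }
}
```

Failing early like this would at least show where the unexpected string enters, which the bare warning does not.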
chris On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote: > When are you going to release 1.6? Maybe let me work on it before it > releases. If it doesn't resolve the problem, then we can think about > other alternatives. > > Also, please show me the latest errors you have for 5.10.0. > > Thanks > Yee Man > > --- On Sat, 8/15/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "BioPerl List" > > >> Date: Saturday, August 15, 2009, 7:05 PM >> I'm still seeing the same errors on >> Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl >> (v5.8.8) passes fine now (as well as perl 5.8.8 on >> dev.open-bio.org). >> >> I'm wondering if this is a problem with my local perl >> build. I'm very tempted to push the HMM-related code >> into a separate distribution (bioperl-hmm) and make a CPAN >> release out of it so it gets wider testing via CPAN testers; >> it would just require a minimum bioperl 1.6 installation for >> Bio::Tools::HMM and any related modules. Yee, would >> that be okay with you? >> >> chris >> >> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: >> >>> >>> I just committed HMM.xs and typemap to SVN. Can you >> test it to confirm it works in 64-bit machines? >>> >>> Thanks >>> Yee Man >>> >>> --- On Sat, 8/15/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Robert Buels" >>>> Cc: "Yee Man Chan" , >> "BioPerl List" >>>> Date: Saturday, August 15, 2009, 12:11 PM >>>> I'm not sure, but it makes more sense >>>> to commit these changes directly. Yee, need >> us to set >>>> you up with a commit bit? If so, fill out >> the >>>> information on this page: >>>> >>>> http://www.bioperl.org/wiki/SVN_Account_Request >>>> >>>> and forward it to support at open-bio.org. >>>> I'll sponsor you. 
>>>>
>>>> chris
>>>>
>>>> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
>>>>
>>>>> The usual procedure for developing code is to exchange code via
>>>>> commits to a version control system. Yee, do you know how to use
>>>>> Subversion? Does Yee need a commit bit?
>>>>>
>>>>> Rob
>>>>>
>>>>> Yee Man Chan wrote:
>>>>>> Hi Chris
>>>>>> I find that there is a memory access bug in my code. Attached
>>>>>> is the fixed HMM.xs. This file together with the simpler typemap
>>>>>> should fix all problems. (I hope..)
>>>>>> Please let me know if it works for you.
>>>>>> Sorry for the bug...
>>>>>> Yee Man
>>>>>> --- On Fri, 8/14/09, Chris Fields wrote:
>>>>>>> From: Chris Fields
>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>>>>> To: "Yee Man Chan"
>>>>>>> Cc: "Robert Buels", "Jonny Dalzell", "BioPerl List"
>>>>>>> Date: Friday, August 14, 2009, 8:31 AM
>>>>>>> Yee Man,
>>>>>>>
>>>>>>> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0
>>>>>>> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, appears
>>>>>>> to be 32-bit). The patch results in cleaning up warnings for
>>>>>>> 5.10.0 but results in similar warnings for 5.8.8 (linux or OS X).
>>>>>>>
>>>>>>> On OS X perl 5.8.8, this sometimes passes (note the first
>>>>>>> attempt fails, the second succeeds), so it's not entirely a
>>>>>>> 32-bit issue:
>>>>>>>
>>>>>>> http://gist.github.com/167860
>>>>>>>
>>>>>>> OS X and perl 5.10.0, this always fails as the previous gist
>>>>>>> shows, but demonstrates similar behavior (multiple attempts to
>>>>>>> test get different responses):
>>>>>>>
>>>>>>> http://gist.github.com/167542
>>>>>>>
>>>>>>> On linux, everything passes with or w/o the patched files
>>>>>>> (patched files have warnings as indicated above):
>>>>>>>
>>>>>>> Specs for all three perl executables (they vary a bit):
>>>>>>>
>>>>>>> http://gist.github.com/167883
>>>>>>>
>>>>>>> chris
>>>>>>>
>>>>>>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote:
>>>>>>>
>>>>>>>> Ah.. I find that the typemap can become as simple as this
>>>>>>>> =====================
>>>>>>>> TYPEMAP
>>>>>>>> HMM * T_PTROBJ
>>>>>>>> =====================
>>>>>>>>
>>>>>>>> Then the generated HMM.c will have a function called INT2PTR
>>>>>>>> to do the pointer conversion. I believe this should solve the
>>>>>>>> warnings.
>>>>>>>> Attached are the updated HMM.xs and typemap. Can someone with
>>>>>>>> a 64-bit machine give it a try?
>>>>>>>> Thank you
>>>>>>>> Yee Man
>>>>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
>>>>>>>>> From: Chris Fields
>>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>>>>>>> To: "Yee Man Chan"
>>>>>>>>> Cc: "Robert Buels", "Jonny Dalzell", "BioPerl List"
>>>>>>>>> Date: Thursday, August 13, 2009, 5:31 PM
>>>>>>>>> (just to point out to everyone, Yee Man's contact
>>>>>>>>> information was in the POD)
>>>>>>>>>
>>>>>>>>> Yee Man,
>>>>>>>>>
>>>>>>>>> I have the output in the below link:
>>>>>>>>>
>>>>>>>>> http://gist.github.com/167542
>>>>>>>>>
>>>>>>>>> There are similar problems popping up on 32- and 64-bit perl
>>>>>>>>> 5.10.0, Mac OS X 10.5. Haven't had time to debug it
>>>>>>>>> unfortunately.
>>>>>>>>>
>>>>>>>>> I think we should seriously consider spinning this code off
>>>>>>>>> into its own distribution for CPAN. It's unfortunately
>>>>>>>>> bit-rotting away in bioperl-ext. If you want to continue
>>>>>>>>> supporting it I can help set that up.
>>>>>>>>>
>>>>>>>>> chris
>>>>>>>>>
>>>>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> So is this an HMM only problem? Or does it apply to other
>>>>>>>>>> bioperl-ext modules?
>>>>>>>>>> What exactly are the compilation errors for HMM? I believe
>>>>>>>>>> my implementation is just a simple one based on Rabiner's
>>>>>>>>>> paper.
>>>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F
>>>>>>>>>> ~murphyk%2FBayes
>>>>>>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner
>>>>>>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg
>>>>>>>>>>
>>>>>>>>>> I don't think I did anything fancy that makes it machine
>>>>>>>>>> dependent or non-ANSI C.
>>>>>>>>>> Yee Man
>>>>>>>>>>
>>>>>>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
>>>>>>>>>>> From: Chris Fields
>>>>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>>>>>>>>> To: "Robert Buels"
>>>>>>>>>>> Cc: "Jonny Dalzell", "BioPerl List", "Yee Man Chan"
>>>>>>>>>>> Date: Thursday, August 13, 2009, 3:18 PM
>>>>>>>>>>>
>>>>>>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote:
>>>>>>>>>>>> Jonny Dalzell wrote:
>>>>>>>>>>>>> Is it ridiculous of me to expect ubuntu to take care of
>>>>>>>>>>>>> this for me? How do I go about compiling the HMM?
>>>>>>>>>>>> Yes. This is a very specialized thing that you're doing,
>>>>>>>>>>>> and Ubuntu does not have the resources to package every
>>>>>>>>>>>> single thing.
>>>>>>>>>>>> Unfortunately, it looks like bioperl-ext package is not
>>>>>>>>>>>> installable under Ubuntu 9.04 anyway, which is what I'm
>>>>>>>>>>>> running. For others on this list, if somebody is
>>>>>>>>>>>> interested in maintaining it, I'd be happy to help out by
>>>>>>>>>>>> testing on Debian-based Linux platforms. We need to
>>>>>>>>>>>> clarify this package's maintenance status: if there is
>>>>>>>>>>>> nobody interested in maintaining it, I would recommend
>>>>>>>>>>>> that bioperl-ext be removed from distribution. It's not
>>>>>>>>>>>> in anybody's interest to have unmaintained software out
>>>>>>>>>>>> there causing confusion.
>>>>>>>>>>>
>>>>>>>>>>> I have cc'd Yee Man Chan for this. If there isn't a
>>>>>>>>>>> response or the message bounces, we do one of two things:
>>>>>>>>>>> 1) consider it deprecated (probably safest).
>>>>>>>>>>> 2) spin it out into a separate module.
>>>>>>>>>>>
>>>>>>>>>>> Just tried to compile it myself and am getting errors
>>>>>>>>>>> (using 64bit perl 5.10), so I think, unless someone wants
>>>>>>>>>>> to take this on, option #1 is best.
>>>>>>>>>>>
>>>>>>>>>>>> So Jonny, in short, I would say "do not use bioperl-ext".
>>>>>>>>>>>
>>>>>>>>>>> In general, that's a safe bet. We're moving most of our
>>>>>>>>>>> C/C++ bindings to BioLib.
>>>>>>>>>>>
>>>>>>>>>>>> Step back. What are you trying to accomplish? Chris
>>>>>>>>>>>> already recommended some alternative methods in his email
>>>>>>>>>>>> of 8/11 on this subject. Perhaps we can guide you to some
>>>>>>>>>>>> software that is actively maintained and will meet your
>>>>>>>>>>>> needs.
>>>>>>>>>>>> Rob
>>>>>>>>>>> Exactly. Lots of other (better supported!) options out
>>>>>>>>>>> there. HMMER, SeqAn, and others.
>>>>>>>>>>> chris
>>>>>>>>>>>
>>>>>>>> __________________________________________________
>>>>>>>> Do You Yahoo!?
>>>>>>>> Tired of spam? Yahoo! Mail has the best spam protection around
>>>>>>>> http://mail.yahoo.com
>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>
>>>>> --Robert Buels
>>>>> Bioinformatics Analyst, Sol Genomics Network
>>>>> Boyce Thompson Institute for Plant Research
>>>>> Tower Rd
>>>>> Ithaca, NY 14853
>>>>> Tel: 503-889-8539
>>>>> rmb32 at cornell.edu
>>>>> http://www.sgn.cornell.edu
>>>>
>>>
>>
>

From abhishek.vit at gmail.com Sun Aug 16 04:06:49 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Sun, 16 Aug 2009 04:06:49 -0400
Subject: [Bioperl-l] About binning data for histograms
Message-ID:

Hi All

After a lot of look up on forums I could google, I am finally posting
my question here. I think it may not be appropriate for this mailing
list. I apologize for this first up. The question is regarding dynamic
binning of data points for histogram plots.

So I have many hashes, each having a "numerical" coverage data
obtained from Next generation sequencing data analysis. Now each hash
may have couple of hundred to thousands entry "contig_name =>
coverage". What I want to do is to plot a histogram for each
hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N
has to be binned according to the data size).

I am using Chart::Gnuplot for this but I am not able to figure out how
to bin the data points to fit nicely on a screen. Is there any
smart/quick method to do this.

Any pointers will help a great deal.
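
For the binning question above, here is a minimal sketch of one possible approach. It assumes that equal-width bins are wanted and that Sturges' rule (number of bins = ceil(log2(n)) + 1) is an acceptable starting point. The function name bin_coverage and the sample data are hypothetical. Only core modules (List::Util, POSIX) are used, and the resulting [bin_start, count] pairs could then be passed to whichever plotting module is in use.

```perl
use strict;
use warnings;
use List::Util qw(min max);
use POSIX qw(ceil floor);

# Hypothetical helper: bin the values of a "contig_name => coverage" hash
# into equal-width bins, with the bin count chosen by Sturges' rule.
sub bin_coverage {
    my ($coverage) = @_;
    my @values = values %$coverage;
    return [] unless @values;

    my ($lo, $hi) = (min(@values), max(@values));
    my $nbins = ceil(log(scalar @values) / log(2)) + 1;   # Sturges' rule
    my $width = ($hi - $lo) / $nbins || 1;                # avoid a zero width

    my @counts = (0) x $nbins;
    for my $v (@values) {
        my $i = floor(($v - $lo) / $width);
        $i = $nbins - 1 if $i > $nbins - 1;               # maximum goes in the last bin
        $counts[$i]++;
    }
    # One [bin_start, count] pair per bin.
    return [ map { [ $lo + $_ * $width, $counts[$_] ] } 0 .. $nbins - 1 ];
}

my %coverage = map { ("contig$_" => $_) } 1 .. 10;        # made-up data
printf "%.1f\t%d\n", @$_ for @{ bin_coverage(\%coverage) };
```

Sturges' rule tends to under-bin skewed data, so for coverage values with a long tail a different rule or a log-scaled axis may fit better; that choice depends on the data.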

Best Regards,
-Abhi

From bix at sendu.me.uk Sun Aug 16 05:21:11 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 16 Aug 2009 10:21:11 +0100
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To:
References:
Message-ID: <4A87CF87.7030803@sendu.me.uk>

Abhishek Pratap wrote:
> I am using Chart::Gnuplot for this but I am not able to figure out how
> to bin the data points to fit nicely on a screen. Is there any
> smart/quick method to do this.

http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width

Like it says, it depends on the data, but it's worth trying them out to
see if one of them gives you anything sensible.

From sdavis2 at mail.nih.gov Sun Aug 16 07:48:23 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 16 Aug 2009 07:48:23 -0400
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To:
References:
Message-ID: <264855a00908160448i2691fc08t472fc0d83afbb356@mail.gmail.com>

On Sun, Aug 16, 2009 at 4:06 AM, Abhishek Pratap wrote:

> Hi All
>
> After a lot of look up on forums I could google, I am finally posting
> my question here. I think it may not be appropriate for this mailing
> list. I apologize for this first up. The question is regarding dynamic
> binning of data points for histogram plots.
>
> So I have many hashes, each having a "numerical" coverage data
> obtained from Next generation sequencing data analysis. Now each hash
> may have couple of hundred to thousands entry "contig_name =>
> coverage". What I want to do is to plot a histogram for each
> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N
> has to be binned according to the data size).
>
> I am using Chart::Gnuplot for this but I am not able to figure out how
> to bin the data points to fit nicely on a screen. Is there any
> smart/quick method to do this.
>
> Any pointers will help a great deal.
>

Hi, Abhi. You could use R, but you got that already. ; ) However, you
might look here for a perl solution.

http://search.cpan.org/~whizdog/GDGraph-histogram-1.1/lib/GD/Graph/histogram.pm

Sean

From cjfields at illinois.edu Sun Aug 16 08:53:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 07:53:29 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <217259.7083.qm@web30408.mail.mud.yahoo.com>
References: <217259.7083.qm@web30408.mail.mud.yahoo.com>
Message-ID: <05D89C95-261C-47B5-A4C6-794D36DD5FB8@illinois.edu>

That worked! Thanks Yee Man!

chris

ps - let me know how you want to deal with a release.

On Aug 16, 2009, at 4:36 AM, Yee Man Chan wrote:

> Hi Chris
>
> Thanks for your suggestions. I think it is indeed better to check
> sum to 1.0 using sprintf. I fixed this in the newly committed HMM.pm
>
> I also fixed codes that will lead to warnings with use warnings.
>
> So now the only problem left is that "monotonic increasing" error.
> For that part of the code, I was trying to perform an expectation
> maximization step. Theoretically, the expectation should
> monotonically increase in every step. But I suppose this is not
> necessarily true when double precision floating point numbers are
> involved. I don't know why I used a 1e-100 tolerance for this.
> Therefore I "fixed" it by using the same tolerance to terminate the
> maximization step (ie .000001). I suppose this "fix" will make it
> much more unlikely to throw exception with your 5.10.0 perl.
>
> Can you give that a try again and see if it works now.
>
> Thank you
> Yee Man
>
>
>
> --- On Sat, 8/15/09, Chris Fields wrote:
>
>> From: Chris Fields
>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>> To: "Yee Man Chan"
>> Cc: "Robert Buels", "BioPerl List"
>> Date: Saturday, August 15, 2009, 10:38 PM
>> Yee,
>>
>> I took the liberty of making a few simple changes to
>> Bio::Tools::HMM in svn to point out the problem and possible
>> solutions. Feel free to revert these as needed.
>>
>> I'm seeing two errors, which appear randomly when running
>> 'make test'. The first is easily fixable, the second,
>> I'm not so sure. I'll let you make the decisions on both.
>>
>> 1) There is an assumption in the module that, when
>> adding floating points, you will always get 1.0. You
>> may run into problems: see 'perldoc -q long decimals'.
>> Lines like this (two places in the module):
>> ...
>> if ($sum != 1.0) {
>>     $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
>> }
>> ...
>>
>> won't work as expected (note I added a simple diagnostic,
>> just print out the 'bad' sum). With perl 5.8.8, this
>> appears to work fine, but this is what I get with perl 5.10
>> (64-bit):
>>
>> pyrimidine1:HMM cjfields$ make test
>> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
>> Baum-Welch Training
>> ===================
>> Initial Probability Array:
>> 0.499978 0.500022
>> Transition Probability Matrix:
>> 0.499978 0.500022
>> 0.499978 0.500022
>> Emission Probability Matrix:
>> 0.133333 0.143333
>> 0.163333 0.123333
>> 0.143333 0.293333
>> 0.133333 0.143333
>> 0.163333 0.123333
>> 0.143333 0.293333
>>
>> Log Probability of sequence 1: -521.808
>> Log Probability of sequence 2: -426.057
>>
>> Statistical Training
>> ====================
>> Initial Probability Array:
>> 1 0
>> Transition Probability Matrix:
>>
>> ------------- EXCEPTION -------------
>> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
>>
>> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
>> STACK toplevel test.pl:82
>> -------------------------------------
>>
>> make: *** [test_dynamic] Error 255
>>
>> I'm assuming this needs to simply be rounded up to
>> 1.0. That could be accomplished with something like
>> 'if (sprintf("%.2f", $sum) != 1.0) {...}'
>>
>> 2) The second error is a little stranger. I have been
>> randomly getting this:
>>
>> pyrimidine1:HMM cjfields$ make test
>> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
>> Baum-Welch Training
>> ===================
>> S should be monotonic increasing!
>> make: *** [test_dynamic] Error 255
>>
>> When I add strict and warnings pragmas to Bio::Tools::HMM
>> (with a little additional cleanup to get things running), I
>> get an additional warning (arrow):
>>
>> pyrimidine1:HMM cjfields$ make test
>> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
>> Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
>> Baum-Welch Training
>> ===================
>> S should be monotonic increasing!
>> make: *** [test_dynamic] Error 255
>>
>> So something is not being converted as expected.
>>
>> chris
>>
>> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote:
>>
>>> When are you going to release 1.6? Maybe let me work on it
>>> before it releases. If it doesn't resolve the problem, then we
>>> can think about other alternatives.
>>>
>>> Also, please show me the latest errors you have for 5.10.0.
>>>
>>> Thanks
>>> Yee Man
>>>
>>> --- On Sat, 8/15/09, Chris Fields wrote:
>>>
>>>> From: Chris Fields
>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>> To: "Yee Man Chan"
>>>> Cc: "Robert Buels", "BioPerl List"
>>>> Date: Saturday, August 15, 2009, 7:05 PM
>>>> I'm still seeing the same errors on Mac OS X for 64-bit perl
>>>> 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as well
>>>> as perl 5.8.8 on dev.open-bio.org).
>>>>
>>>> I'm wondering if this is a problem with my local perl build.
>>>> I'm very tempted to push the HMM-related code into a separate
>>>> distribution (bioperl-hmm) and make a CPAN release out of it so
>>>> it gets wider testing via CPAN testers; it would just require a
>>>> minimum bioperl 1.6 installation for Bio::Tools::HMM and any
>>>> related modules. Yee, would that be okay with you?
>>>>
>>>> chris
>>>>
>>>> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:
>>>>
>>>>>
>>>>> I just committed HMM.xs and typemap to SVN. Can you test it to
>>>>> confirm it works in 64-bit machines?
>>>>>
>>>>> Thanks
>>>>> Yee Man
>>>>>
>>>>> --- On Sat, 8/15/09, Chris Fields wrote:
>>>>>
>>>>>> From: Chris Fields
>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>>>> To: "Robert Buels"
>>>>>> Cc: "Yee Man Chan", "BioPerl List"
>>>>>> Date: Saturday, August 15, 2009, 12:11 PM
>>>>>> I'm not sure, but it makes more sense to commit these changes
>>>>>> directly. Yee, need us to set you up with a commit bit? If
>>>>>> so, fill out the information on this page:
>>>>>>
>>>>>> http://www.bioperl.org/wiki/SVN_Account_Request
>>>>>>
>>>>>> and forward it to support at open-bio.org.
>>>>>> I'll sponsor you.
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
>>>>>>
>>>>>>> The usual procedure for developing code is to exchange code
>>>>>>> via commits to a version control system. Yee, do you know
>>>>>>> how to use Subversion? Does Yee need a commit bit?
>>>>>>>
>>>>>>> Rob
>>>>>>>
>>>>>>> Yee Man Chan wrote:
>>>>>>>> Hi Chris
>>>>>>>> I find that there is a memory access bug in my code.
>>>>>>>> Attached is the fixed HMM.xs. This file together with the
>>>>>>>> simpler typemap should fix all problems. (I hope..)
>>>>>>>> Please let me know if it works for you.
>>>>>>>> Sorry for the bug...

From hlapp at gmx.net Sun Aug 16 11:07:39 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 16 Aug 2009 11:07:39 -0400
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com>
	<91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
Message-ID: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net>

On Aug 16, 2009, at 1:38 AM, Chris Fields wrote:

> I'm assuming this needs to simply be rounded up to 1.0. That could
> be accomplished with something like 'if (sprintf("%.2f", $sum) !=
> 1.0) {...}'

Couldn't you just test for the absolute difference being smaller
than some reasonable epsilon? That might be more efficient (and more
explicit) than printing to a string.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sun Aug 16 11:13:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 16 Aug 2009 11:13:54 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
	<307089ED92AD46539EEF45EE2D8F5A81@NewLife>
	<63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
	<659CA35CE3AD464AA516D18B313311BE@NewLife>
	<671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
Message-ID:

On Aug 15, 2009, at 10:49 PM, Chris Fields wrote:

> Not sure about calling it bioperl-phylo (which might be confused
> with Rutger's Bio::Phylo).

Frankly, it seems to me that either is more powerful in combination
with the other, so I don't quite see how the name suggesting some
linkage isn't a Good Thing rather than bad.
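
On the floating-point check from the Bioperl-ext thread (comparing a probability sum against 1.0), the epsilon comparison suggested there could look roughly like the sketch below. The helper name sums_to_one and the default tolerance of 1e-6 are assumptions made for illustration; this is not code from Bio::Tools::HMM.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::HMM: returns 1 when the
# values sum to 1.0 within a tolerance, instead of testing $sum != 1.0.
sub sums_to_one {
    my ($probs, $eps) = @_;
    $eps = 1e-6 unless defined $eps;    # assumed default tolerance
    my $sum = 0;
    $sum += $_ for @$probs;
    return abs($sum - 1.0) < $eps ? 1 : 0;
}

# Ten additions of 0.1 do not give exactly 1.0 in double precision,
# but the result is well inside the tolerance.
print sums_to_one([ (0.1) x 10 ]) ? "ok\n" : "not ok\n";
```

Compared with rounding through sprintf("%.2f", ...), which in effect accepts any sum within about 0.005 of 1.0, this keeps the tolerance explicit and adjustable per call.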

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at illinois.edu Sun Aug 16 11:42:50 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 10:42:50 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com>
	<91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
	<40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net>
Message-ID:

On Aug 16, 2009, at 10:07 AM, Hilmar Lapp wrote:

>
> On Aug 16, 2009, at 1:38 AM, Chris Fields wrote:
>
>> I'm assuming this needs to simply be rounded up to 1.0. That could
>> be accomplished with something like 'if (sprintf("%.2f", $sum) !=
>> 1.0) {...}'
>
>
> Couldn't you just test for the absolute difference being smaller
> than some reasonable epsilon? That might be more efficient (and more
> explicit) than printing to a string.
>
> -hilmar

Yes, either way is fine. Re: floating point and sprintf, acc. to the
perlfaq4, as perl doesn't have a round() function the sprintf() idiom
is suggested (and commonly used).

chris

From cjfields at illinois.edu Sun Aug 16 11:48:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 10:48:52 -0500
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
	<307089ED92AD46539EEF45EE2D8F5A81@NewLife>
	<63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
	<659CA35CE3AD464AA516D18B313311BE@NewLife>
	<671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
Message-ID:

On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote:

> On Aug 15, 2009, at 10:49 PM, Chris Fields wrote:
>
>> Not sure about calling it bioperl-phylo (which might be confused
>> with Rutger's Bio::Phylo).
>
>
> Frankly, it seems to me that either is more powerful in combination
> with the other, so I don't quite see how the name suggesting some
> linkage isn't a Good Thing rather than bad.
>
> -hilmar

I don't have a problem as long as there is some emphasis they are two
separate, but related, projects. There is quite a bit of crossover
between the two (particularly with the last few bioperl-related GSoC
projects), but I would rather not have to worry about users emailing
the list wondering why something in bioperl-phylo doesn't work when
they installed Bio::Phylo instead (or vice-versa). Maybe Bio::Phylo
could be added as a recommended module with bioperl-phylo to alleviate
that?

chris

From maj at fortinbras.us Sun Aug 16 12:59:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 16 Aug 2009 12:59:40 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
	<307089ED92AD46539EEF45EE2D8F5A81@NewLife>
	<63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
	<659CA35CE3AD464AA516D18B313311BE@NewLife>
	<671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
Message-ID: <44D32BE895F446A9917A5550485AB102@NewLife>

I see both points- I think Chris's suggestion is good. The nexml support
won't work without Bio::Phylo, but not everyone will need that support,
so if the install can be chatty about this that would be great-

----- Original Message -----
From: "Chris Fields"
To: "Hilmar Lapp"
Cc: "BioPerl List"; "Mark A. Jensen"; "chase Miller"
Sent: Sunday, August 16, 2009 11:48 AM
Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring

>
> On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote:
>
>> On Aug 15, 2009, at 10:49 PM, Chris Fields wrote:
>>
>>> Not sure about calling it bioperl-phylo (which might be confused with
>>> Rutger's Bio::Phylo).
>>
>>
>> Frankly, it seems to me that either is more powerful in combination with the
>> other, so I don't quite see how the name suggesting some linkage isn't a
>> Good Thing rather than bad.
>>
>> -hilmar
>
> I don't have a problem as long as there is some emphasis they are two
> separate, but related, projects. There is quite a bit of crossover between
> the two (particularly with the last few bioperl-related GSoC projects), but I
> would rather not have to worry about users emailing the list wondering why
> something in bioperl-phylo doesn't work when they installed Bio::Phylo
> instead (or vice-versa). Maybe Bio::Phylo could be added as a recommended
> module with bioperl-phylo to alleviate that?
>
> chris
>
>

From rmb32 at cornell.edu Sun Aug 16 13:16:18 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 16 Aug 2009 10:16:18 -0700
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <44D32BE895F446A9917A5550485AB102@NewLife>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
	<375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
	<307089ED92AD46539EEF45EE2D8F5A81@NewLife>
	<63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
	<659CA35CE3AD464AA516D18B313311BE@NewLife>
	<671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
	<44D32BE895F446A9917A5550485AB102@NewLife>
Message-ID: <4A883EE2.3060101@cornell.edu>

Mark A. Jensen wrote:
> I see both points- I think Chris's suggestion is good. The nexml support
> won't work without Bio::Phylo, but not everyone will need that support,
> so if the install can be chatty about this that would be great-

Maybe the parts that have differing dependencies should be in different
distros then?
Rob From jason at bioperl.org Sun Aug 16 13:25:08 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 16 Aug 2009 13:25:08 -0400 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: References: Message-ID: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> For binning of a distribution see the perl module Statistics::Descriptive - http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm function: frequency_distribution I would also look at R's histogram function for the plotting. This would be one of the easiest ways - I would just make a perl script that generates the correct R code that can be used to make the plots. On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote: > Hi All > > After a lot of look up on forums I could google, I am finally posting > my question here. I think it may not be appropriate for this mailing > list. I apologize for this first up. The question is regarding dynamic > binning of data points for histogram plots. > > So I have many hashes, each having a "numerical" coverage data > obtained from Next generation sequencing data analysis. Now each hash > may have couple of hundred to thousands entry "contig_name => > coverage". What I want to do is to plot a histogram for each > hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N > has to be binned according to the data size). > > I am using Chart::Gnuplot for this but I am not able to figure out how > to bin the data points to fit nicely on a screen. Is there any > smart/quick method to do this. > > Any pointers will help a great deal.
> > Best Regards, > -Abhi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From abhishek.vit at gmail.com Sun Aug 16 13:34:54 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Sun, 16 Aug 2009 13:34:54 -0400 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> References: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> Message-ID: Thanks All. I completely forgot and didn't realize that the histogram function in R could auto-bin based on the data. Cheers, -Abhi On Sun, Aug 16, 2009 at 1:25 PM, Jason Stajich wrote: > For binning of a distribution see the perl module Statistics::Descriptive - > http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm function: > frequency_distribution > > I would also look at R histogram function for the plotting. This would be > one of the easiest ways - I would just make a perl script that generates the > correct R code that can be used to make the plots. > > > On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote: > >> Hi All >> >> After a lot of look up on forums I could google, I am finally posting >> my question here. I think it may not be appropriate for this mailing >> list. I apologize for this first up. The question is regarding dynamic >> binning of data points for histogram plots. >> >> So I have many hashes, each having a "numerical" coverage data >> obtained from Next generation sequencing data analysis. Now each hash >> may have couple of hundred to thousands entry "contig_name => >> coverage". What I want to do is to plot a histogram for each >> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N >> has to be binned according to the data size).
>> >> I am using Chart::Gnuplot for this but I am not able to figure out how >> to bin the data points to fit nicely on a screen. Is there any >> smart/quick method to do this. >> >> Any pointers will help a great deal. >> >> Best Regards, >> -Abhi > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > From robert.bradbury at gmail.com Sun Aug 16 15:16:09 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sun, 16 Aug 2009 15:16:09 -0400 Subject: [Bioperl-l] Limit on sequence file size fetches? Message-ID: Hello, I am trying to use get_sequence() to fetch the sequence NS_000198 for the fungus *Podospora anserina* with the databases "GenBank" and when that didn't work "Gene". This is a simple script which fetches the sequence then writes out the fasta and genbank files from the data structure. The errors I got suggested that the system was running out of memory which I thought was unlikely since I've got something like 3GB of main memory and 9GB of swap space. After running strace on the script (which takes a while) I determined that the brk() calls were generating ENOMEM at ~3GB. This turns out to be due to the limit of the Linux memory model I am using (3GB/1GB) on a Pentium IV (Prescott). Now, I think the total genome size for the fungus is ~70MB but haven't verified this so I "should" be able to fetch it unless Bioperl (or perl itself) is doing extremely poor memory management (perhaps not coalescing memory segments into one large sequence) as the reads take place? [1]. Has anyone encountered this problem (fetching say large mammalian chromosomes)? Does anyone know what the limits are for "fetching" sequence files (on 32/64-bit machines)?
The reason I am using get_sequence and BioPerl is that I can't seem to find the *Podospora anserina* sequence in an FTP database anywhere (so I can't use "wget or ftp"). I haven't tested accessing the GenBank file in a browser (I don't know what browsers would do with an HTML file that large but suspect it would not be pretty). Thanks in advance, Robert Bradbury 1. The strace seems to indicate periodic brk() calls to expand the process data segment size between which there are lots of read() calls of size 4096, presumably reading the socket from NCBI. I don't know if there is an easy way to trace perl's memory allocation/manipulation at a higher level. From jason at bioperl.org Sun Aug 16 15:22:35 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 16 Aug 2009 15:22:35 -0400 Subject: [Bioperl-l] Limit on sequence file size fetches? In-Reply-To: References: Message-ID: <93672502-26EB-4C30-A37E-F3B593E57279@bioperl.org> Robert - Posting your script will help us replicate and diagnose - I am not sure which GenBank fetch option you are using. I have a feeling it is trying to do recursive calls to stitch together the pseudoscaffold. I presume it works fine, though, if you request each chromosome scaffold individually, like CU607053, CU633438, ... I guess posting it via a bugzilla bug is the best way unless you have a git account and wanted to post it as a 'gist'. -jason -- Jason Stajich jason at bioperl.org http://fungalgenomes.org/ On Aug 16, 2009, at 3:16 PM, Robert Bradbury wrote: > Hello, > > I am trying to use get_sequence() to fetch the sequence NS_000198 > for the > fungus *Podospora anserina* with the databases "GenBank" and when that > didn't work "Gene". This is a simple script which fetches the > sequence then > writes out the fasta and genbank files from the data structure. > > The errors I got suggested that the system was running out of memory > which I > thought was unlikely since I've got something like 3GB of main > memory and > 9GB of swap space.
After running strace on the script (which takes > a while) > I determined that the brk() calls were generating ENOMEM at ~3GB. > This > turns out to be due to the limit of the Linux memory model I am using > (3GB/1GB) on a Pentium IV (Prescott). > > Now, I think the total genome size for the fungus is ~70MB but haven't > verified this so I "should" be able to fetch it unless Bioperl (or > perl > itself) is doing extremely poor memory management (perhaps not > coalescing > memory segments into one large sequence) as the reads take place? [1]. > > Has anyone encountered this problem (fetching say large mammalian > chromosomes)? Does anyone know what the limits are for "fetching" > sequence > files (on 32/64 bit machines?. The reason I am using get_sequence and > BioPerl is that I can't seem to find the *Podospora anserina* > sequence in a > FTP database anywhere (so I can't use "wget or ftp"). I haven't > tested > accessing the GenBank file in a browser (I don't know what browsers > would do > with a HTML file that large but suspect it would not be pretty). > > Thanks in advance, > Robert Bradbury > > 1. The strace seems to indicate periodic brk() calls to expand the > process > data segment size between which there are lots of read() calls of > size 4096, > presumably reading the socket from NCBI. I don't know if there is > an easy > way to trace perl's memory allocation/manipulation at a higher level. 
From cjfields at illinois.edu Sun Aug 16 15:42:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 14:42:56 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A883EE2.3060101@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> <44D32BE895F446A9917A5550485AB102@NewLife> <4A883EE2.3060101@cornell.edu> Message-ID: <69B8C887-1C5E-47B4-9168-8509BB0A5528@illinois.edu> On Aug 16, 2009, at 12:16 PM, Robert Buels wrote: > Mark A. Jensen wrote: >> I see both points- I think Chris's suggestion is good. The nexml >> support >> won't work without Bio::Phylo, but not everyone will need that >> support, >> so if the install can be chatty about this that would be great- > > Maybe the parts that have differing dependencies should be in > different distros then? > > Rob I'm guessing large chunks of that code would have Bio::Root::Root as a base, so I think maintaining related code split across two distributions would be too problematic. It is simpler to indicate that Bio::Phylo is required only for NeXML (so listing it as a 'recommends') and keep everything NeXML-related and requiring Bio::Root::Root in one spot. It's possible something inheriting from Bio::Phylo could go there, but that's up to Rutger. chris From maj at fortinbras.us Mon Aug 17 08:43:33 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Mon, 17 Aug 2009 08:43:33 -0400 Subject: [Bioperl-l] new NeXML I/O modules Message-ID: Hi All- I'm pleased to announce that my Google Summer of Code student Chase Miller and I have successfully migrated his modules for NeXML I/O into bioperl-live. NeXML (http://www.nexml.org) is Rutger Vos' highly flexible, highly annotatable standard for evolutionary data exchange that is catching on in the evolutionary DB world. We hope these modules will help move that process along. I also want to say that Chase has been a terrific student and collaborator. He not only learned the complexities of BioPerl IO from scratch, but also grokked Rutger's Bio::Phylo internals, and became familiar with and applied modern OO concepts. He also wrote tests (which pass!), complete POD, and a HOWTO (at http://www.bioperl.org/wiki/HOWTO:Nexml) to accompany this work. Best of all, he finished! (Well, as much as anything is ever finished around here.) I for one hope he will continue to use his commit bit for good and not evil. cheers, Mark From deequan at gmail.com Mon Aug 17 09:06:44 2009 From: deequan at gmail.com (David Quan) Date: Mon, 17 Aug 2009 09:06:44 -0400 Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl? Message-ID: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> Hello there, I've been browsing around bioperl documentation and have used a blast parser, but am wondering if it is possible to use the start and end information for a hit to trace back to a gene in genbank and extract the sequence for that gene? I have not been able to find elements that would work in such a way. Hints and recommendations for elements that would be capable of behaving in such a way would be greatly appreciated. Thanks very much. David N.
Quan -- Love of country is, at heart, trust in a nation's people, faith in their better nature, esteem for their best hopes, understanding for the magnificence and the distinctiveness and the huge, infinitely shaded cultural palette of their simple humanity. --Bradley Burston From akarger at CGR.Harvard.edu Mon Aug 17 09:04:29 2009 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 17 Aug 2009 09:04:29 -0400 Subject: [Bioperl-l] on BP documentation References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > > From: "Hilmar Lapp" > ... > > As for the FASTA example, I can understand - I've heard > repeatedly > > from people that one of the things that they are missing is > > documentation for every SeqIO format we support (such as > GenBank, > > UniProt, FASTA, etc) about where to find a particular piece of > the > > format in the object model. > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help > create > our list of action items. > MAJ I wish you the best of luck on this ambitious and crucial project. I teach intro Perl classes to biologists and always tell them that Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble. I was trying to remember specific examples of where I've gotten lost, and unfortunately can't give any. But I can tell you that often I've run into trouble because the particular method I'm looking for is three parent classes away from the module I'm actually looking at. The deobfuscator helps some, but only for people who know about that. Do you think you could automate a tool that would add the following to the bottom of each module? 
=head2 Inherited methods

=over 4

=item desc

See Bio::Seq::Basic

=back

This would make browsing through the docs on bioperl.org more fun too. -Amir Karger From cjfields at illinois.edu Mon Aug 17 10:06:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:06:15 -0500 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: Congrats Chase! chris On Aug 17, 2009, at 7:43 AM, Mark A. Jensen wrote: > Hi All- > > I'm pleased to announce that my Google Summer of Code student > Chase Miller and I have successfully migrated his modules for > NeXML I/O into bioperl-live. NeXML (http://www.nexml.org) is > Rutger Vos' highly flexible, highly annotatable standard for > evolutionary data exchange that is catching on in the > evolutionary DB world. We hope these modules will help move that > process along. > > I also want to say that Chase has been a terrific student and > collaborator. He not only learned the complexities of BioPerl > IO from scratch, but also grokked Rutger's Bio::Phylo internals, > and became familiar with and applied modern OO concepts. He also > wrote tests (which pass!), complete POD, and a HOWTO (at > http://www.bioperl.org/wiki/HOWTO:Nexml) to accompany this > work. Best of all, he finished! (Well, as much as anything is > ever finished around here.) I for one hope he will continue to > use his commit bit for good and not evil. > > cheers, > Mark From cjfields at illinois.edu Mon Aug 17 10:22:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:22:26 -0500 Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl?
In-Reply-To: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> References: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> Message-ID: <74D10663-5770-43DA-ABDB-27FA5D532497@illinois.edu>

That's possible, yes. Use the hit information with Bio::DB::GenBank to pull the sequence out, as in the example below. Note that the strand convention differs from BioPerl's -1/0/1; for efetch, strand 1 = normal (default), 2 = complement.

================================
use Bio::DB::GenBank;

my $factory = Bio::DB::GenBank->new(
    -format    => 'genbank',
    -seq_start => $seqstart,   # hit start
    -seq_stop  => $seqend,     # hit end
    -strand    => $strand,     # 1=plus, 2=minus
);
# should be UID, use get_Seq_by_acc() for accessions
my $seq = $factory->get_Seq_by_id($id);
================================

This pulls everything into a Bio::Seq, though, so you'll need to push it out to a SeqIO output stream. You can also use Bio::DB::EUtilities to get the raw sequence via efetch, something like (untested):

================================
use Bio::DB::EUtilities;

my $fetcher = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'gb',
);

# loop: for each hit/HSP, grab sequence...
$fetcher->set_parameters(
    -id        => $id,         # UID or accession
    -seq_start => $seqstart,   # hit start
    -seq_stop  => $seqend,     # hit end
    -strand    => $strand,     # 1=plus, 2=minus
);
# then write the raw content straight to a file
$fetcher->get_Response(-file => "$id.gb");
================================

You could probably plug into Ensembl similarly if the db versions match; see:

http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences

chris

On Aug 17, 2009, at 8:06 AM, David Quan wrote: > Hello there, > > I've been browsing around bioperl documentation and have used > a blast parser, but am wondering if it is possible to use the start > and end information for a hit to trace back to a gene in genbank and > extract the sequence for that gene? I have not been able to find > elements that would work in such a way.
Hints and recommendations for > elements that would be capable of behaving in such a way would be > greatly appreciated. Thanks very much. > > David N. Quan > > -- > Love of country is, at heart, trust in a nation's people, faith in > their better nature, esteem for their best hopes, understanding for > the magnificence and the distinctiveness and the huge, infinitely > shaded cultural palette of their simple humanity. --Bradley Burston From cjfields at illinois.edu Mon Aug 17 10:47:31 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:47:31 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> Message-ID: On Aug 17, 2009, at 8:04 AM, Amir Karger wrote: >> -----Original Message----- >> From: Mark A. Jensen [mailto:maj at fortinbras.us] >> >> From: "Hilmar Lapp" >> ... >>> As for the FASTA example, I can understand - I've heard >> repeatedly >>> from people that one of the things that they are missing is >>> documentation for every SeqIO format we support (such as >> GenBank, >>> UniProt, FASTA, etc) about where to find a particular piece of >> the >>> format in the object model. >> >> This is the right thread for list lurkers to contribute their betes >> noires >> such as this one. I encourage ALL to post these issues and help >> create >> our list of action items. >> MAJ > > I wish you the best of luck on this ambitious and crucial project. I > teach intro Perl classes to biologists and always tell them that > Bioperl > is amazingly useful, but only if you can figure out how to use it.
If > what you want to do isn't in the howtos, you can be in big trouble. > > I was trying to remember specific examples of where I've gotten lost, > and unfortunately can't give any. But I can tell you that often I've > run > into trouble because the particular method I'm looking for is three > parent classes away from the module I'm actually looking at. The > deobfuscator helps some, but only for people who know about that. Do > you > think you could automate a tool that would add the following to the > bottom of each module? > > =head2 Inherited methods > > =over 4 > > =item desc > > See Bio::Seq::Basic > > =back > > This would make browsing through the docs on bioperl.org more fun too. > > -Amir Karger For many modules this is already in place, but yes this could be improved. One of the problems I suggest we avoid when doing this is interspersing the POD within the code. It has been demonstrated that doing so actually slows down the perl interpreter slightly; it has to slog through lots of POD to find the code at the compilation step. This occurs only on initial compilation, but it is significant enough that the overall recommendation by most perl brethren (and in Perl Best Practices) has been to place any POD after an __END__ marker. This way the compiler doesn't have to look at it at all, but perldoc can still find it. Also, according to PBP, although the inline POD would seemingly be easier to take care of, apparently the opposite is true in most cases (though it can come down to styling differences). Interspersed documentation is much harder to maintain in a consistent state, tends to be choppier, and can be laid out in odd ways due to being scattered throughout the file. I know this can come down to a difference in style, but the arguments do make sense enough to me that in Biome I am pushing to have all docs after the __END__ marker.
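For illustration, a minimal sketch of that layout: all code first, then the POD after `__END__`. The package `My::Example` and its `desc` accessor are hypothetical names made up for this sketch (they are not BioPerl modules), and the POD fields only imitate the usual BioPerl method-documentation style.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical package, for illustration only (not part of BioPerl).
package My::Example;

# Constructor taking BioPerl-style named arguments.
sub new {
    my ($class, %args) = @_;
    return bless { desc => $args{-desc} }, $class;
}

# Get/set accessor; its documentation lives after __END__, not inline.
sub desc {
    my ($self, $value) = @_;
    $self->{desc} = $value if defined $value;
    return $self->{desc};
}

package main;

my $obj = My::Example->new(-desc => 'test sequence');
print $obj->desc, "\n";

# The compiler stops reading at __END__; perldoc still finds the POD below.
__END__

=head1 NAME

My::Example - hypothetical class showing POD placed after __END__

=head1 METHODS

=head2 desc

 Title   : desc
 Usage   : my $d = $obj->desc;
 Function: get/set the description string
 Returns : string

=cut
```

Running `perldoc` on a file laid out this way should still render the documentation, since POD after `__END__` remains visible to the POD tools.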
Lincoln already practices this within bioperl and Bio::Graphics, and I plan on moving much of my documentation similarly within my code in BioPerl. The additional comments in the PBP chapter "Documentation" are well worth reading if you can get your hands on it. chris From rmb32 at cornell.edu Mon Aug 17 11:21:08 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:21:08 -0700 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: <4A897564.2090203@cornell.edu> Hurrah! GSoC strikes again! Rob From rmb32 at cornell.edu Mon Aug 17 11:45:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:45:18 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <474354.59886.qm@web30408.mail.mud.yahoo.com> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> Message-ID: <4A897B0E.7060208@cornell.edu> Yee Man Chan wrote: > As to the release, my thinking is that I do understand that your desire to maintain a high level of quality in BioPerl code base. So if the HMM doesn't meet that standard, I am ok with it being spinned off. We're not pushing to spin it off because of code quality, we're pushing to spin it off because we're spinning everything off. The plan is to break BioPerl up into many discrete distributions on CPAN with the dependencies between them well-known and codified. This will make maintenance of BioPerl *much* easier in the long run. So this means that the plan of action should be 1.) get the code so that it's working on all platforms, 2.) create a CPAN distribution for it and put it on CPAN, 3.) remove it from bioperl-ext Also, doing a search for bioperl-ext on CPAN brings to light a couple of issues that probably need to be dealt with. To wit: 1.) there is an ancient version of bioperl-ext that probably needs to be removed, it's under ~birney's account. Thoughts on this? 2.)
Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend on bioperl-ext, which suggests that these really need to be split off, each with the Bio::Ext::Modules they depend on. Bio::Tools::HMM could be the first case of this:
  * make a dir in the repos called Bio-Tools-HMM alongside bioperl-live, having trunk/, and branches/ subdirs
  * move Bio::Tools::HMM out of bioperl-live into that
  * move Bio::Ext::HMM stuff out of bioperl-ext into that
  * repeat with Bio::Tools::dpAlign and pSW, which would probably go together into a Bio-Tools-Align distro, I think
Sounds like this is moving along nicely. Rob From rmb32 at cornell.edu Mon Aug 17 11:48:10 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:48:10 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897B0E.7060208@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> Message-ID: <4A897BBA.2070204@cornell.edu> Also, I volunteer to make this branch and module machinery and such if you want. I just don't want to step on any ongoing development you guys are doing in the bioperl-ext trunk. If you want me to do it, just say the word, either here or in #bioperl. Rob Robert Buels wrote: > Yee Man Chan wrote: >> As to the release, my thinking is that I do understand that your >> desire to maintain a high level of quality in BioPerl code base. So if >> the HMM doesn't meet that standard, I am ok with it being spinned off. > > We're not pushing to spin it off because of code quality, we're pushing > to spin it off because we're spinning everything off. The plan is to > break BioPerl up into many discrete distributions on CPAN with the > dependencies between them well-known and codified. This will make > maintenance of BioPerl *much* easier in the long run. > > So this means that the plan of action should be > 1.) get the code so that it's working on all platforms, > 2.)
create a CPAN distribution for it and put it on CPAN, > 3.) remove it from bioperl-ext > > Also, doing a search for bioperl-ext on CPAN brings to light a couple of > issues that probably need to be dealt with. To wit: > > > 1.) there is an ancient version of bioperl-ext that probably needs to be > removed, it's under ~birney's account. Thoughts on this? > > 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend on > bioperl-ext, which suggests that these really need to be split off, each > with the Bio::Ext::Modules they depend on. Bio::Tools::HMM could be the > first case of this: > * make a dir in the repos called Bio-Tools-HMM alongside > bioperl-live, having trunk/, and branches/ subdirs > * move Bio::Tools::HMM out of bioperl-live into that > * move Bio::Ext::HMM stuff out of bioperl-ext into that > * repeat with Bio::Tools::dpAlign and pSW, which would probably > go together into a Bio-Tools-Align distro, I think > > Sounds like this is moving along nicely. > > Rob > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Mon Aug 17 12:58:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 11:58:24 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897B0E.7060208@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> Message-ID: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> On Aug 17, 2009, at 10:45 AM, Robert Buels wrote: > Yee Man Chan wrote: >> As to the release, my thinking is that I do understand that your >> desire to maintain a high level of quality in BioPerl code base.
So >> if the HMM doesn't meet that standard, I am ok with it being >> spinned off. > > We're not pushing to spin it off because of code quality, we're > pushing to spin it off because we're spinning everything off. The > plan is to break BioPerl up into many discrete distributions on CPAN > with the dependencies between them well-known and codified. This > will make maintenance of BioPerl *much* easier in the long run. > > So this means that the plan of action should be > 1.) get the code so that it's working on all platforms, > 2.) create a CPAN distribution for it and put it on CPAN, > 3.) remove it from bioperl-ext > > Also, doing a search for bioperl-ext on CPAN brings to light a > couple of issues that probably need to be dealt with. To wit: > > > 1.) there is an ancient version of bioperl-ext that probably needs > to be removed, it's under ~birney's account. Thoughts on this? This subject just recently popped up on perl.module.authors, more in relation to abandonware, but a similar thing. Andreas has indicated there is an abandoned flag that can be set so it's worth looking into, but using it requires another release. I have been in contact with that group on ideas for the split; libwin32 did the same thing, so I'll contact Jan Dubois on the matter for some pointers. > 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend > on bioperl-ext, which suggests that these really need to be split > off, each with the Bio::Ext::Modules they depend on. > Bio::Tools::HMM could be the first case of this: > * make a dir in the repos called Bio-Tools-HMM alongside bioperl- > live, having trunk/, and branches/ subdirs > * move Bio::Tools::HMM out of bioperl-live into that > * move Bio::Ext::HMM stuff out of bioperl-ext into that > * repeat with Bio::Tools::dpAlign and pSW, which would probably > go together into a Bio-Tools-Align distro, I think > > Sounds like this is moving along nicely. > > Rob Yes, that's essentially the idea.
The more significant impact of this (both here and in core) is allowing updates to be made as needed, and not be blocked due to issues in unrelated modules. We have been waiting years for fixes to pSW, Staden::read, Align w/o progress, which has hindered overall releases of bioperl-ext. Similar problems exist in bp-core. Re: bioperl-ext, BioLib has rendered some of those implementations obsolete. I would rather do that incrementally (individual implementations) vs. wait for a full-blown bioperl-ext release, so splitting these up makes that possible. chris From robert.bradbury at gmail.com Mon Aug 17 13:14:57 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 17 Aug 2009 13:14:57 -0400 Subject: [Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers Message-ID: One of the questions facing people working in bioinformatics is "How do we present information so that it can be effectively interpreted by non-informatics specialists?" Now, my expertise lies in computer science (esp. O.S. & databases) and as a second vocation the biology of aging (DNA damage & repair, to a lesser extent cancer and pathologies of aging, etc.). Now by my estimate there are perhaps 5 people in the world who are able to effectively discuss computer science X aging (gerontology) [3]. There are perhaps several dozen people where those areas, esp aging, may overlap with DNA damage & repair. But then there is a wider audience of perhaps a few hundred members of AGE, and maybe a thousand or so who are members of the scientific subgroup of GSA. But most of those individuals are "old school" scientists who know relatively little about bioinformatics. So one has barriers to presenting bioinformatics information in ways that they can use usefully. 
I have found in my limited experience that homology graphs of conserved protein domains, such as those displayed in HomoloGene or those in Ensembl (including phylogeny graphs), can be quite useful in reaching interesting conclusions. For example, double strand break repair processes which may involve 8-10 relatively conserved proteins, may have a critical role in the mechanisms of aging. In particular two of those proteins, WRN & DCLRE1C (Artemis) contain complementary exonuclease activities which chew up the DNA in order to prepare the strands for ligation. Of course, programmers may appreciate better than gerontologists the significance of deleting random bytes from instruction sequences in one's code. At the recent AGE meeting in June several discussions arose as to possible differences in "aging" in yeast, *C. elegans* and mammals. [1]. A quick database search showed that *C. elegans* seems to be lacking the exonuclease domain on the WRN homologue and may be missing a DCLRE1C homologue entirely (which if true would lead to conclusions that aging in *C. elegans* may be fundamentally different from aging in vertebrates). Explaining this to researchers can best be done using pictures. I've been through PubMed and have several papers (NAR / BMC Bioinformatics) regarding programs to do homology comparisons and phylogeny trees. However these seem to lean towards producing less condensed bioinformatics-ish information. I do not know, however, whether the outputs from databases like PubMed HomoloGene or Ensembl have been packaged in tools that might be part of BioPerl. I am interested in programs that can be run on a regular basis to draw "pretty pictures" that can be used for publication and/or internet browsing. In particular I'm interested in running such programs on species of interest to various gerontological communities [2] which involves subsets of databases which seem to be scattered around the world. Thanks. 1.
Of course there has been lots of discussion and rationalization over the last 15+ years about how "aging" is largely the same in more complex and simpler organisms -- in part to justify sequencing some organisms and in part to justify funding research at certain laboratories. A closer examination based on some of the complete and emerging genome sequences may suggest this is a very swampy discussion. 2. For example, nematode DNA repair gene comparisons would be interesting to nematode researchers, insect DNA repair gene comparisons to insect researchers, both to invertebrate researchers, etc. 3. The recently published textbooks *Aging of the Genome* by Jan Vijg and the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg *et al*, go a long way towards moving these areas from the stacks of research libraries into areas for more general discussion. Both volumes deal extensively with the ~150 DNA repair genes. From cjfields at illinois.edu Mon Aug 17 13:15:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 12:15:46 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897BBA.2070204@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <4A897BBA.2070204@cornell.edu> Message-ID: I say go for it if Yee Man is okay with the idea. It gets the code out there that much faster. This also doesn't depend on core being split up (only need a 'requires' bioperl 1.6.0). chris On Aug 17, 2009, at 10:48 AM, Robert Buels wrote: > Also, I volunteer to make this branch and module machinery and such > if you want. I just don't want to step on any ongoing development > you guys are going in the bioperl-ext trunk. > > If you want me to do it, just say the word, either here or in > #bioperl. 
> > Rob > > Robert Buels wrote: >> Yee Man Chan wrote: >>> As to the release, my thinking is that I do understand that >>> your desire to maintain a high level of quality in BioPerl code >>> base. So if the HMM doesn't meet that standard, I am ok with it >>> being spinned off. >> We're not pushing to spin it off because of code quality, we're >> pushing to spin it off because we're spinning everything off. The >> plan is to break BioPerl up into many discrete distributions on >> CPAN with the dependencies between them well-known and codified. >> This will make maintenance of BioPerl *much* easier in the long run. >> So this means that the plan of action should be >> 1.) get the code so that it's working on all platforms, >> 2.) create a CPAN distribution for it and put it on CPAN, >> 3.) remove it from bioperl-ext >> Also, doing a search for bioperl-ext on CPAN brings to light a >> couple of issues that probably need to be dealt with. To wit: >> 1.) there is an ancient version of bioperl-ext that probably needs >> to be removed, it's under ~birney's account. Thoughts on this? >> 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend >> on bioperl-ext, which suggests that these really need to be split >> off, each with the Bio::Ext::Modules they depend on. >> Bio::Tools::HMM could be the first case of this: >> * make a dir in the repos called Bio-Tools-HMM alongside bioperl- >> live, having trunk/, and branches/ subdirs >> * move Bio::Tools::HMM out of bioperl-live into that >> * move Bio::Ext::HMM stuff out of bioperl-ext into that >> * repeat with Bio::Tools::dpAlign and pSW, which would probably >> go together into a Bio-Tools-Align distro, I think >> Sounds like this is moving along nicely. 
>> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From chmille4 at gmail.com Mon Aug 17 14:44:09 2009 From: chmille4 at gmail.com (Chase Miller) Date: Mon, 17 Aug 2009 14:44:09 -0400 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A897564.2090203@cornell.edu> References: <4A897564.2090203@cornell.edu> Message-ID: <991fb8210908171144t3f7107f0ldaf02dfdc762ae27@mail.gmail.com> Thanks! It was a great experience. I couldn't have done it without Mark who was a fantastic mentor. cheers, Chase On Mon, Aug 17, 2009 at 11:21 AM, Robert Buels wrote: > Hurrah! GSoC strikes again! > > Rob > From rmb32 at cornell.edu Mon Aug 17 16:32:14 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 13:32:14 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> Message-ID: <4A89BE4E.7090901@cornell.edu> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro at Bio-Tools-HMM in the repo. The tests are not passing, I think that some bugs need to be fixed in the logic of things. Yee Man, could you have a look? 
To download the newly repackaged code: svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM perl Build.PL; ./Build test Please check that things are compiling OK, check the test logic, upgrade the tests to use Test::More, and get the tests to the point where they are passing. At that point, it should be ready for CPAN, but we need to decide how we want to coordinate that with releases of bioperl-live and bioperl-ext. Rob From rmb32 at cornell.edu Mon Aug 17 16:45:42 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 13:45:42 -0700 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: <4A89C176.3050109@cornell.edu> Mark A. Jensen wrote: > wrote tests (which pass!), complete POD, and a HOWTO (at The tests for this are depending on Bio::Phylo and fail if it's not installed. Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a "recommended" module, or what? Gotta clarify our dependencies. Rob From cjfields at illinois.edu Mon Aug 17 16:54:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 15:54:05 -0500 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A89C176.3050109@cornell.edu> References: <4A89C176.3050109@cornell.edu> Message-ID: On Aug 17, 2009, at 3:45 PM, Robert Buels wrote: > Mark A. Jensen wrote: >> wrote tests (which pass!), complete POD, and a HOWTO (at > > The tests for this are depending on Bio::Phylo and fail if it's not > installed. Are we going to add Bio::Phylo as a bioperl dependency, > or band-aid it as a "recommended" module, or what? > > Gotta clarify our dependencies. > > Rob 'recommends', should skip all tests as a 'pass' with message that 'Bio::Phylo is required' or somesuch. chris From maj at fortinbras.us Mon Aug 17 16:55:19 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 17 Aug 2009 16:55:19 -0400 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A89C176.3050109@cornell.edu> References: <4A89C176.3050109@cornell.edu> Message-ID: <3D65CA5234EB4BDF892F280D575FB01D@NewLife> I meant to add a skip tests on a runtime check for bio::phylo. Gotta do that. It's necessary only for these modules. ----- Original Message ----- From: "Robert Buels" To: "Mark A. Jensen" Cc: "BioPerl List" ; "Rutger Vos" ; "Chase Miller" Sent: Monday, August 17, 2009 4:45 PM Subject: Re: [Bioperl-l] new NeXML I/O modules > Mark A. Jensen wrote: >> wrote tests (which pass!), complete POD, and a HOWTO (at > > The tests for this are depending on Bio::Phylo and fail if it's not installed. > Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a > "recommended" module, or what? > > Gotta clarify our dependencies. > > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Aug 17 17:22:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 16:22:00 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A89BE4E.7090901@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> <4A89BE4E.7090901@cornell.edu> Message-ID: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> Still seeing that odd warning popping up: cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at / Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185. Have you tried using Yee Man's original Makefile.PL to see if it works better? There appear to be some differences in the compilation, including a linking warning popping up. 
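For reference, the "isn't numeric in numeric lt (<)" warning quoted above is what perl emits whenever a non-numeric string such as "FL" reaches a numeric comparison. The sketch below only reproduces that class of warning and shows one way to guard against it; it makes no claim about what line 185 of HMM.pm actually does, and the helper names below_limit and below_limit_checked are hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Compare a value against a limit and collect any warnings perl emits.
# (Hypothetical helper, for illustration only.)
sub below_limit {
    my ( $value, $limit ) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $result = $value < $limit ? 1 : 0;    # "FL" is silently treated as 0 here
    return ( $result, \@warnings );
}

# Guarded variant: refuse non-numeric input instead of letting it become 0.
sub below_limit_checked {
    my ( $value, $limit ) = @_;
    die "Not a number: '$value'\n" unless looks_like_number($value);
    return $value < $limit ? 1 : 0;
}

my ( $result, $warnings ) = below_limit( 'FL', 1 );
print "result: $result\n";
print "warning: $_" for @$warnings;
```

If a string like "FL" is a legitimate state or symbol name rather than a number, the comparison probably wants eq/ne or a lookup table instead of <, but that depends on the surrounding code.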
chris On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro > at Bio-Tools-HMM in the repo. The tests are not passing, I think > that some bugs need to be fixed in the logic of things. > > Yee Man, could you have a look? To download the newly repackaged > code: > > svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ > bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM > > perl Build.PL; ./Build test > > Please check that things are compiling OK, check the test logic, > upgrade the tests to use Test::More, and get the tests to the point > where they are passing. > > At that point, it should be ready for CPAN, but we need to decide > how we want to coordinate that with releases of bioperl-live and > bioperl-ext. > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 17:28:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 16:28:05 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> <4A89BE4E.7090901@cornell.edu> <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> Message-ID: <45F9C6D1-7DD7-4227-B7B9-3FBAF7513B35@illinois.edu> Take that back. Yes the 'FL' warning is still there, but no tests are run b/c (simply put) there are no regression tests (no use of Test or Test::More). If you run './Build test --verbose' you can see the run, but no test output. That should be easy to fix, though. chris On Aug 17, 2009, at 4:22 PM, Chris Fields wrote: > Still seeing that odd warning popping up: > > cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose > t/001_basics.t .. 
Argument "FL" isn't numeric in numeric lt (<) at / > Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line > 185. > > Have you tried using Yee Man's original Makefile.PL to see if it > works better? There appear to be some differences in the > compilation, including a linking warning popping up. > > chris > > On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > >> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro >> at Bio-Tools-HMM in the repo. The tests are not passing, I think >> that some bugs need to be fixed in the logic of things. >> >> Yee Man, could you have a look? To download the newly repackaged >> code: >> >> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ >> bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM >> >> perl Build.PL; ./Build test >> >> Please check that things are compiling OK, check the test logic, >> upgrade the tests to use Test::More, and get the tests to the point >> where they are passing. >> >> At that point, it should be ready for CPAN, but we need to decide >> how we want to coordinate that with releases of bioperl-live and >> bioperl-ext. >> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 18:26:19 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 17:26:19 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <419432.62970.qm@web30403.mail.mud.yahoo.com> References: <419432.62970.qm@web30403.mail.mud.yahoo.com> Message-ID: <227EADF3-D769-413D-B1BF-22C919C8D097@illinois.edu> Yee Man, Will look into that. I do recall that disappearing last night, so I'll go look at the commit log. 
I have committed some regression tests using Bio::Root::Test. This'll need to be extensively tested b/c we're comparing floating point numbers, though I do use our custom float_is() test to run these (so we only compare the first six significant digits). These are passing for me on 64bit perl 5.10.0; I may try these on a local 64bit linux (I need to set up bioperl on it first). chris On Aug 17, 2009, at 5:19 PM, Yee Man Chan wrote: > I believe this warning should have been fixed with the latest Bio/ > Tools/HMM.pm. Are you sure you are using the latest Bio/Tools/ > HMM.pm? I noticed that there are two pairs of "use strict" and "use > warnings" in this version. :P > > Yee Man > > --- On Mon, 8/17/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "BioPerl List" , "Yee Man Chan" > > >> Date: Monday, August 17, 2009, 2:22 PM >> Still seeing that odd warning popping >> up: >> >> cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose >> t/001_basics.t .. Argument "FL" isn't numeric in numeric lt >> (<) at >> /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm >> line 185. >> >> Have you tried using Yee Man's original Makefile.PL to see >> if it works better? There appear to be some >> differences in the compilation, including a linking warning >> popping up. >> >> chris >> >> On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: >> >>> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into >> a new distro at Bio-Tools-HMM in the repo. The tests >> are not passing, I think that some bugs need to be fixed in >> the logic of things. >>> >>> Yee Man, could you have a look?
To download the >> newly repackaged code: >>> >>> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ >>> bioperl/Bio-Tools-HMM/trunk >> Bio-Tools-HMM >>> >>> perl Build.PL; ./Build test >>> >>> Please check that things are compiling OK, check the >> test logic, upgrade the tests to use Test::More, and get the >> tests to the point where they are passing. >>> >>> At that point, it should be ready for CPAN, but we >> need to decide how we want to coordinate that with releases >> of bioperl-live and bioperl-ext. >>> >>> Rob >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > From abhishek.vit at gmail.com Mon Aug 17 18:53:19 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 17 Aug 2009 18:53:19 -0400 Subject: [Bioperl-l] Error Copying Hashes Message-ID: Hi Guys I think this one should be appropriate for here. I am trying to copy a hash (spaced out below for the sake of readability} % { $OUTPUT->{$dir}->{'file'}->{$file}->{'additive'} } =%ADDITIVE_COUNT; ## Where %ADDITIVE_COUNT is a simple hash. (key/value) No references : I am getting this error :- Odd number of elements in hash assignment at ./assessCoverage.pl line 258 Seeing the dump of hash I see this $VAR1 = { '/local/seq/' => { 'read_len' => 36, 'file' => { 's_3_sorted.txt' => { 'additive' => { '8979/16384' => undef #### I dont understand this behavior. Something unusual is going on ????? }}}}} From rmb32 at cornell.edu Mon Aug 17 19:00:00 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:00:00 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <360578.66990.qm@web30403.mail.mud.yahoo.com> References: <360578.66990.qm@web30403.mail.mud.yahoo.com> Message-ID: <4A89E0F0.8010307@cornell.edu> Yee Man Chan wrote: > I noticed that Bio/Tools/HMM.pm was removed from the trunk. 
So I added it back in. I think you shouldn't get the warnings with this version. Please read my email above with instructions for checking out the new Bio-Tools-HMM component, where Bio::Tools::HMM has been moved. Please do not add the Bio::Tools::HMM module back into bioperl-live. I think you might be confused about the functions of 'svn add', 'svn commit', etc, because I don't see any actual addition of the module in the commit logs. Please read through the SVN manual at http://svnbook.red-bean.com/ if you need clarification. Rob From rmb32 at cornell.edu Mon Aug 17 19:30:07 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:30:07 -0700 Subject: [Bioperl-l] Error Copying Hashes In-Reply-To: References: Message-ID: <4A89E7FF.1020603@cornell.edu> Well for one thing, it looks like somewhere a hash is getting accidentally evaluated in scalar context. '8979/16384' is a typical result of doing, for example, my $x = %some_hash; This might not be the proximate cause of your problem; it would be better to post your whole script somewhere so people can look over it. That said, this isn't the right list for this; this list is specifically for discussing the BioPerl toolkit, not just perl that is used in biology. IRC is probably the quickest place to get perl help; try the #perl-help channel on the server irc.perl.org. Otherwise, you might try asking on a general perl mailing list; there seem to be some listed at http://perl-begin.org/mailing-lists/ Best of luck! Rob From abhishek.vit at gmail.com Mon Aug 17 19:33:41 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 17 Aug 2009 19:33:41 -0400 Subject: [Bioperl-l] Error Copying Hashes In-Reply-To: <4A89E7FF.1020603@cornell.edu> References: <4A89E7FF.1020603@cornell.edu> Message-ID: Ok great. Thanks for pointing me to the right places to post later.
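For anyone finding this thread later, Rob's diagnosis can be reproduced with a small script. This is a sketch under assumptions, not the original assessCoverage.pl: the helper names good_copy and broken_copy and the sample data are made up for illustration. On perls of this era a hash in scalar context yields a string such as '8979/16384' (used/allocated buckets); newer perls return the number of keys instead. Either way it is a single value, so assigning it to a hash produces one key with an undef value and the "Odd number of elements" warning.

```perl
use strict;
use warnings;

# Copy every key/value pair into a nested slot of a larger structure.
# (Hypothetical helper; the nesting is shortened from the original post.)
sub good_copy {
    my ($src) = @_;
    my %output;
    %{ $output{file}{additive} } = %$src;    # list context: all pairs copied
    return \%output;
}

# Reproduce the symptom: the hash is squeezed through scalar context first.
sub broken_copy {
    my ($src) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $summary = %$src;    # scalar context: one value, not the pairs
    my %output;
    %{ $output{file}{additive} } = ($summary);    # one-element list
    return ( \%output, \@warnings );
}

my %additive_count = ( A => 10, C => 20, G => 30 );    # stand-in for %ADDITIVE_COUNT

my $good = good_copy( \%additive_count );
print scalar( keys %{ $good->{file}{additive} } ), " keys after the good copy\n";

my ( $bad, $warnings ) = broken_copy( \%additive_count );
print scalar( keys %{ $bad->{file}{additive} } ), " key after the broken copy\n";
print "warning: $_" for @$warnings;
```

So the thing to look for in the real script is any place where the hash is stored in, or returned as, a scalar before the assignment, for example a sub that ends in "return %h" but is called in scalar context.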
best, -Abhi On Mon, Aug 17, 2009 at 7:30 PM, Robert Buels wrote: > Well for one thing, it looks like somewhere a hash is getting accidentally > evaluated in scalar context. '8979/16384' is a typical result of doing, for > example, my $x = %some_hash; This might not be the proximate cause of your > problem, it would be better to post your whole script somewhere so people > can look over it. > > That said, this isn't the right list for this, this list is specifically > for discussing the BioPerl toolkit, not just perl that is used in biology. > > IRC probably the quickest place to get perl help, try the #perl-help > channel on the server irc.perl.org. > > Otherwise, you might try asking on a general perl mailing list, there seem > to be some listed at > http://perl-begin.org/mailing-lists/ > > Best of luck! > > Rob > From rmb32 at cornell.edu Mon Aug 17 19:42:21 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:42:21 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A87275C.5040300@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> Message-ID: <4A89EADD.9050509@cornell.edu> I'm digging into the second item on implementation plan, having mostly finished splitting off Bio::FeatureIO (in a branch): * Rename some TypedSeqFeatureI methods as suggested in Hilmar's post Where Hilmar's post is at http://article.gmane.org/gmane.comp.lang.perl.bio.general/15846 Now, he refers to an interesting thing in there that I haven't heard discussed before, which is the concept of having the feature's source_tag by typed with an ontology term also, as source_term(). I can see how this might be a good idea, or it might be overkill. Anybody have thoughts on having feature _sources_ strongly typed with ontology terms? 
Rob From Kevin.M.Brown at asu.edu Mon Aug 17 20:36:34 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 17 Aug 2009 17:36:34 -0700 Subject: [Bioperl-l] on BP documentation In-Reply-To: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> Message-ID: <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu> The obfuscator does help, but even it is a little sparse on data for modules. Especially information on the realities of the returned data from a method call. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger Sent: Monday, August 17, 2009 6:04 AM To: Mark A. Jensen; BioPerl List Subject: Re: [Bioperl-l] on BP documentation > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > > From: "Hilmar Lapp" > ... > > As for the FASTA example, I can understand - I've heard > repeatedly > > from people that one of the things that they are missing is > > documentation for every SeqIO format we support (such as > GenBank, > > UniProt, FASTA, etc) about where to find a particular piece of > the > > format in the object model. > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help > create > our list of action items. > MAJ I wish you the best of luck on this ambitious and crucial project. I teach intro Perl classes to biologists and always tell them that Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble. I was trying to remember specific examples of where I've gotten lost, and unfortunately can't give any. 
But I can tell you that often I've run into trouble because the particular method I'm looking for is three parent classes away from the module I'm actually looking at. The deobfuscator helps some, but only for people who know about that. Do you think you could automate a tool that would add the following to the bottom of each module? =head2 Inherited methods =over 4 =item desc See Bio::Seq::Basic =back This would make browsing through the docs on bioperl.org more fun too. -Amir Karger _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From sidd.basu at gmail.com Tue Aug 18 07:01:03 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Tue, 18 Aug 2009 06:01:03 -0500 Subject: [Bioperl-l] code reuse with moose In-Reply-To: References: <20090812022753.GA815@Macintosh-74.local> Message-ID: <20090818110102.GA27010@seinfeld> Putting it in the bioperl list, makes more sense here, On Wed, 12 Aug 2009, Chris Fields wrote: > (BTW, this is re: the reimplementation of major chunks of BioPerl using > Moose, Biome: http://github.com/cjfields/biome/tree/) > > Locations should use a Role (specifically, Biome::Role::Range), so > start/end/strand should be attributes, not methods. With attributes the > best way to do this is probably with a builder, and lazily (start > requires end, and vice versa). Factor out the common code as Tomas > indicates. BTW, the $self->throw() is akin to BioPerl's $self->throw() > exception handling; it simply catches any exceptions and passes them to > the metaclass exception handling. 
> > I've been thinking about making the Range role abstract for this very > reason (or defining very basic attributes); something like: > > ---------------------------- > > package Bio::Role::Range; > > requires qw(_build_start _build_end _build_strand); > > # also require other methods which need to be defined in implementation > > has 'start' => ( > isa => 'Int', > is => 'rw', > builder => '_build_start', > lazy => 1 > ); > > # same for end, strand (except strand has a different isa via > MooseX::Types) > .... > > package Bio::Location::Foo; > > with 'Bio::Role::Range'; > > sub _build_start { > # for location-specific start > } > > sub _build_end { > # for location-specific end > } > > sub _build_strand { > # for location-specific strand > } > > sub _common_build_method { > # factor out common code here, call from other builders > } > > ---------------------------- This plan makes things much clearer. Currently the BioMe::Role::Location has a 'requires' keyword and rest of the location modules consume that role to have its own implementation. At this point on BioMe::Location::Atomic has attribute based 'start' and 'end' implememtation. I got a bit confused because in current bioperl 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic' and when i am trying to follow that path in BioMe it has to override that method. So, my question is do all the location modules really needs to inherits from each other. I am totally aware about the origianl design ideas but it would be better to have a flatten hierarchy if possible. One more thing, what about putting the 'start', 'end' and the other common base attributes in BioMe::Role::Location instead of BioMe::Role::Range. I am not sure which would be correct from bioperl stand of view, just throwing out an idea. > > Also, I think the Coordinate-related stuff should be simplified down to a > trait or an attribute; they bring in way too much overhead in bioperl w/o > much added value. 
You mean, instead of having a 'builder' method, having specialized traits handle those? That sounds even better. -siddhartha > > And now back to your regular Moose-related broadcast... > > chris > > On Aug 11, 2009, at 9:27 PM, Siddhartha Basu wrote: > > > Hi, > > In one of my classes I have this boilerplate code block that is repeated > > all > > over .... > > > > sub start { > > my ( $self, $value ) = @_; > > $self->{'_start'} = $value if defined $value; > > > > ## -- from here > > $self->throw( "Only adjacent residues when location type " > > . "is IN-BETWEEN. Not [" > > . $self->{'_start'} > > . "] and [" > > . $self->{'_end'} > > . "]" ) > > if defined $self->{'_start'} > > && defined $self->{'_end'} > > && $self->location_type eq 'IN-BETWEEN' > > && ( $self->{'_end'} - 1 != $self->{'_start'} ); > > return $self->{'_start'}; > > ## -- here > > > > } > > > > then again .... > > > > sub end { > > my ( $self, $value ) = @_; > > > > $self->{'_end'} = $value if defined $value; > > > > #assume end is the same as start if not defined > > if ( !defined $self->{'_end'} ) { > > if ( !defined $self->{'_start'} ) { > > $self->warn('Calling end without a defined start > > position'); > > return; > > } > > $self->warn('Setting start equal to end'); > > $self->{'_end'} = $self->{'_start'}; > > } > > > > ## ---- > > > > $self->throw( "Only adjacent residues when location type " > > . "is IN-BETWEEN. Not [" > > . $self->{'_start'} > > . "] and [" > > . $self->{'_end'} > > . "]" ) > > if defined $self->{'_start'} > > && defined $self->{'_end'} > > && $self->location_type eq 'IN-BETWEEN' > > && ( $self->{'_end'} - 1 != $self->{'_start'} ); > > > > return $self->{'_end'}; > > #--------- > > } > > > > > > Is there any way Moose can be used here for more code reuse? I > > thought > > about converting it to a type but still couldn't figure out how that > > can > > be done.
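As a rough illustration of the "factor out the common code" advice, the duplicated IN-BETWEEN check in the quoted start() and end() can live in one private method that both accessors call. This is a plain-Perl sketch rather than the Moose/Biome design discussed above: the class name My::Location and the method _validate_in_between are hypothetical, and die stands in for $self->throw so that the example has no BioPerl dependency.

```perl
use strict;
use warnings;

package My::Location;    # hypothetical class, not part of BioPerl or Biome

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { _location_type => $args{location_type} || 'EXACT' }, $class;
    $self->start( $args{start} ) if defined $args{start};
    $self->end( $args{end} )     if defined $args{end};
    return $self;
}

sub location_type { return $_[0]->{_location_type} }

# The check that was repeated in start() and end(), now in one place.
sub _validate_in_between {
    my ($self) = @_;
    my ( $start, $end ) = @{$self}{qw(_start _end)};
    return unless defined $start && defined $end;
    return unless $self->location_type eq 'IN-BETWEEN';
    die "Only adjacent residues when location type is IN-BETWEEN. "
      . "Not [$start] and [$end]\n"
      if $end - 1 != $start;
    return;
}

sub start {
    my ( $self, $value ) = @_;
    $self->{_start} = $value if defined $value;
    $self->_validate_in_between;
    return $self->{_start};
}

sub end {
    my ( $self, $value ) = @_;
    $self->{_end} = $value if defined $value;

    # As in the quoted code: assume end is the same as start if not defined.
    if ( !defined $self->{_end} ) {
        return unless defined $self->{_start};
        $self->{_end} = $self->{_start};
    }
    $self->_validate_in_between;
    return $self->{_end};
}

package main;

my $ok = My::Location->new( location_type => 'IN-BETWEEN', start => 10, end => 11 );
print join( '..', $ok->start, $ok->end ), "\n";

my $bad = eval { My::Location->new( location_type => 'IN-BETWEEN', start => 10, end => 15 ) };
print "rejected: $@" unless $bad;
```

In a Moose class the same helper could presumably be called from an attribute trigger or a method modifier instead of from hand-written accessors; which of those fits Biome best is a design question for the thread, not something this sketch settles.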
> > > > > > thanks, > > -siddhartha > From deequan at gmail.com Fri Aug 14 15:02:06 2009 From: deequan at gmail.com (David Quan) Date: Fri, 14 Aug 2009 15:02:06 -0400 Subject: [Bioperl-l] bioperl capability Message-ID: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> Hello, I've been browsing around bioperl documentation and have used a blast parser, but am wondering if it is possible to use the start and end information for a hit to trace back to a gene in genbank and extract the sequence for that gene? I have not been able to find elements that would work in such a way. Recommendations for elements that would be capable of behaving in such a way would be greatly appreciated. Thanks very much. David N. Quan -- Love of country is, at heart, trust in a nation's people, faith in their better nature, esteem for their best hopes, understanding for the magnificence and the distinctiveness and the huge, infinitely shaded cultural palette of their simple humanity. --Bradley Burston From ymc at yahoo.com Fri Aug 14 22:57:15 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Fri, 14 Aug 2009 19:57:15 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? Message-ID: <85143.35343.qm@web30404.mail.mud.yahoo.com> Hi Chris I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..) Please let me know if it works for you. Sorry for the bug... Yee Man --- On Fri, 8/14/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" > Date: Friday, August 14, 2009, 8:31 AM > Yee Man, > > I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 > 64-bit) and on dev.open-bio.org (which is perl 5.8.8, > appears to be 32-bit).? 
The patch results in cleaning > up warnings for 5.10.0 but results in similar warnings for > 5.8.8 (linux or OS X). > > On OS X perl 5.8.8, this sometimes passes (note the first > attempt fails, the second succeeds), so it's not entirely a > 32-bit issue: > > http://gist.github.com/167860 > > OS X and perl 5.10.0, this always fails as the previous > gist shows, but demonstrates similar behavior (multiple > attempts to test get different responses): > > http://gist.github.com/167542 > > On linux, everything passes with or w/o the patched files > (patched files have warnings as indicated above): > > Specs for all three perl executables (they vary a bit): > > http://gist.github.com/167883 > > chris > > On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: > > > Ah.. I find that the typemap can become as simple as > this > > ===================== > > TYPEMAP > > HMM * T_PTROBJ > > ===================== > > > > Then the generated HMM.c will have a function called > INT2PTR to do the pointer conversion. I believe this should > solve the warnings. > > > > Attached are the updated HMM.xs and typemap. Can > someone with a 64-bit machine give it a try? > > > > Thank you > > Yee Man > > --- On Thu, 8/13/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista? > >> To: "Yee Man Chan" > >> Cc: "Robert Buels" , > "Jonny Dalzell" , > "BioPerl List" > >> Date: Thursday, August 13, 2009, 5:31 PM > >> (just to point out to everyone, Yee > >> Man's contact information was in the POD) > >> > >> Yee Man, > >> > >> I have the output in the below link: > >> > >> http://gist.github.com/167542 > >> > >> There are similar problems popping up on 32- and > 64-bit > >> perl 5.10.0, Mac OS X 10.5. Haven't had time > to debug > >> it unfortunately. > >> > >> I think we should seriously consider spinning this > code off > >> into its own distribution for CPAN. It's > >> unfortunately bit-rotting away in > bioperl-ext.
If you > >> want to continue supporting it I can help set that > up. > >> > >> chris > >> > >> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > >> > >>> Hi > >>> > >>> So is this an HMM only > problem? Or does > >> it apply to other bioperl-ext modules? > >>> > >>> What exactly are the > compilation errors > >> for HMM? I believe my implementation is just a > simple one > >> based on Rabiner's paper. > >>> > >>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > >>> > >>> I don't think I did > anything fancy that > >> makes it machine dependent or non-ANSI C. > >>> > >>> Yee Man > >>> > >>> --- On Thu, 8/13/09, Chris Fields > >> wrote: > >>> > >>>> From: Chris Fields > >>>> Subject: Re: [Bioperl-l] Problems with > Bioperl-ext > >> package on WinVista? > >>>> To: "Robert Buels" > >>>> Cc: "Jonny Dalzell" , > >> "BioPerl List" , > >> "Yee Man Chan" > >>>> Date: Thursday, August 13, 2009, 3:18 PM > >>>> > >>>> On Aug 13, 2009, at 4:37 PM, Robert Buels > wrote: > >>>> > >>>>> Jonny Dalzell wrote: > >>>>>> Is it ridiculous of me to expect > ubuntu to > >> take > >>>> care of this for me? How do > >>>>>> I go about compiling the HMM? > >>>>> Yes. This is a very specialized > thing > >> that > >>>> you're doing, and Ubuntu does not have > the > >> resources to > >>>> package every single thing. > >>>>> > >>>>> Unfortunately, it looks like > bioperl-ext > >> package is > >>>> not installable under Ubuntu 9.04 anyway, > which is > >> what I'm > >>>> running. For others on this list, > if > >> somebody is > >>>> interested in maintaining it, I'd be > happy > >> to help out > >>>> by testing on Debian-based Linux > platforms.
> >> We need to > >>>> clarify this package's maintenance status: > if > >> there is > >>>> nobody interested in maintaining it, I > would > >> recommend that > >>>> bioperl-ext be removed from distribution. > >> It's not in > >>>> anybody's interest to have unmaintained > software > >> out there > >>>> causing confusion. > >>>> > >>>> I have cc'd Yee Man Chan for this.? > If there > >> isn't a > >>>> response or the message bounces, we do one > of two > >> things: > >>>> > >>>> 1) consider it deprecated (probably > safest). > >>>> 2) spin it out into a separate module. > >>>> > >>>> Just tried to comile it myself and am > getting > >> errors (using > >>>> 64bit perl 5.10), so I think, unless > someone wants > >> to take > >>>> this on, option #1 is best. > >>>> > >>>>> So Jonny, in short, I would say "do > not use > >>>> bioperl-ext". > >>>> > >>>> In general, that's a safe bet.? We're > moving > >> most of > >>>> our C/C++ bindings to BioLib. > >>>> > >>>>> Step back.? What are you trying > to > >>>> accomplish?? Chris already > recommended some > >> alternative > >>>> methods in his email of 8/11 on this > >> subject.? Perhaps > >>>> we can guide you to some software that is > >> actively > >>>> maintained and will meet your needs. > >>>>> > >>>>> Rob > >>>> > >>>> Exactly.? Lots of other (better > supported!) > >> options > >>>> out there.? HMMER, SeqAn, and > others. > >>>> > >>>> chris > >>>> > >>> > >>> > >>> > >> > >> > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam?? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: HMM.xs
Type: application/octet-stream
Size: 5614 bytes
Desc: not available
URL:

From ymc at yahoo.com Sat Aug 15 21:23:28 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sat, 15 Aug 2009 18:23:28 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <8B7B3664-A0E2-4E66-82D6-982096F4C75E@illinois.edu>
Message-ID: <241652.96493.qm@web30404.mail.mud.yahoo.com>

I just committed HMM.xs and typemap to SVN. Can you test it to confirm it works on 64-bit machines?

Thanks
Yee Man

--- On Sat, 8/15/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Robert Buels"
> Cc: "Yee Man Chan" , "BioPerl List"
> Date: Saturday, August 15, 2009, 12:11 PM
> I'm not sure, but it makes more sense to commit these changes
> directly. Yee, need us to set you up with a commit bit? If so,
> fill out the information on this page:
>
> http://www.bioperl.org/wiki/SVN_Account_Request
>
> and forward it to support at open-bio.org. I'll sponsor you.
>
> chris
>
> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
>
> > The usual procedure for developing code is to exchange code via
> > commits to a version control system. Yee, do you know how to use
> > Subversion? Does Yee need a commit bit?
> >
> > Rob
> >
> > --Robert Buels
> > Bioinformatics Analyst, Sol Genomics Network
> > Boyce Thompson Institute for Plant Research
> > Tower Rd
> > Ithaca, NY  14853
> > Tel: 503-889-8539
> > rmb32 at cornell.edu
> > http://www.sgn.cornell.edu

From ymc at yahoo.com Sun Aug 16 00:32:19 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sat, 15 Aug 2009 21:32:19 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To:
Message-ID: <846546.73578.qm@web30404.mail.mud.yahoo.com>

When are you going to release 1.6? Maybe let me work on it before it releases. If it doesn't resolve the problem, then we can think about other alternatives.

Also, please show me the latest errors you have for 5.10.0.

Thanks
Yee Man

--- On Sat, 8/15/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels" , "BioPerl List"
> Date: Saturday, August 15, 2009, 7:05 PM
> I'm still seeing the same errors on Mac OS X for 64-bit perl
> 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as well
> as perl 5.8.8 on dev.open-bio.org).
>
> I'm wondering if this is a problem with my local perl build.
> I'm very tempted to push the HMM-related code into a separate
> distribution (bioperl-hmm) and make a CPAN release out of it so
> it gets wider testing via CPAN testers; it would just require a
> minimum bioperl 1.6 installation for Bio::Tools::HMM and any
> related modules. Yee, would that be okay with you?
>
> chris

From ymc at yahoo.com Sun Aug 16 05:36:59 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 16 Aug 2009 02:36:59 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
Message-ID: <217259.7083.qm@web30408.mail.mud.yahoo.com>

Hi Chris

Thanks for your suggestions. I think it is indeed better to check that the sum is 1.0 using sprintf. I fixed this in the newly committed HMM.pm.

I also fixed code that would lead to warnings under "use warnings".

So now the only problem left is that "monotonic increasing" error. For that part of the code, I was trying to perform an expectation maximization step. Theoretically, the expectation should monotonically increase in every step. But I suppose this is not necessarily true when double precision floating point numbers are involved. I don't know why I used a 1e-100 tolerance for this. Therefore I "fixed" it by using the same tolerance that is used to terminate the maximization step (i.e. 0.000001).

I suppose this "fix" will make it much less likely to throw an exception with your 5.10.0 perl. Can you give that a try again and see if it works now?

Thank you
Yee Man

--- On Sat, 8/15/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels" , "BioPerl List"
> Date: Saturday, August 15, 2009, 10:38 PM
> Yee,
>
> I took the liberty of making a few simple changes to
> Bio::Tools::HMM in svn to point out the problem and possible
> solutions. Feel free to revert these as needed.
>
> I'm seeing two errors, which appear randomly when running
> 'make test'. The first is easily fixable, the second, I'm not
> so sure. I'll let you make the decisions on both.
>
> 1) There is an assumption in the module that, when adding
> floating-point numbers, you will always get 1.0. You may run
> into problems: see 'perldoc -q long decimals'. Lines like this
> (two places in the module):
>
>   ...
>   if ($sum != 1.0) {
>       $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
>   }
>   ...
>
> won't work as expected (note I added a simple diagnostic, just
> print out the 'bad' sum). With perl 5.8.8, this appears to work
> fine, but this is what I get with perl 5.10 (64-bit):
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Baum-Welch Training
> ===================
> Initial Probability Array:
> 0.499978    0.500022
> Transition Probability Matrix:
> 0.499978    0.500022
> 0.499978    0.500022
> Emission Probability Matrix:
> 0.133333    0.143333    0.163333    0.123333    0.143333    0.293333
> 0.133333    0.143333    0.163333    0.123333    0.143333    0.293333
>
> Log Probability of sequence 1: -521.808
> Log Probability of sequence 2: -426.057
>
> Statistical Training
> ====================
> Initial Probability Array:
> 1    0
> Transition Probability Matrix:
>
> ------------- EXCEPTION -------------
> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
>
> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
> STACK toplevel test.pl:82
> -------------------------------------
>
> make: *** [test_dynamic] Error 255
>
> I'm assuming this needs to simply be rounded up to 1.0. That
> could be accomplished with something like
> 'if (sprintf("%.2f", $sum) != 1.0) {...}'
>
> 2) The second error is a little stranger. I have been randomly
> getting this:
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Baum-Welch Training
> ===================
> S should be monotonic increasing!
> make: *** [test_dynamic] Error 255
>
> When I add strict and warnings pragmas to Bio::Tools::HMM (with
> a little additional cleanup to get things running), I get an
> additional warning (arrow):
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
> Baum-Welch Training
> ===================
> S should be monotonic increasing!
> make: *** [test_dynamic] Error 255
>
> So something is not being converted as expected.
>
> chris
>
> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote:
>
> > When are you going to release 1.6? Maybe let me work on it
> > before it releases. If it doesn't resolve the problem, then we
> > can think about other alternatives.
> >
> > Also, please show me the latest errors you have for 5.10.0.
> >
> > Thanks
> > Yee Man
> >
> > --- On Sat, 8/15/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Yee Man Chan"
> >> Cc: "Robert Buels" , "BioPerl List"
> >> Date: Saturday, August 15, 2009, 7:05 PM
> >> I'm still seeing the same errors on Mac OS X for 64-bit perl
> >> 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as
> >> well as perl 5.8.8 on dev.open-bio.org).
> >>
> >> I'm wondering if this is a problem with my local perl build.
> >> I'm very tempted to push the HMM-related code into a separate
> >> distribution (bioperl-hmm) and make a CPAN release out of it
> >> so it gets wider testing via CPAN testers; it would just
> >> require a minimum bioperl 1.6 installation for
> >> Bio::Tools::HMM and any related modules. Yee, would that be
> >> okay with you?
> >>
> >> chris
> >>
> >> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:
> >>
> >>>
> >>> I just committed HMM.xs and typemap to SVN.
> Can you > >> test it to confirm it works in 64-bit machines? > >>> > >>> Thanks > >>> Yee Man > >>> > >>> --- On Sat, 8/15/09, Chris Fields > >> wrote: > >>> > >>>> From: Chris Fields > >>>> Subject: Re: [Bioperl-l] Problems with > Bioperl-ext > >> package on WinVista? > >>>> To: "Robert Buels" > >>>> Cc: "Yee Man Chan" , > >> "BioPerl List" > >>>> Date: Saturday, August 15, 2009, 12:11 PM > >>>> I'm not sure, but it makes more sense > >>>> to commit these changes directly.? > Yee, need > >> us to set > >>>> you up with a commit bit?? If so, > fill out > >> the > >>>> information on this page: > >>>> > >>>> http://www.bioperl.org/wiki/SVN_Account_Request > >>>> > >>>> and forward it to support at open-bio.org. > >>>> I'll sponsor you. > >>>> > >>>> chris > >>>> > >>>> On Aug 15, 2009, at 11:44 AM, Robert Buels > wrote: > >>>> > >>>>> The usual procedure for developing > code is to > >> exchange > >>>> code via commits to a version control > >> system.? Yee, do > >>>> you know how to use Subversion? Does Yee > need a > >> commit bit? > >>>>> > >>>>> Rob > >>>>> > >>>>> Yee Man Chan wrote: > >>>>>> Hi Chris > >>>>>>? ???I find > that there is a > >> memory > >>>> access bug in my code. Attached is the > fixed > >> HMM.xs. This > >>>> file together with the simpler typemap > should fix > >> all > >>>> problems. (I hope..) > >>>>>>? ???Please let > me know if it > >> works > >>>> for you. > >>>>>> Sorry for the bug... > >>>>>> Yee Man > >>>>>> --- On Fri, 8/14/09, Chris Fields > > >>>> wrote: > >>>>>>> From: Chris Fields > >>>>>>> Subject: Re: [Bioperl-l] > Problems > >> with > >>>> Bioperl-ext package on WinVista? 
> >>>>>>> To: "Yee Man Chan" > >>>>>>> Cc: "Robert Buels" , > >>>> "Jonny Dalzell" , > >>>> "BioPerl List" > >>>>>>> Date: Friday, August 14, 2009, > 8:31 > >> AM > >>>>>>> Yee Man, > >>>>>>> > >>>>>>> I tested this out locally > (perl 5.8.8 > >> 32-bit, > >>>> perl 5.10.0 > >>>>>>> 64-bit) and on > dev.open-bio.org (which > >> is perl > >>>> 5.8.8, > >>>>>>> appears to be 32-bit).? > The patch > >> results > >>>> in cleaning > >>>>>>> up warnings for 5.10.0 but > results in > >> similar > >>>> warnings for > >>>>>>> 5.8.8 (linux or OS X). > >>>>>>> > >>>>>>> On OS X perl 5.8.8, this > sometimes > >> passes > >>>> (note the first > >>>>>>> attempt fails, the second > succeeds), > >> so it's > >>>> not entirely a > >>>>>>> 32-bit issue: > >>>>>>> > >>>>>>> http://gist.github.com/167860 > >>>>>>> > >>>>>>> OS X and perl 5.10.0, this > always > >> fails as the > >>>> previous > >>>>>>> gist shows, but demonstrates > similar > >> behavior > >>>> (multiple > >>>>>>> attempts to test get > different > >> responses): > >>>>>>> > >>>>>>> http://gist.github.com/167542 > >>>>>>> > >>>>>>> On linux, everything passes > with or > >> w/o the > >>>> patched files > >>>>>>> (patched files have warnings > as > >> indicated > >>>> above): > >>>>>>> > >>>>>>> Specs for all three perl > executables > >> (they > >>>> vary a bit): > >>>>>>> > >>>>>>> http://gist.github.com/167883 > >>>>>>> > >>>>>>> chris > >>>>>>> > >>>>>>> On Aug 14, 2009, at 3:27 AM, > Yee Man > >> Chan > >>>> wrote: > >>>>>>> > >>>>>>>> Ah.. I find that the > typemap can > >> become as > >>>> simple as > >>>>>>> this > >>>>>>>> ===================== > >>>>>>>> TYPEMAP > >>>>>>>> HMM *? ? > T_PTROBJ > >>>>>>>> ===================== > >>>>>>>> > >>>>>>>> Then the generated HMM.c > will have > >> a > >>>> function called > >>>>>>> INT2PTR to do the pointer > conversion. > >> I > >>>> believe this should > >>>>>>> solve the warnings. > >>>>>>>> Attached are the updated > HMM.xs > >> and > >>>> typemap. 
Can > >>>>>>> someone with a 64-bit machine > give it > >> a try? > >>>>>>>> Thank you > >>>>>>>> Yee Man > >>>>>>>> --- On Thu, 8/13/09, Chris > Fields > >> > >>>>>>> wrote: > >>>>>>>>> From: Chris Fields > > >>>>>>>>> Subject: Re: > [Bioperl-l] > >> Problems with > >>>> Bioperl-ext > >>>>>>> package on WinVista? > >>>>>>>>> To: "Yee Man Chan" > > >>>>>>>>> Cc: "Robert Buels" > , > >>>>>>> "Jonny Dalzell" , > >>>>>>> "BioPerl List" > >>>>>>>>> Date: Thursday, August > 13, > >> 2009, 5:31 > >>>> PM > >>>>>>>>> (just to point out to > >> everyone, Yee > >>>>>>>>> Man's contact > information was > >> in the > >>>> POD) > >>>>>>>>> > >>>>>>>>> Yee Man, > >>>>>>>>> > >>>>>>>>> I have the output in > the below > >> link: > >>>>>>>>> > >>>>>>>>> http://gist.github.com/167542 > >>>>>>>>> > >>>>>>>>> There are similar > problems > >> popping up > >>>> on 32- and > >>>>>>> 64-bit > >>>>>>>>> perl 5.10.0, Mac OS X > 10.5. > >>>> Haven't had time > >>>>>>> to debug > >>>>>>>>> it unfortunately. > >>>>>>>>> > >>>>>>>>> I think we should > seriously > >> consider > >>>> spinning this > >>>>>>> code off > >>>>>>>>> into it's own > distribution > >> for > >>>> CPAN.? It's > >>>>>>>>> unfortunately > bit-rotting away > >> in > >>>>>>> bioperl-ext.? If you > >>>>>>>>> want to continue > supporting it > >> I can > >>>> help set that > >>>>>>> up. > >>>>>>>>> chris > >>>>>>>>> > >>>>>>>>> On Aug 13, 2009, at > 6:58 PM, > >> Yee Man > >>>> Chan wrote: > >>>>>>>>> > >>>>>>>>>> Hi > >>>>>>>>>> > >>>>>>>>>>? ? > ???So is > >> this > >>>> an HMM only > >>>>>>> problem? Or does > >>>>>>>>> it apply to other > bioperl-ext > >>>> modules? > >>>>>>>>>>? ? > ???What > >>>> exactly are the > >>>>>>> compilation errors > >>>>>>>>> for HMM? I believe my > >> implementation > >>>> is just a > >>>>>>> simple one > >>>>>>>>> based on Rabiner's > paper. 
> >>>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > >>>>>>>>>> > >>>>>>>>>>? ? > ???I > >> don't > >>>> think I did > >>>>>>> anything fancy that > >>>>>>>>> makes it machine > dependent or > >> non-ANSI > >>>> C. > >>>>>>>>>> Yee Man > >>>>>>>>>> > >>>>>>>>>> --- On Thu, > 8/13/09, Chris > >> Fields > >>>> > >>>>>>>>> wrote: > >>>>>>>>>>> From: Chris > Fields > >> > >>>>>>>>>>> Subject: Re: > >> [Bioperl-l] > >>>> Problems with > >>>>>>> Bioperl-ext > >>>>>>>>> package on WinVista? > >>>>>>>>>>> To: "Robert > Buels" > >> > >>>>>>>>>>> Cc: "Jonny > Dalzell" > >> , > >>>>>>>>> "BioPerl List" , > >>>>>>>>> "Yee Man Chan" > >>>>>>>>>>> Date: > Thursday, August > >> 13, > >>>> 2009, 3:18 PM > >>>>>>>>>>> > >>>>>>>>>>> On Aug 13, > 2009, at > >> 4:37 PM, > >>>> Robert Buels > >>>>>>> wrote: > >>>>>>>>>>>> Jonny > Dalzell > >> wrote: > >>>>>>>>>>>>> Is it > >> ridiculous of me > >>>> to expect > >>>>>>> ubuntu to > >>>>>>>>> take > >>>>>>>>>>> care of this > for > >> me?? How > >>>> do > >>>>>>>>>>>>> I go > about > >> compiling > >>>> the HMM? > >>>>>>>>>>>> Yes.? > This is > >> a very > >>>> specialized > >>>>>>> thing > >>>>>>>>> that > >>>>>>>>>>> you're doing, > and > >> Ubuntu does > >>>> not have > >>>>>>> the > >>>>>>>>> resources to > >>>>>>>>>>> package every > single > >> thing. > >>>>>>>>>>>> > Unfortunately, it > >> looks > >>>> like > >>>>>>> bioperl-ext > >>>>>>>>> package is > >>>>>>>>>>> not > installable under > >> Ubuntu > >>>> 9.04 anyway, > >>>>>>> which is > >>>>>>>>> what I'm > >>>>>>>>>>> running.? > For > >> others on > >>>> this list, > >>>>>>> if > >>>>>>>>> somebody is > >>>>>>>>>>> interested in > doing > >>>> maintaining it, I'd be > >>>>>>> happy > >>>>>>>>> to help out > >>>>>>>>>>> by testing on > >> Debian-based > >>>> Linux > >>>>>>> platforms. 
> >>>>>>>>> We need to > >>>>>>>>>>> clarify this > >> package's > >>>> maintenance status: > >>>>>>> if > >>>>>>>>> there is > >>>>>>>>>>> nobody > interested in > >>>> maintaining it, I > >>>>>>> would > >>>>>>>>> recommend that > >>>>>>>>>>> bioperl-ext be > removed > >> from > >>>> distribution. > >>>>>>>>> It's not in > >>>>>>>>>>> anybody's > interest to > >> have > >>>> unmaintained > >>>>>>> software > >>>>>>>>> out there > >>>>>>>>>>> causing > confusion. > >>>>>>>>>>> > >>>>>>>>>>> I have cc'd > Yee Man > >> Chan for > >>>> this. > >>>>>>> If there > >>>>>>>>> isn't a > >>>>>>>>>>> response or > the > >> message > >>>> bounces, we do one > >>>>>>> of two > >>>>>>>>> things: > >>>>>>>>>>> 1) consider > it > >> deprecated > >>>> (probably > >>>>>>> safest). > >>>>>>>>>>> 2) spin it out > into a > >> separate > >>>> module. > >>>>>>>>>>> > >>>>>>>>>>> Just tried to > comile > >> it myself > >>>> and am > >>>>>>> getting > >>>>>>>>> errors (using > >>>>>>>>>>> 64bit perl > 5.10), so I > >> think, > >>>> unless > >>>>>>> someone wants > >>>>>>>>> to take > >>>>>>>>>>> this on, > option #1 is > >> best. > >>>>>>>>>>> > >>>>>>>>>>>> So Jonny, > in > >> short, I > >>>> would say "do > >>>>>>> not use > >>>>>>>>>>> bioperl-ext". > >>>>>>>>>>> > >>>>>>>>>>> In general, > that's a > >> safe > >>>> bet.? We're > >>>>>>> moving > >>>>>>>>> most of > >>>>>>>>>>> our C/C++ > bindings to > >> BioLib. > >>>>>>>>>>> > >>>>>>>>>>>> Step > back. > >> What are > >>>> you trying > >>>>>>> to > >>>>>>>>>>> accomplish? > >> Chris > >>>> already > >>>>>>> recommended some > >>>>>>>>> alternative > >>>>>>>>>>> methods in his > email > >> of 8/11 > >>>> on this > >>>>>>>>> subject.? > Perhaps > >>>>>>>>>>> we can guide > you to > >> some > >>>> software that is > >>>>>>>>> actively > >>>>>>>>>>> maintained and > will > >> meet your > >>>> needs. > >>>>>>>>>>>> Rob > >>>>>>>>>>> Exactly.? > Lots of > >> other > >>>> (better > >>>>>>> supported!) 
> >>>>>>>>>>> options out there. HMMER, SeqAn, and others.
> >>>>>>>>>>> chris
> >>>>>
> >>>>> --Robert Buels
> >>>>> Bioinformatics Analyst, Sol Genomics Network
> >>>>> Boyce Thompson Institute for Plant Research
> >>>>> Tower Rd
> >>>>> Ithaca, NY 14853
> >>>>> Tel: 503-889-8539
> >>>>> rmb32 at cornell.edu
> >>>>> http://www.sgn.cornell.edu

From ymc at yahoo.com Sun Aug 16 23:34:24 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 16 Aug 2009 20:34:24 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <05D89C95-261C-47B5-A4C6-794D36DD5FB8@illinois.edu>
Message-ID: <474354.59886.qm@web30408.mail.mud.yahoo.com>

Hi Chris

Good to hear that it is working, and thanks for testing.

As to the release, I understand your desire to maintain a high level of
quality in the BioPerl code base. So if the HMM doesn't meet that
standard, I am OK with it being spun off.

So please pass around the updated code and test it extensively. If no one
complains about the new code by the time of release, I would think it
should go into the next bioperl-ext release. If people uncover new errors
with the new code and the errors can't be fixed in time, then it should
be spun off.

What do you think?
Best Regards,
Yee Man

--- On Sun, 8/16/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels" , "BioPerl List"
> Date: Sunday, August 16, 2009, 5:53 AM
> That worked! Thanks Yee Man!
>
> chris
>
> ps - let me know how you want to deal with a release.
>
> On Aug 16, 2009, at 4:36 AM, Yee Man Chan wrote:
>
> > Hi Chris
> >
> > Thanks for your suggestions. I think it is indeed better to check
> > that the sum is 1.0 using sprintf. I fixed this in the newly
> > committed HMM.pm.
> >
> > I also fixed code that would lead to warnings under "use warnings".
> >
> > So now the only problem left is that "monotonic increasing" error.
> > For that part of the code, I was trying to perform an expectation
> > maximization step. Theoretically, the expectation should
> > monotonically increase in every step. But I suppose this is not
> > necessarily true when double precision floating point numbers are
> > involved. I don't know why I used a 1e-100 tolerance for this.
> > Therefore I "fixed" it by using the same tolerance that is used to
> > terminate the maximization step (i.e. .000001). I suppose this "fix"
> > will make it much less likely to throw an exception with your 5.10.0
> > perl.
> >
> > Can you give that a try again and see if it works now?
> >
> > Thank you
> > Yee Man
> >
> > --- On Sat, 8/15/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Yee Man Chan"
> >> Cc: "Robert Buels" , "BioPerl List"
> >> Date: Saturday, August 15, 2009, 10:38 PM
> >> Yee,
> >>
> >> I took the liberty of making a few simple changes to
> >> Bio::Tools::HMM in svn to point out the problem and possible
> >> solutions. Feel free to revert these as needed.
> >>
> >> I'm seeing two errors, which appear randomly when running
> >> 'make test'.
> >> The first is easily fixable, the second, I'm not so sure. I'll let
> >> you make the decisions on both.
> >>
> >> 1) There is an assumption in the module that, when adding floating
> >> points, you will always get 1.0. You may run into problems: see
> >> 'perldoc -q long decimals'. Lines like this (two places in the
> >> module):
> >>
> >>    ...
> >>    if ($sum != 1.0) {
> >>        $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
> >>    }
> >>    ...
> >>
> >> won't work as expected (note I added a simple diagnostic, just
> >> print out the 'bad' sum). With perl 5.8.8, this appears to work
> >> fine, but this is what I get with perl 5.10 (64-bit):
> >>
> >> pyrimidine1:HMM cjfields$ make test
> >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> >> Baum-Welch Training
> >> ===================
> >> Initial Probability Array:
> >> 0.499978    0.500022
> >> Transition Probability Matrix:
> >> 0.499978    0.500022
> >> 0.499978    0.500022
> >> Emission Probability Matrix:
> >> 0.133333    0.143333
> >> 0.163333    0.123333
> >> 0.143333    0.293333
> >> 0.133333    0.143333
> >> 0.163333    0.123333
> >> 0.143333    0.293333
> >>
> >> Log Probability of sequence 1: -521.808
> >> Log Probability of sequence 2: -426.057
> >>
> >> Statistical Training
> >> ====================
> >> Initial Probability Array:
> >> 1    0
> >> Transition Probability Matrix:
> >>
> >> ------------- EXCEPTION -------------
> >> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
> >>
> >> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
> >> STACK toplevel test.pl:82
> >> -------------------------------------
> >>
> >> make: *** [test_dynamic] Error 255
> >>
> >> I'm assuming this needs to simply be rounded up to 1.0.
> >> That could be accomplished with something like
> >> 'if (sprintf("%.2f", $sum) != 1.0) {...}'
> >>
> >> 2) The second error is a little stranger. I have been randomly
> >> getting this:
> >>
> >> pyrimidine1:HMM cjfields$ make test
> >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> >> Baum-Welch Training
> >> ===================
> >> S should be monotonic increasing!
> >> make: *** [test_dynamic] Error 255
> >>
> >> When I add strict and warnings pragmas to Bio::Tools::HMM (with a
> >> little additional cleanup to get things running), I get an
> >> additional warning (arrow):
> >>
> >> pyrimidine1:HMM cjfields$ make test
> >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> >> Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
> >> Baum-Welch Training
> >> ===================
> >> S should be monotonic increasing!
> >> make: *** [test_dynamic] Error 255
> >>
> >> So something is not being converted as expected.
> >>
> >> chris
> >>
> >> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote:
> >>
> >>> When are you going to release 1.6? Maybe let me work on it before
> >>> it releases. If it doesn't resolve the problem, then we can think
> >>> about other alternatives.
> >>>
> >>> Also, please show me the latest errors you have for 5.10.0.
> >>>
> >>> Thanks
> >>> Yee Man
> >>>
> >>> --- On Sat, 8/15/09, Chris Fields wrote:
> >>>
> >>>> From: Chris Fields
> >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>> To: "Yee Man Chan"
> >>>> Cc: "Robert Buels" , "BioPerl List"
> >>>> Date: Saturday, August 15, 2009, 7:05 PM
> >>>> I'm still seeing the same errors on Mac OS X for 64-bit perl
> >>>> 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as well
> >>>> as perl 5.8.8 on dev.open-bio.org).
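
The two floating-point checks discussed in this thread (a probability sum that should be 1.0, and a training score that should not decrease) could be written with an explicit tolerance rather than `!=` or a string round-trip through sprintf. The following is only a sketch under stated assumptions: `sums_to_one` and `not_decreasing` are hypothetical helper names, not functions in Bio::Tools::HMM, and the 1e-6 tolerance is borrowed from the ".000001" mentioned above rather than taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Bio::Tools::HMM.
# The tolerance is an assumption based on the ".000001" mentioned in the thread.
my $TOLERANCE = 1e-6;

# True when the probabilities in @$probs sum to 1.0 within $TOLERANCE.
sub sums_to_one {
    my ($probs) = @_;
    my $sum = 0;
    $sum += $_ for @$probs;
    return abs($sum - 1.0) <= $TOLERANCE ? 1 : 0;
}

# True when $current has not dropped below $previous by more than $TOLERANCE.
sub not_decreasing {
    my ($previous, $current) = @_;
    return $current >= $previous - $TOLERANCE ? 1 : 0;
}

# Repeated addition of 0.1 may not give exactly 1.0 in binary floating point,
# so an exact comparison could fail here while the tolerant one passes.
my @row = (0.1) x 10;
print "row sums to one\n"        if sums_to_one(\@row);
print "score did not decrease\n" if not_decreasing(-521.808, -521.8080000001);
```

An absolute tolerance is the simplest choice for values near 1.0; whether it suits the log-scale scores used during training is a judgment call for the module author.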
> >>>> > >>>> I'm wondering if this is a problem with my > local > >> perl > >>>> build.? I'm very tempted to push the > >> HMM-related code > >>>> into a separate distribution (bioperl-hmm) > and > >> make a CPAN > >>>> release out of it so it gets wider testing > via > >> CPAN testers; > >>>> it would just require a minimum bioperl > 1.6 > >> installation for > >>>> Bio::Tools::HMM and any related modules. > >> Yee, would > >>>> that be okay with you? > >>>> > >>>> chris > >>>> > >>>> On Aug 15, 2009, at 8:23 PM, Yee Man Chan > wrote: > >>>> > >>>>> > >>>>> I just committed HMM.xs and typemap to > SVN. > >> Can you > >>>> test it to confirm it works in 64-bit > machines? > >>>>> > >>>>> Thanks > >>>>> Yee Man > >>>>> > >>>>> --- On Sat, 8/15/09, Chris Fields > > >>>> wrote: > >>>>> > >>>>>> From: Chris Fields > >>>>>> Subject: Re: [Bioperl-l] Problems > with > >> Bioperl-ext > >>>> package on WinVista? > >>>>>> To: "Robert Buels" > >>>>>> Cc: "Yee Man Chan" , > >>>> "BioPerl List" > >>>>>> Date: Saturday, August 15, 2009, > 12:11 PM > >>>>>> I'm not sure, but it makes more > sense > >>>>>> to commit these changes directly. > >> Yee, need > >>>> us to set > >>>>>> you up with a commit bit?? If > so, > >> fill out > >>>> the > >>>>>> information on this page: > >>>>>> > >>>>>> http://www.bioperl.org/wiki/SVN_Account_Request > >>>>>> > >>>>>> and forward it to support at open-bio.org. > >>>>>> I'll sponsor you. > >>>>>> > >>>>>> chris > >>>>>> > >>>>>> On Aug 15, 2009, at 11:44 AM, > Robert Buels > >> wrote: > >>>>>> > >>>>>>> The usual procedure for > developing > >> code is to > >>>> exchange > >>>>>> code via commits to a version > control > >>>> system.? Yee, do > >>>>>> you know how to use Subversion? > Does Yee > >> need a > >>>> commit bit? > >>>>>>> > >>>>>>> Rob > >>>>>>> > >>>>>>> Yee Man Chan wrote: > >>>>>>>> Hi Chris > >>>>>>>>? ? ? I > find > >> that there is a > >>>> memory > >>>>>> access bug in my code. Attached is > the > >> fixed > >>>> HMM.xs. 
This > >>>>>> file together with the simpler > typemap > >> should fix > >>>> all > >>>>>> problems. (I hope..) > >>>>>>>>? ? ? Please > let > >> me know if it > >>>> works > >>>>>> for you. > >>>>>>>> Sorry for the bug... > >>>>>>>> Yee Man > >>>>>>>> --- On Fri, 8/14/09, Chris > Fields > >> > >>>>>> wrote: > >>>>>>>>> From: Chris Fields > > >>>>>>>>> Subject: Re: > [Bioperl-l] > >> Problems > >>>> with > >>>>>> Bioperl-ext package on WinVista? > >>>>>>>>> To: "Yee Man Chan" > > >>>>>>>>> Cc: "Robert Buels" > , > >>>>>> "Jonny Dalzell" , > >>>>>> "BioPerl List" > >>>>>>>>> Date: Friday, August > 14, 2009, > >> 8:31 > >>>> AM > >>>>>>>>> Yee Man, > >>>>>>>>> > >>>>>>>>> I tested this out > locally > >> (perl 5.8.8 > >>>> 32-bit, > >>>>>> perl 5.10.0 > >>>>>>>>> 64-bit) and on > >> dev.open-bio.org (which > >>>> is perl > >>>>>> 5.8.8, > >>>>>>>>> appears to be > 32-bit). > >> The patch > >>>> results > >>>>>> in cleaning > >>>>>>>>> up warnings for 5.10.0 > but > >> results in > >>>> similar > >>>>>> warnings for > >>>>>>>>> 5.8.8 (linux or OS > X). 
> >>>>>>>>> > >>>>>>>>> On OS X perl 5.8.8, > this > >> sometimes > >>>> passes > >>>>>> (note the first > >>>>>>>>> attempt fails, the > second > >> succeeds), > >>>> so it's > >>>>>> not entirely a > >>>>>>>>> 32-bit issue: > >>>>>>>>> > >>>>>>>>> http://gist.github.com/167860 > >>>>>>>>> > >>>>>>>>> OS X and perl 5.10.0, > this > >> always > >>>> fails as the > >>>>>> previous > >>>>>>>>> gist shows, but > demonstrates > >> similar > >>>> behavior > >>>>>> (multiple > >>>>>>>>> attempts to test get > >> different > >>>> responses): > >>>>>>>>> > >>>>>>>>> http://gist.github.com/167542 > >>>>>>>>> > >>>>>>>>> On linux, everything > passes > >> with or > >>>> w/o the > >>>>>> patched files > >>>>>>>>> (patched files have > warnings > >> as > >>>> indicated > >>>>>> above): > >>>>>>>>> > >>>>>>>>> Specs for all three > perl > >> executables > >>>> (they > >>>>>> vary a bit): > >>>>>>>>> > >>>>>>>>> http://gist.github.com/167883 > >>>>>>>>> > >>>>>>>>> chris > >>>>>>>>> > >>>>>>>>> On Aug 14, 2009, at > 3:27 AM, > >> Yee Man > >>>> Chan > >>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Ah.. I find that > the > >> typemap can > >>>> become as > >>>>>> simple as > >>>>>>>>> this > >>>>>>>>>> > ===================== > >>>>>>>>>> TYPEMAP > >>>>>>>>>> HMM * > >> T_PTROBJ > >>>>>>>>>> > ===================== > >>>>>>>>>> > >>>>>>>>>> Then the generated > HMM.c > >> will have > >>>> a > >>>>>> function called > >>>>>>>>> INT2PTR to do the > pointer > >> conversion. > >>>> I > >>>>>> believe this should > >>>>>>>>> solve the warnings. > >>>>>>>>>> Attached are the > updated > >> HMM.xs > >>>> and > >>>>>> typemap. Can > >>>>>>>>> someone with a 64-bit > machine > >> give it > >>>> a try? > >>>>>>>>>> Thank you > >>>>>>>>>> Yee Man > >>>>>>>>>> --- On Thu, > 8/13/09, Chris > >> Fields > >>>> > >>>>>>>>> wrote: > >>>>>>>>>>> From: Chris > Fields > >> > >>>>>>>>>>> Subject: Re: > >> [Bioperl-l] > >>>> Problems with > >>>>>> Bioperl-ext > >>>>>>>>> package on WinVista? 
> >>>>>>>>>>> To: "Yee Man > Chan" > >> > >>>>>>>>>>> Cc: "Robert > Buels" > >> , > >>>>>>>>> "Jonny Dalzell" , > >>>>>>>>> "BioPerl List" > >>>>>>>>>>> Date: > Thursday, August > >> 13, > >>>> 2009, 5:31 > >>>>>> PM > >>>>>>>>>>> (just to point > out to > >>>> everyone, Yee > >>>>>>>>>>> Man's contact > >> information was > >>>> in the > >>>>>> POD) > >>>>>>>>>>> > >>>>>>>>>>> Yee Man, > >>>>>>>>>>> > >>>>>>>>>>> I have the > output in > >> the below > >>>> link: > >>>>>>>>>>> > >>>>>>>>>>> http://gist.github.com/167542 > >>>>>>>>>>> > >>>>>>>>>>> There are > similar > >> problems > >>>> popping up > >>>>>> on 32- and > >>>>>>>>> 64-bit > >>>>>>>>>>> perl 5.10.0, > Mac OS X > >> 10.5. > >>>>>> Haven't had time > >>>>>>>>> to debug > >>>>>>>>>>> it > unfortunately. > >>>>>>>>>>> > >>>>>>>>>>> I think we > should > >> seriously > >>>> consider > >>>>>> spinning this > >>>>>>>>> code off > >>>>>>>>>>> into it's own > >> distribution > >>>> for > >>>>>> CPAN.? It's > >>>>>>>>>>> unfortunately > >> bit-rotting away > >>>> in > >>>>>>>>> bioperl-ext.? If > you > >>>>>>>>>>> want to > continue > >> supporting it > >>>> I can > >>>>>> help set that > >>>>>>>>> up. > >>>>>>>>>>> chris > >>>>>>>>>>> > >>>>>>>>>>> On Aug 13, > 2009, at > >> 6:58 PM, > >>>> Yee Man > >>>>>> Chan wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi > >>>>>>>>>>>> > >>>>>>>>>>>> > >>? ? So is > >>>> this > >>>>>> an HMM only > >>>>>>>>> problem? Or does > >>>>>>>>>>> it apply to > other > >> bioperl-ext > >>>>>> modules? > >>>>>>>>>>>> > >>? ? What > >>>>>> exactly are the > >>>>>>>>> compilation errors > >>>>>>>>>>> for HMM? I > believe my > >>>> implementation > >>>>>> is just a > >>>>>>>>> simple one > >>>>>>>>>>> based on > Rabiner's > >> paper. 
> >>>>>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F > > >>>>>>>>>>>> > ~murphyk%2FBayes > >>>>>>>>>>>> > %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner > > >>>>>>>>>>>> > +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > >>>>>>>>>>>> > >>>>>>>>>>>> > >>? ? I > >>>> don't > >>>>>> think I did > >>>>>>>>> anything fancy that > >>>>>>>>>>> makes it > machine > >> dependent or > >>>> non-ANSI > >>>>>> C. > >>>>>>>>>>>> Yee Man > >>>>>>>>>>>> > >>>>>>>>>>>> --- On > Thu, > >> 8/13/09, Chris > >>>> Fields > >>>>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>> From: > Chris > >> Fields > >>>> > >>>>>>>>>>>>> > Subject: Re: > >>>> [Bioperl-l] > >>>>>> Problems with > >>>>>>>>> Bioperl-ext > >>>>>>>>>>> package on > WinVista? > >>>>>>>>>>>>> To: > "Robert > >> Buels" > >>>> > >>>>>>>>>>>>> Cc: > "Jonny > >> Dalzell" > >>>> , > >>>>>>>>>>> "BioPerl List" > , > >>>>>>>>>>> "Yee Man Chan" > > >>>>>>>>>>>>> Date: > >> Thursday, August > >>>> 13, > >>>>>> 2009, 3:18 PM > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Aug > 13, > >> 2009, at > >>>> 4:37 PM, > >>>>>> Robert Buels > >>>>>>>>> wrote: > >>>>>>>>>>>>>> > Jonny > >> Dalzell > >>>> wrote: > >>>>>>>>>>>>>>> > Is it > >>>> ridiculous of me > >>>>>> to expect > >>>>>>>>> ubuntu to > >>>>>>>>>>> take > >>>>>>>>>>>>> care > of this > >> for > >>>> me?? How > >>>>>> do > >>>>>>>>>>>>>>> > I go > >> about > >>>> compiling > >>>>>> the HMM? > >>>>>>>>>>>>>> > Yes. > >> This is > >>>> a very > >>>>>> specialized > >>>>>>>>> thing > >>>>>>>>>>> that > >>>>>>>>>>>>> you're > doing, > >> and > >>>> Ubuntu does > >>>>>> not have > >>>>>>>>> the > >>>>>>>>>>> resources to > >>>>>>>>>>>>> > package every > >> single > >>>> thing. > >>>>>>>>>>>>>> > >> Unfortunately, it > >>>> looks > >>>>>> like > >>>>>>>>> bioperl-ext > >>>>>>>>>>> package is > >>>>>>>>>>>>> not > >> installable under > >>>> Ubuntu > >>>>>> 9.04 anyway, > >>>>>>>>> which is > >>>>>>>>>>> what I'm > >>>>>>>>>>>>> > running. 
> >> For > >>>> others on > >>>>>> this list, > >>>>>>>>> if > >>>>>>>>>>> somebody is > >>>>>>>>>>>>> > interested in > >> doing > >>>>>> maintaining it, I'd be > >>>>>>>>> happy > >>>>>>>>>>> to help out > >>>>>>>>>>>>> by > testing on > >>>> Debian-based > >>>>>> Linux > >>>>>>>>> platforms. > >>>>>>>>>>> We need to > >>>>>>>>>>>>> > clarify this > >>>> package's > >>>>>> maintenance status: > >>>>>>>>> if > >>>>>>>>>>> there is > >>>>>>>>>>>>> > nobody > >> interested in > >>>>>> maintaining it, I > >>>>>>>>> would > >>>>>>>>>>> recommend > that > >>>>>>>>>>>>> > bioperl-ext be > >> removed > >>>> from > >>>>>> distribution. > >>>>>>>>>>> It's not in > >>>>>>>>>>>>> > anybody's > >> interest to > >>>> have > >>>>>> unmaintained > >>>>>>>>> software > >>>>>>>>>>> out there > >>>>>>>>>>>>> > causing > >> confusion. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I have > cc'd > >> Yee Man > >>>> Chan for > >>>>>> this. > >>>>>>>>> If there > >>>>>>>>>>> isn't a > >>>>>>>>>>>>> > response or > >> the > >>>> message > >>>>>> bounces, we do one > >>>>>>>>> of two > >>>>>>>>>>> things: > >>>>>>>>>>>>> 1) > consider > >> it > >>>> deprecated > >>>>>> (probably > >>>>>>>>> safest). > >>>>>>>>>>>>> 2) > spin it out > >> into a > >>>> separate > >>>>>> module. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Just > tried to > >> comile > >>>> it myself > >>>>>> and am > >>>>>>>>> getting > >>>>>>>>>>> errors (using > >>>>>>>>>>>>> 64bit > perl > >> 5.10), so I > >>>> think, > >>>>>> unless > >>>>>>>>> someone wants > >>>>>>>>>>> to take > >>>>>>>>>>>>> this > on, > >> option #1 is > >>>> best. > >>>>>>>>>>>>> > >>>>>>>>>>>>>> So > Jonny, > >> in > >>>> short, I > >>>>>> would say "do > >>>>>>>>> not use > >>>>>>>>>>>>> > bioperl-ext". > >>>>>>>>>>>>> > >>>>>>>>>>>>> In > general, > >> that's a > >>>> safe > >>>>>> bet.? We're > >>>>>>>>> moving > >>>>>>>>>>> most of > >>>>>>>>>>>>> our > C/C++ > >> bindings to > >>>> BioLib. > >>>>>>>>>>>>> > >>>>>>>>>>>>>> > Step > >> back. 
> >>>> What are > >>>>>> you trying > >>>>>>>>> to > >>>>>>>>>>>>> > accomplish? > >>>> Chris > >>>>>> already > >>>>>>>>> recommended some > >>>>>>>>>>> alternative > >>>>>>>>>>>>> > methods in his > >> email > >>>> of 8/11 > >>>>>> on this > >>>>>>>>>>> subject. > >> Perhaps > >>>>>>>>>>>>> we can > guide > >> you to > >>>> some > >>>>>> software that is > >>>>>>>>>>> actively > >>>>>>>>>>>>> > maintained and > >> will > >>>> meet your > >>>>>> needs. > >>>>>>>>>>>>>> > Rob > >>>>>>>>>>>>> > Exactly. > >> Lots of > >>>> other > >>>>>> (better > >>>>>>>>> supported!) > >>>>>>>>>>> options > >>>>>>>>>>>>> out > there. > >>>> HMMER, SeqAn, > >>>>>> and > >>>>>>>>> others. > >>>>>>>>>>>>> chris > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>> > >>>> > >> > __________________________________________________ > >>>>>>>>>> Do You Yahoo!? > >>>>>>>>>> Tired of spam? > >> Yahoo! Mail > >>>> has the > >>>>>> best spam > >>>>>>>>> protection around > >>>>>>>>>> http://mail.yahoo.com > >>>>>>>>> > >>>>>> > >>>> > >> > _______________________________________________ > >>>>>>>>>> Bioperl-l mailing > list > >>>>>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> --Robert Buels > >>>>>>> Bioinformatics Analyst, Sol > Genomics > >> Network > >>>>>>> Boyce Thompson Institute for > Plant > >> Research > >>>>>>> Tower Rd > >>>>>>> Ithaca, NY? 14853 > >>>>>>> Tel: 503-889-8539 > >>>>>>> rmb32 at cornell.edu > >>>>>>> http://www.sgn.cornell.edu > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>>> > >>> > >>> > >>> > >> > >> > > > > > > > > From ymc at yahoo.com Mon Aug 17 18:19:27 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Mon, 17 Aug 2009 15:19:27 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? 
In-Reply-To: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu>
Message-ID: <419432.62970.qm@web30403.mail.mud.yahoo.com>

I believe these warnings should have been fixed with the latest
Bio/Tools/HMM.pm. Are you sure you are using the latest
Bio/Tools/HMM.pm? I noticed that there are two pairs of "use strict" and
"use warnings" in this version. :P

Yee Man

--- On Mon, 8/17/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Robert Buels"
> Cc: "BioPerl List" , "Yee Man Chan"
> Date: Monday, August 17, 2009, 2:22 PM
> Still seeing that odd warning popping up:
>
> cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose
> t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185.
>
> Have you tried using Yee Man's original Makefile.PL to see if it works
> better? There appear to be some differences in the compilation,
> including a linking warning popping up.
>
> chris
>
> On Aug 17, 2009, at 3:32 PM, Robert Buels wrote:
>
> > OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro
> > at Bio-Tools-HMM in the repo. The tests are not passing, I think
> > that some bugs need to be fixed in the logic of things.
> >
> > Yee Man, could you have a look? To download the newly repackaged
> > code:
> >
> > svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM
> >
> > perl Build.PL; ./Build test
> >
> > Please check that things are compiling OK, check the test logic,
> > upgrade the tests to use Test::More, and get the tests to the point
> > where they are passing.
> >
> > At that point, it should be ready for CPAN, but we need to decide
> > how we want to coordinate that with releases of bioperl-live and
> > bioperl-ext.
> >
> > Rob

From ymc at yahoo.com Mon Aug 17 18:28:50 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Mon, 17 Aug 2009 15:28:50 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <45F9C6D1-7DD7-4227-B7B9-3FBAF7513B35@illinois.edu>
Message-ID: <360578.66990.qm@web30403.mail.mud.yahoo.com>

I noticed that Bio/Tools/HMM.pm was removed from the trunk. So I added it
back in. I think you shouldn't get the warnings with this version.

Yee Man

--- On Mon, 8/17/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Chris Fields"
> Cc: "Robert Buels" , "BioPerl List" , "Yee Man Chan"
> Date: Monday, August 17, 2009, 2:28 PM
> Take that back. Yes the 'FL' warning is still there, but no tests are
> run b/c (simply put) there are no regression tests (no use of Test or
> Test::More). If you run './Build test --verbose' you can see the run,
> but no test output. That should be easy to fix, though.
>
> chris
>
> On Aug 17, 2009, at 4:22 PM, Chris Fields wrote:
>
> > Still seeing that odd warning popping up:
> >
> > cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose
> > t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185.
> >
> > Have you tried using Yee Man's original Makefile.PL to see if it
> > works better? There appear to be some differences in the
> > compilation, including a linking warning popping up.
> >
> > chris
> >
> > On Aug 17, 2009, at 3:32 PM, Robert Buels wrote:
> >
> >> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro
> >> at Bio-Tools-HMM in the repo. The tests are not passing, I think
> >> that some bugs need to be fixed in the logic of things.
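
Since the thread notes that the current test script prints output but makes no use of Test or Test::More, here is a rough idea of what a Test::More-based check might look like. It is a sketch only: the numbers are stand-ins copied from the output quoted earlier in this thread, `row_sum` is a hypothetical helper, and the 1e-6 tolerance is an assumption. A real test would pull these values from a trained Bio::Tools::HMM object instead.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Stand-in values copied from the test output quoted in this thread;
# a real test would read them from a trained HMM object.
my @transition = ([0.499978, 0.500022], [0.499978, 0.500022]);
my $tolerance  = 1e-6;    # assumed tolerance

# Hypothetical helper: sum one row of a probability matrix.
sub row_sum {
    my ($row) = @_;
    my $sum = 0;
    $sum += $_ for @$row;
    return $sum;
}

# Each from-state row should sum to 1.0 within the tolerance.
for my $i (0 .. $#transition) {
    cmp_ok(abs(row_sum($transition[$i]) - 1.0), '<', $tolerance,
        "transition row $i sums to 1.0 within tolerance");
}

# A log probability should be negative.
cmp_ok(-521.808, '<', 0, 'log probability of sequence 1 is negative');
```

With a plan declared, './Build test' reports pass or fail per assertion instead of only echoing the training output.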
> >>
> >> Yee Man, could you have a look? To download the newly repackaged
> >> code:
> >>
> >> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM
> >>
> >> perl Build.PL; ./Build test
> >>
> >> Please check that things are compiling OK, check the test logic,
> >> upgrade the tests to use Test::More, and get the tests to the point
> >> where they are passing.
> >>
> >> At that point, it should be ready for CPAN, but we need to decide
> >> how we want to coordinate that with releases of bioperl-live and
> >> bioperl-ext.
> >>
> >> Rob

From ymc at yahoo.com Mon Aug 17 20:24:24 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Mon, 17 Aug 2009 17:24:24 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A89E0F0.8010307@cornell.edu>
Message-ID: <62126.74727.qm@web30401.mail.mud.yahoo.com>

I get it now. So it has now been spun off. Anyway, I updated the HMM.pm
in Bio-Tools-HMM with the latest version. I think it should work.

Yee Man

--- On Mon, 8/17/09, Robert Buels wrote:

> From: Robert Buels
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Chris Fields" , "BioPerl List"
> Date: Monday, August 17, 2009, 4:00 PM
> Yee Man Chan wrote:
> > I noticed that Bio/Tools/HMM.pm was removed from the trunk. So I
> > added it back in. I think you shouldn't get the warnings with this
> > version.
>
> Please read my email above with instructions for checking out the new
> Bio-Tools-HMM component, where Bio::Tools::HMM has been moved.
> Please do not add the Bio::Tools::HMM module back into bioperl-live.
>
> I think you might be confused about the functions of 'svn add',
> 'svn commit', etc, because I don't see any actual addition of the
> module in the commit logs. Please read through the SVN manual at
> http://svnbook.red-bean.com/ if you need clarification.
>
> Rob

From whs at eaglegenomics.com Tue Aug 18 05:14:48 2009
From: whs at eaglegenomics.com (Will Spooner)
Date: Tue, 18 Aug 2009 10:14:48 +0100
Subject: [Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers
In-Reply-To:
References:
Message-ID:

Hi Robert,

Speaking for Ensembl, the GeneTree display code is deeply embedded in the
API and web code, and refactoring as a standalone package would be
exceedingly difficult.

Jalview (http://www.jalview.org) may be a good alternative, albeit a Java
one. There is code available for driving Jalview from the Ensembl
database, and something similar for BioPerl seems reasonable.

Will

On 17 Aug 2009, at 18:14, Robert Bradbury wrote:

> One of the questions facing people working in bioinformatics is "How
> do we present information so that it can be effectively interpreted by
> non-informatics specialists?"
>
> Now, my expertise lies in computer science (esp. O.S. & databases) and
> as a second vocation the biology of aging (DNA damage & repair, to a
> lesser extent cancer and pathologies of aging, etc.). Now by my
> estimate there are perhaps 5 people in the world who are able to
> effectively discuss computer science X aging (gerontology) [3]. There
> are perhaps several dozen people where those areas, esp aging, may
> overlap with DNA damage & repair. But then there is a wider audience
> of perhaps a few hundred members of AGE, and maybe a thousand or so
> who are members of the scientific subgroup of GSA. But most of those
> individuals are "old school" scientists who know relatively little
> about bioinformatics.
So one has barriers to > presenting > bioinformatics information in ways that they can use usefully. > > I have found in my limited experience that homology graphs of > conserved > protein domains, such as those displayed in HomloGene or those in > Ensembl > (including phylogeny graphs) can be quite useful in reaching > interesting > conclusions. For example, double strand break repair processes > which may > involve 8-10 relatively conserved proteins, may have a critical role > in the > mechanisms of aging. In particular two of those proteins, WRN & > DCLRE1C > (Artemis) contain complementary exonuclease activities which chew up > the DNA > in order to prepare the strands for ligation. Of course, > programmers may > appreciate better than gerontologists the significance of deleting > random > bytes from instruction sequences in ones code. At the recent AGE > meeting in > June several discussions arose as to possible differences in "aging" > in > yeast, *C. elegans* and mammals. [1]. A quick database search > showed that *C. > elegans* seems to be lacking the exonuclease domain on the WRN > homologue and > may be missing a DCLRE1C homologue entirely (which if true would > lead to > conclusions that aging in *C. elegans* may be fundamentally > different from > aging in vertebrates). Explaining this to researchers can best be > done > using pictures. > > I've been through PubMed and have several papers (NAR / BMC > Bioinformatics) > regarding programs to do homology comparisons and phylogeny trees. > However > these seem to lean towards producing less condensed bioinformatics-ish > information. I do not know however whether the outputs from > databases like > PubMed HomoloGene or Ensembl have been packaged in tools that might > be part > of BioPerl. I am interested in programs that can be run on a > regular basis > to draw "pretty pictures" that can be used for publication and/or > internet > browsing. 
In particular I'm interested in running such programs on > species > of interest to various gerontological communities [2] which involves > subsets > of databases which seem to be scattered around the world. > > Thanks. > > 1. Of course there has been lots of discussion and rationalization > over the > last 15+ years about how "aging" is largely the same in more complex > and > simpler organisms -- in part to justify sequencing some organisms > and in > part to justify funding research at certain laboratories. A closer > examination based on some of the complete and emerging genome > sequences may > suggest this is a very swampy discussion. > 2. For example, nematode DNA repair gene comparisons would be > interesting to > nematode researchers, insect DNA repair gene comparisons to insect > researchers, both to invertebrate researchers, etc. > 3. The recently published textbooks *Aging of the Genome* by Jan > Vijg and > the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg > *et al*, > go a long way towards moving these areas from the stacks of research > libraries into areas for more general discussion. Both volumes deal > extensively with the ~150 DNA repair genes. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- William Spooner whs at eaglegenomics.com http://www.eaglegenomics.com From cjfields at illinois.edu Tue Aug 18 10:35:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 09:35:49 -0500 Subject: [Bioperl-l] bioperl capability In-Reply-To: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> Message-ID: <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> I think I already answered this: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/20302/focus=20305 chris On Aug 14, 2009, at 2:02 PM, David Quan wrote: > Hello, > > I've been browsing around bioperl documentation and have used > a blast parser, but am wondering if it is possible to use the start > and end information for a hit to trace back to a gene in genbank and > extract the sequence for that gene? I have not been able to find > elements that would work in such a way. Recommendations for elements > that would be capable of behaving in such a way would be greatly > appreciated. Thanks very much. > > David N. Quan > -- > Love of country is, at heart, trust in a nation's people, faith in > their better nature, esteem for their best hopes, understanding for > the magnificence and the distinctiveness and the huge, infinitely > shaded cultural palette of their simple humanity. 
--Bradley Burston > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Tue Aug 18 10:42:09 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 Aug 2009 16:42:09 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu> Message-ID: <628aabb70908180742o4bf93d21tab0b90c328323efa@mail.gmail.com> On Tue, Aug 18, 2009 at 02:36, Kevin Brown wrote: > The obfuscator does help, but even it is a little sparse on data for > modules. Especially information on the realities of the returned data > from a method call. Yep, sorry about that, Kevin. I'm way overdue in devoting a little attention to cleaning up those Deobfuscator bugs and -- just maybe -- putting a prettier face on it. Hoping to find some time in the near future for that. Dave From cjfields at illinois.edu Tue Aug 18 11:04:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 10:04:40 -0500 Subject: [Bioperl-l] code reuse with moose In-Reply-To: <20090818110102.GA27010@seinfeld> References: <20090812022753.GA815@Macintosh-74.local> <20090818110102.GA27010@seinfeld> Message-ID: On Aug 18, 2009, at 6:01 AM, Siddhartha Basu wrote: > Putting it in the bioperl list, makes more sense here, > > On Wed, 12 Aug 2009, Chris Fields wrote: > >> (BTW, this is re: the reimplementation of major chunks of BioPerl >> using >> Moose, Biome: http://github.com/cjfields/biome/tree/) >> >> Locations should use a Role (specifically, Biome::Role::Range), so >> start/end/strand should be attributes, not methods. 
With >> attributes the >> best way to do this is probably with a builder, and lazily (start >> requires end, and vice versa). Factor out the common code as Tomas >> indicates. BTW, the $self->throw() is akin to BioPerl's $self- >> >throw() >> exception handling; it simply catches any exceptions and passes >> them to >> the metaclass exception handling. >> >> I've been thinking about making the Range role abstract for this very >> reason (or defining very basic attributes); something like: >> >> ---------------------------- >> >> package Bio::Role::Range; >> >> requires qw(_build_start _build_end _build_strand); >> >> # also require other methods which need to be defined in >> implementation >> >> has 'start' => ( >> isa => 'Int', >> is => 'rw', >> builder => '_build_start', >> lazy => 1 >> ); >> >> # same for end, strand (except strand has a different isa via >> MooseX::Types) >> .... >> >> package Bio::Location::Foo; >> >> with 'Bio::Role::Range'; >> >> sub _build_start { >> # for location-specific start >> } >> >> sub _build_end { >> # for location-specific end >> } >> >> sub _build_strand { >> # for location-specific strand >> } >> >> sub _common_build_method { >> # factor out common code here, call from other builders >> } >> >> ---------------------------- > > This plan makes things much clearer. Currently the > BioMe::Role::Location has a 'requires' keyword and rest of the > location modules consume that role to have its own implementation. At > this point on BioMe::Location::Atomic has attribute based 'start' and > 'end' implememtation. I got a bit confused because in current bioperl > 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic' and when > i am trying to follow that path in BioMe it has to override that > method. > So, my question is do all the location modules really needs to > inherits > from each other. I am totally aware about the origianl design ideas > but > it would be better to have a flatten hierarchy if possible. 
Flattening with roles is always a good idea, yes. I wouldn't worry as much about the way it was originally implemented as the general API (and ways in which we can simplify it). > One more thing, what about putting the 'start', 'end' and the other > common base attributes in BioMe::Role::Location instead of > BioMe::Role::Range. I am not sure which would be correct from bioperl > stand of view, just throwing out an idea. That's a possibility. To me Locations are just Ranges with different behavior (hence the below comment...) >> Also, I think the Coordinate-related stuff should be simplified >> down to a >> trait or an attribute; they bring in way too much overhead in >> bioperl w/o >> much added value. > > You mean instead of having 'builder' method, having a specialized > traits handling those. That sounds like even better. > > -siddhartha Yes, that's essentially it. Location behavior could be changed by having CoordinatePolicy as a trait. Similarly, fuzziness for start/ end could also be thought of as a trait. In essence, you could probably role most behavior into attribute traits (which, in Moose, are just roles that are composed into the attribute meta class, Moose::Meta::Attribute). I had started up a Biome::Meta::Attribute class in case we were to go down this path, then we could start registering specific traits within that namespace. Just to note, it might be easier to try the simplest approach first and get tests passing, then layer in traits to see how they act performance-wise. My guess is they will speed things up, but you never know. Locations will be a performance bottleneck as they are used in generic Features. chris From cjfields at illinois.edu Tue Aug 18 11:10:08 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 10:10:08 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? 
In-Reply-To: <62126.74727.qm@web30401.mail.mud.yahoo.com> References: <62126.74727.qm@web30401.mail.mud.yahoo.com> Message-ID: Yee Man, Robert, All tests are passing; there was a small change in the expected floating point, but no warning now. Re: passing this on to CPAN, I think it needs a distinct version from BioPerl (something that should probably happen with any spinoffs). I foresee two options (and a possible conflict): 1) Use the same versioning scheme, starting with 1.6.1. 2) Use a simpler scheme a'la Bio::Graphics, which I suggest. Tripartite versions are a PITA, we'll only need to keep that in core. Conflict: Bio::Tools::HMM is currently part of the 1.6 branch (in 1.6.0). If this stays in 1.6.1 then we have two versions of the module floating out there. I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, and I could attempt to push the initial Bio-Ext-HMM release after core 1.6.1 is out. After that, I could then add Yee Man as PAUSE co- maintainer for those modules (which means Yee Man needs to sign up for a PAUSE account). Any objections to that? chris On Aug 17, 2009, at 7:24 PM, Yee Man Chan wrote: > I get it now. So it is now spinned off. Anyway, I updated the HMM.pm > in Bio-Tools-HMM with the latest version. I think it should work. > > Yee Man > > --- On Mon, 8/17/09, Robert Buels wrote: > >> From: Robert Buels >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Chris Fields" , "BioPerl List" > > >> Date: Monday, August 17, 2009, 4:00 PM >> Yee Man Chan wrote: >>> I noticed that Bio/Tools/HMM.pm was removed from the >> trunk. So I added it back in. I think you shouldn't get the >> warnings with this version. >> >> Please read my email above with instructions for checkout >> out the new Bio-Tools-HMM component, where Bio::Tools::HMM >> has been moved. Please do not add the Bio::Tools::HMM >> module back into bioperl-live. 
>> >> I think you might be confused about the functions of 'svn >> add', 'svn commit', etc, because I don't see any actual >> addition of the module in the commit logs. Please read >> through the SVN manual at http://svnbook.red-bean.com/ if you need >> clarification. >> >> Rob >> >> > > > From hlapp at gmx.net Tue Aug 18 11:46:55 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 18 Aug 2009 11:46:55 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A89EADD.9050509@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> Message-ID: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > I can see how this might be a good idea, or it might be overkill. > Anybody have thoughts on having feature _sources_ strongly typed > with ontology terms? It's how BioSQL and Chado would store it anyway. I'm not sure whether GFF3 requires it, possibly not. But when you make everything else ontology-typed, why exempt one property that also stands to benefit from more predictable values? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From rmb32 at cornell.edu Tue Aug 18 11:49:32 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 08:49:32 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <62126.74727.qm@web30401.mail.mud.yahoo.com> Message-ID: <4A8ACD8C.1060908@cornell.edu> Chris Fields wrote: > I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, and I > could attempt to push the initial Bio-Ext-HMM release after core 1.6.1 > is out. 
After that, I could then add Yee Man as PAUSE co-maintainer for > those modules (which means Yee Man needs to sign up for a PAUSE > account). Any objections to that? Sounds like a good plan to me, if Yee Man agreed with it. He would be the primary CPAN maintainer of the package. Maybe he should actually be the first uploader too? Then, it would show up under his PAUSE account at the outset, and he would get better attribution and visibility. Rob From cjfields at illinois.edu Tue Aug 18 12:34:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 11:34:00 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: On Aug 18, 2009, at 10:46 AM, Hilmar Lapp wrote: > > On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > >> I can see how this might be a good idea, or it might be overkill. >> Anybody have thoughts on having feature _sources_ strongly typed >> with ontology terms? > > It's how BioSQL and Chado would store it anyway. I'm not sure > whether GFF3 requires it, possibly not. Might be worth bringing up with Lincoln to get his thoughts. > But when you make everything else ontology-typed, why exempt one > property that also stands to benefit from more predictable values? > > -hilmar What I'm thinking as well. You can always implement it that way, and if we deem it too heavy-weight then revert back. Or have it evaluated lazily and get the benefits of both. That's the magic of doing this on a branch, it gives you much more latitude to try things out. 
chris From cain.cshl at gmail.com Tue Aug 18 14:28:05 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 18 Aug 2009 14:28:05 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: Hi Hilmar and all, Actually, Chado stores sources as a dbxref for the feature (where the db.name is "GFF_source") and the source can be any string, which is what the GFF3 spec indicates. I think the source was intended to be free text to allow the creator maximum flexibility when making the GFF; it also allows lots of flexibility when defining what features go into a particular track in GBrowse: you can have lots of gene features in your GFF, but you can segregate them according to what their source attributes are. Additionally, some applications (SynBrowse comes to mind) overload the source value and require them to conform to a certain syntax. So, what I'm trying to say is, source should probably just stay a simple string. Scott On Aug 18, 2009, at 11:46 AM, Hilmar Lapp wrote: > > On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > >> I can see how this might be a good idea, or it might be overkill. >> Anybody have thoughts on having feature _sources_ strongly typed >> with ontology terms? > > > It's how BioSQL and Chado would store it anyway. I'm not sure > whether GFF3 requires it, possibly not. > > But when you make everything else ontology-typed, why exempt one > property that also stands to benefit from more predictable values? 
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From marcelo011982 at gmail.com Tue Aug 18 14:34:17 2009
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Tue, 18 Aug 2009 15:34:17 -0300
Subject: [Bioperl-l] Genbank code from Blast results
Message-ID: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>

Hi all,

I am writing a script that takes some information from the results in blastn files. Everything was OK, but I am having difficulty picking out the GenBank accession number (the 'gb' below). I tried

$obj->each_accession_number
$hit->name

and some variations of these.

------------------------------
>gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 gmrtDrNS01 Glycine max cDNA 3', mRNA sequence /clone_end=3' /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853
          Length = 853

 Score = 1336 bits (674), Expect = 0.0
 Identities = 793/832 (95%), Gaps = 8/832 (0%)
 Strand = Plus / Minus

Query: 294858 aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt 294917
              |||||||||||| |||||| |||||||||||||||||  ||||||||||||||||||||
Sbjct: 853    aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc 794
----------------------------------------

But I still don't get it.
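One possible way to get at the accession in the hit shown above: with a defline like this one, the first token is normally kept as the hit name and the rest as the description, so the /gb= value would have to be pulled out of the description text. The following is only a minimal sketch under that assumption. tag_from_description is a hypothetical helper (not a BioPerl method), and the sample string is simply the defline text from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the value of a "/key=value" tag
# (e.g. /gb=CX702616) out of a UniGene-style description string.
# Returns undef when the tag is not present.
sub tag_from_description {
    my ($description, $key) = @_;
    return undef unless defined $description;
    return $description =~ m{/\Q$key\E=(\S+)} ? $1 : undef;
}

# Description text of the hit from the post, as a single string.
my $desc = "gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 "
         . "gmrtDrNS01 Glycine max cDNA 3', mRNA sequence /clone_end=3' "
         . "/gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853";

my $gb = tag_from_description($desc, 'gb');
print defined $gb ? "GenBank accession: $gb\n" : "no /gb= tag found\n";
```

Inside a Bio::SearchIO loop the same helper would presumably be called as tag_from_description($hit->description, 'gb'), assuming the description really holds the text after the first token.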
thank you with regards Miwata From hlapp at gmx.net Tue Aug 18 16:01:18 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 18 Aug 2009 16:01:18 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: > Additionally, some applications (SynBrowse comes to mind) overload > the source value and require them to conform to a certain syntax. > > So, what I'm trying to say is, source should probably just stay a > simple string. I would rephrase that to source should probably retain the possibility of using made-up strings. You mention one example yourself, and there have been others in a recent thread on BioSQL [1], for why the option to have predictable, structured values with attached semantics could be very useful. -hilmar [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue Aug 18 17:46:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 16:46:25 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8ACD8C.1060908@cornell.edu> References: <62126.74727.qm@web30401.mail.mud.yahoo.com> <4A8ACD8C.1060908@cornell.edu> Message-ID: On Aug 18, 2009, at 10:49 AM, Robert Buels wrote: > Chris Fields wrote: >> I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, >> and I could attempt to push the initial Bio-Ext-HMM release after >> core 1.6.1 is out. 
After that, I could then add Yee Man as PAUSE >> co-maintainer for those modules (which means Yee Man needs to sign >> up for a PAUSE account). Any objections to that? > > > Sounds like a good plan to me, if Yee Man agreed with it. He would > be the primary CPAN maintainer of the package. Maybe he should > actually be the first uploader too? Then, it would show up under > his PAUSE account at the outset, and he would get better attribution > and visibility. > > Rob At the moment BIOPERLML is the primary maintainer. It's an 'umbrella' account for the bioperl group; a few others exist for stuff like DBI, Catalyst, etc I think. Anyone who's designated a co-maintainer can release code onto CPAN. Several of us can assign new co-maintainer status for modules, so the code doesn't get locked up if someone decides to abandon it. We simply designate another co-maintainer if someone decides to take it over. In fact, that's half the reason I would like to get the ext code out there again; either designate it as abandonware or set it up so that it can be reimplemented by someone with the tuits (maybe using biolib, for instance). We have recently moved Bio::Graphics over to LDS as the primary, though, so this is all a point up for debate. chris From rmb32 at cornell.edu Tue Aug 18 17:56:19 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 14:56:19 -0700 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <20090818174053.3f379c5elembark@wrkhors.com@wrkhors.com> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> <20090818174053.3f379c5elembark@wrkhors.com@wrkhors.com> Message-ID: <4A8B2383.1030207@cornell.edu> Steven, Could you CC Heath Bair on this? He's the YAPC::NA 2010 coordinator that started this thread. 
Rob Steven Lembark wrote: > On Fri, 26 Jun 2009 14:06:06 -0700 > Robert Buels wrote: > >> This is a really giant opportunity to expose some of the best >> technologists in the world to what we do in bioinformatics, and possibly >> to entice some of them to help us the heck out! ;-) > > OK, so I'm a few months behind on my email... > > One suggestion: Have them add a BioPerl track to the > conference in advance of getting any submissions for > it. The gent I spoke to in Pittsburgh seemed open to > the idea of a Bioinformatcs/BioPerl track in 2010. > > Opening things up a bit to include Bioinformatics > even beyond BioPerl would give people who are > marginally interested a chance to see what the > whole area is about (e.g., adapting the W-Curve > for use with Perl or how we analyzed Clostridia > using Perl for the bookkeeping). > > In the meantime you might want to see how many > people would be willing to give talks in the > track -- even recycled ones -- before the conference > submission period begins. And, yes, I'd volunteer to > give 1-2 talks. > > enjoi > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From jncline at gmail.com Tue Aug 18 23:06:19 2009 From: jncline at gmail.com (Jonathan Cline) Date: Tue, 18 Aug 2009 22:06:19 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> Message-ID: <4A8B6C2B.9030101@gmail.com> Chris Fields wrote: > > Your modules may or may not need the Bio* namespace (that's up to you, > actually); there are several non-bioperl modules that also share the > Bio* namespace, and I believe there are modules that aren't Bio* that > use BioPerl (Gbrowse comes to mind). 
If you're focusing on > interaction with robotics, Robotics::Bio::X might be a better > namespace for instance (b/c you could expand later into other possibly > non-bio robotics interfaces). Based on your & other opinions I have received, I am creating: Robotics.pm (high level hardware abstraction layer) Robotics::Tecan Robotics::Tecan::Genesis I'll post a release note when it's reached an interesting level of maturity (estimate a couple weeks from now) so anyone with the hardware can play with the package. It's currently working great, and I am adding functionality on a daily basis. ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## >> >> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >>>> Sent: Thursday, 30 July 2009 2:07 p.m. >>>> To: bioperl-l at lists.open-bio.org >>>> Cc: Jonathan Cline >>>> Subject: [Bioperl-l] Bio::Robotics namespace discussion >>>> >>>> I am writing a module for communication with biology robotics, as >>>> discussed recently on #bioperl, and I invite your comments. >>>> >>>> >>>> On Namespace: >>>> >>>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many >>>> s/w modules already called 'robots' (web spider robots, chat bots, www >>>> automate, etc) so I chose the longer name "robotics" to differentiate >>>> this module as manipulating real hardware. Bio::Robotics is the >>>> abstraction for generic robotics and Bio::Robotics::(vendor) is the >>>> manufacturer-specific implementation. Robot control is made more >>>> complex due to the very configurable nature of the work table >>>> (placement >>>> of equipment, type of equipment, type of attached arm, etc). The >>>> abstraction has to be careful not to generalize or assume too >>>> much. 
In >>>> some cases, the Bio::Robotics modules may expand to arbitrary >>>> equipment >>>> such as thermocyclers, tray holders, imagers, etc - that could be a >>>> future roadmap plan. From rmb32 at cornell.edu Wed Aug 19 00:13:53 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 21:13:53 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <829996.94283.qm@web30404.mail.mud.yahoo.com> References: <829996.94283.qm@web30404.mail.mud.yahoo.com> Message-ID: <4A8B7C01.5060502@cornell.edu> Yee Man Chan wrote: > I think it is better to keep Bio-Tools-HMM within the Bio-Ext package and then spin this whole Bio-Ext package out to CPAN. I am ok with Robert's arrangement to move the related pm files under Bio/Tools/ to the new Bio-Ext package. The long-term development plan is to factor *ALL* of Bioperl into individual distributions similar to Bio-Tools-HMM. It is actually much easier to maintain and release code in this "broken up" way. This means that the Bio-Ext package is going to go away, so it doesn't make sense to keep Bio-Tools-HMM in it. Chris, other core devs, do you agree with this? > I have a PAUSE already due to my other CPAN contributions. So there is no need to create a new one. My PAUSE account is UMVUE. Oh good, the next step would just be to coordinate when to do the release in concert with Bioperl 1.6.1, right? Rob From rmb32 at cornell.edu Wed Aug 19 00:37:49 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 21:37:49 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <190221.61009.qm@web30408.mail.mud.yahoo.com> References: <190221.61009.qm@web30408.mail.mud.yahoo.com> Message-ID: <4A8B819D.9070309@cornell.edu> Yee Man Chan wrote: > Is it going to be an arrangement similar to bioconductor? If so, I suppose then it makes sense. 
But you might want to develop scripts to automatically download and install new modules to make it user friendly. Yes, we are probably going to make a Task::BioPerl or something similar. > What do you mean by Bio-Ext is going away? I notice quite many people using dpAlign. So if Bio-Ext is going away, then at least dpAlign should become another spin off. By going away, I meant that everything in there is going to be spinned off. Except modules that are no longer maintainable, if there are any in there. Rob From deequan at gmail.com Wed Aug 19 00:39:35 2009 From: deequan at gmail.com (deequan) Date: Tue, 18 Aug 2009 21:39:35 -0700 (PDT) Subject: [Bioperl-l] bioperl capability In-Reply-To: <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> Message-ID: <25037707.post@talk.nabble.com> Howdy there, Yes, quite right. I apologize for the double posting. Moreover, I appreciate your assistance in trying to sort out what can and cannot be done with bioperl. 

To address the problem previously stated, I put together a remarkably misbehaving script that has the following parts:

    #Some parsing:
    $q_start = $hsp->query->start;
    $q_end = $hsp->query->end;
    $h_start = $hsp->hit->start;
    $h_end = $hsp->hit->end;
    $length = $hsp->query->seqlength();
    $id = $hit->accession;
    print OUT "$id\t";
    my $seq;
    if($h_start<$h_end){
        #the bit per your recommendation
        my $begin = $h_start-$q_start+1;
        my $cease = ($length - $q_end) + $h_end;
        my $strand = 1;
        my $factory = Bio::DB::GenBank->new(-format=> 'genbank',
                                            -seq_start =>$begin,
                                            -seq_stop =>$cease,
                                            -strand => $strand, #1 = plus, 2 = minus
                                            );
        $seq = $factory->get_Seq_by_acc($id);
    }else{
        #else assume backward, code not shown
    }
    #and some stuff to retrieve the sequence
    my $len = $seq->length();
    my $string = $seq->subseq(1, $len);
    print OUT "length = $len\t";
    print OUT "seq = $string\n";

In your previous reply, you said the code accessing the seq object created by get_Seq_by_acc would have to pass that obj (here $seq) to a seqIO for basic IO purposes. Not seeing exactly how to go about that, I tried some other functions in combination that seemed as though they should work (length() and subseq()). Unfortunately, the program does not even run to that point, as the script throws an exception:

------------- EXCEPTION -------------
MSG: acc CP000948 does not exist
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:182
STACK toplevel test.pl:36
-------------------------------------

Oddly, the record corresponding to this accession number can be found here:
http://www.ncbi.nlm.nih.gov/nuccore/169887498

Perhaps you'd be willing to offer another hint. Thank you for your assistance thus far. And on behalf of all posters, thank you for sharing your knowledge.

'Preciate.

David Q.
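The window arithmetic in the script above can be checked offline, without touching GenBank. What follows is a minimal sketch of the same plus-strand calculation with made-up coordinates. hit_window is a hypothetical helper, and the clamp to position 1 is an added assumption that is not in the original script. It says nothing about the cause of the exception.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given an HSP's query and hit coordinates plus the
# full query length, return the hit-side window that would cover the whole
# query on the plus strand (same arithmetic as $begin/$cease in the post).
sub hit_window {
    my (%arg) = @_;
    my $begin = $arg{h_start} - $arg{q_start} + 1;
    my $end   = $arg{h_end} + ($arg{q_len} - $arg{q_end});
    $begin = 1 if $begin < 1;    # assumption: never ask for a position below 1
    return ($begin, $end);
}

# Made-up example: bases 11..90 of a 100 bp query align to hit 1000..1079.
my ($begin, $end) = hit_window(
    q_start => 11,   q_end => 90,   q_len => 100,
    h_start => 1000, h_end => 1079,
);
printf "fetch %d..%d (%d bp)\n", $begin, $end, $end - $begin + 1;
# prints "fetch 990..1089 (100 bp)"
```

For an ungapped alignment the window comes out exactly as long as the query, which is a quick sanity check before passing the values as -seq_start and -seq_stop. Once a sequence object has been retrieved, it could be written out with Bio::SeqIO, for example Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT)->write_seq($seq).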
Chris Fields-5 wrote: > > I think I already answered this: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/20302/focus=20305 > > chris > > -- View this message in context: http://www.nabble.com/bioperl-capability-tp25024929p25037707.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Wed Aug 19 01:28:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:29 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B819D.9070309@cornell.edu> References: <190221.61009.qm@web30408.mail.mud.yahoo.com> <4A8B819D.9070309@cornell.edu> Message-ID: <6ADF16A9-3D14-45F3-B972-98134B0A0DB1@illinois.edu> On Aug 18, 2009, at 11:37 PM, Robert Buels wrote: > Yee Man Chan wrote: >> Is it going to be an arrangement similar to bioconductor? If so, I >> suppose then it makes sense. But you might want to develop scripts >> to automatically download and install new modules to make it user >> friendly. > Yes, we are probably going to make a Task::BioPerl or something > similar. > >> What do you mean by Bio-Ext is going away? I notice quite many >> people using dpAlign. So if Bio-Ext is going away, then at least >> dpAlign should become another spin off. > By going away, I meant that everything in there is going to be > spinned off. Except modules that are no longer maintainable, if > there are any in there. > > Rob dpAlign could become another spinoff, yes, if it's used (and works fine). 
The problematic code dealt with pSW, alignment statistics, and staden
io_lib support (the last of which is fairly bit-rotted now):

http://bugzilla.open-bio.org/show_bug.cgi?id=2668
http://bugzilla.open-bio.org/show_bug.cgi?id=1857
http://bugzilla.open-bio.org/show_bug.cgi?id=2069
http://bugzilla.open-bio.org/show_bug.cgi?id=2074
http://bugzilla.open-bio.org/show_bug.cgi?id=2329

dpAlign has its own bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2384

chris

From cjfields at illinois.edu Wed Aug 19 01:28:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 19 Aug 2009 00:28:39 -0500
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A8B7C01.5060502@cornell.edu>
References: <829996.94283.qm@web30404.mail.mud.yahoo.com> <4A8B7C01.5060502@cornell.edu>
Message-ID: <1DA73AAB-EC4F-4F44-BBF2-CFF7B3E4A0BE@illinois.edu>

On Aug 18, 2009, at 11:13 PM, Robert Buels wrote:

> Yee Man Chan wrote:
>> I think it is better to keep Bio-Tools-HMM within the Bio-Ext
>> package and then spin this whole Bio-Ext package out to CPAN. I am
>> ok with Robert's arrangement to move the related pm files under
>> Bio/Tools/ to the new Bio-Ext package.
>
> The long-term development plan is to factor *ALL* of Bioperl into
> individual distributions similar to Bio-Tools-HMM. It is actually
> much easier to maintain and release code in this "broken up" way.
>
> This means that the Bio-Ext package is going to go away, so it
> doesn't make sense to keep Bio-Tools-HMM in it. Chris, other core
> devs, do you agree with this?

In general, though there will be a limit as to how small we can split
these off. For instance, Bio::Tree/TreeIO will be messy to split up
and makes sense to keep together. Others could be more easily split
off. YMMV.

>> I have a PAUSE already due to my other CPAN contributions. So there
>> is no need to create a new one. My PAUSE account is UMVUE.
> Oh good, the next step would just be to coordinate when to do the > release in concert with Bioperl 1.6.1, right? > > Rob Yes. That should be easy enough to do; basically Bio::Tools::HMM will be removed from 1.6.1, then core will be released along with Bio::Ext::HMM (or Bio::Tools::HMM, either way it would double as the distribution name). chris From cjfields at illinois.edu Wed Aug 19 01:28:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:48 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A8B6C2B.9030101@gmail.com> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> <4A8B6C2B.9030101@gmail.com> Message-ID: <2F5111BE-A1F3-437F-AC6C-4AC3BE05E9EB@illinois.edu> On Aug 18, 2009, at 10:06 PM, Jonathan Cline wrote: > Chris Fields wrote: >> >> Your modules may or may not need the Bio* namespace (that's up to >> you, >> actually); there are several non-bioperl modules that also share the >> Bio* namespace, and I believe there are modules that aren't Bio* that >> use BioPerl (Gbrowse comes to mind). If you're focusing on >> interaction with robotics, Robotics::Bio::X might be a better >> namespace for instance (b/c you could expand later into other >> possibly >> non-bio robotics interfaces). > > Based on your & other opinions I have received, I am creating: > > Robotics.pm (high level hardware abstraction layer) > Robotics::Tecan > Robotics::Tecan::Genesis > > > I'll post a release note when it's reached an interesting level of > maturity (estimate a couple weeks from now) so anyone with the > hardware > can play with the package. It's currently working great, and I am > adding functionality on a daily basis. > > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## That's great to hear! Keep us updated, I'm sure there are a few potential users lurking about here. 
chris From scott at scottcain.net Wed Aug 19 09:15:12 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 19 Aug 2009 09:15:12 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net> Hilmar, The examples in that thread ought to go in the ninth column; using the Dbxref tag for references back to GenBank for example. The provenience stuff should go in the ninth column as well, though I don't know exactly how would be best. Scott On Aug 18, 2009, at 4:01 PM, Hilmar Lapp wrote: > > On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: > >> Additionally, some applications (SynBrowse comes to mind) overload >> the source value and require them to conform to a certain syntax. >> >> So, what I'm trying to say is, source should probably just stay a >> simple string. > > > I would rephrase that to source should probably retain the > possibility of using made-up strings. > > You mention one example yourself, and there have been others in a > recent thread on BioSQL [1], for why the option to have predictable, > structured values with attached semantics could be very useful. > > -hilmar > > [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. 
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From saikari78 at gmail.com Wed Aug 19 09:30:07 2009
From: saikari78 at gmail.com (saikari keitele)
Date: Wed, 19 Aug 2009 14:30:07 +0100
Subject: [Bioperl-l] Pipeline for generating phylogenetic trees from list of species names
Message-ID:

Hi,

Does anyone know of a simple pipeline for generating a phylogenetic tree
from a list of species with bioperl?

I've had a look at
http://www.bioperl.org/wiki/HOWTO:PhylogeneticAnalysisPipeline#Distance_Distance_in_PHYLIP_.2B_NJ_Tree_in_PHYLIP
but it isn't explicit for the crucial steps (at least given my level of
knowledge).

For each species, should I extract the longest sequence available for
every protein and align it with the same protein sequences of the other
species in the list? Would anyone have an example pipeline of the
different steps to perform?

Thank you very much.

Saikari

From ymc at yahoo.com Tue Aug 18 22:50:57 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Tue, 18 Aug 2009 19:50:57 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To:
Message-ID: <829996.94283.qm@web30404.mail.mud.yahoo.com>

I think it is better to keep Bio-Tools-HMM within the Bio-Ext package and
then spin this whole Bio-Ext package out to CPAN. I am ok with Robert's
arrangement to move the related pm files under Bio/Tools/ to the new
Bio-Ext package.

There aren't that many modules in Bio-Ext. Plus, based on Chris and
Robert's comments, modules other than my dpAlign and HMM appear to be
abandoned. Moving HMM out only makes users less likely to try it out.

If need be, I can also be a co-maintainer of this spun-off Bio-Ext
package.

I have a PAUSE already due to my other CPAN contributions. So there is no
need to create a new one. My PAUSE account is UMVUE.
Yee Man

--- On Tue, 8/18/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels" , "BioPerl List"
> Date: Tuesday, August 18, 2009, 8:10 AM
> Yee Man, Robert,
>
> All tests are passing; there was a small change in the expected
> floating point, but no warning now.
>
> Re: passing this on to CPAN, I think it needs a distinct version from
> BioPerl (something that should probably happen with any spinoffs). I
> foresee two options (and a possible conflict):
>
> 1) Use the same versioning scheme, starting with 1.6.1.
> 2) Use a simpler scheme a'la Bio::Graphics, which I suggest. Tripartite
>    versions are a PITA, we'll only need to keep that in core.
>
> Conflict: Bio::Tools::HMM is currently part of the 1.6 branch (in
> 1.6.0). If this stays in 1.6.1 then we have two versions of the module
> floating out there.
>
> I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, and I
> could attempt to push the initial Bio-Ext-HMM release after core 1.6.1
> is out. After that, I could then add Yee Man as PAUSE co-maintainer for
> those modules (which means Yee Man needs to sign up for a PAUSE
> account). Any objections to that?
>
> chris
>
> On Aug 17, 2009, at 7:24 PM, Yee Man Chan wrote:
>
> > I get it now. So it is now spun off. Anyway, I updated the HMM.pm in
> > Bio-Tools-HMM with the latest version. I think it should work.
> >
> > Yee Man
> >
> > --- On Mon, 8/17/09, Robert Buels wrote:
> >
> >> From: Robert Buels
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Yee Man Chan"
> >> Cc: "Chris Fields" , "BioPerl List"
> >> Date: Monday, August 17, 2009, 4:00 PM
> >> Yee Man Chan wrote:
> >>> I noticed that Bio/Tools/HMM.pm was removed from the trunk. So I
> >>> added it back in. I think you shouldn't get the warnings with this
> >>> version.
> >>
> >> Please read my email above with instructions for checking out the new
> >> Bio-Tools-HMM component, where Bio::Tools::HMM has been moved. Please
> >> do not add the Bio::Tools::HMM module back into bioperl-live.
> >>
> >> I think you might be confused about the functions of 'svn add',
> >> 'svn commit', etc, because I don't see any actual addition of the
> >> module in the commit logs. Please read through the SVN manual at
> >> http://svnbook.red-bean.com/ if you need clarification.
> >>
> >> Rob
> >>
> >

From ymc at yahoo.com Wed Aug 19 00:24:05 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Tue, 18 Aug 2009 21:24:05 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A8B7C01.5060502@cornell.edu>
Message-ID: <190221.61009.qm@web30408.mail.mud.yahoo.com>

Is it going to be an arrangement similar to bioconductor? If so, I
suppose then it makes sense. But you might want to develop scripts to
automatically download and install new modules to make it user friendly.

What do you mean by Bio-Ext is going away? I notice quite many people
using dpAlign. So if Bio-Ext is going away, then at least dpAlign should
become another spin off.

Yee Man

--- On Tue, 8/18/09, Robert Buels wrote:

> From: Robert Buels
> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Chris Fields" , "BioPerl List"
> Date: Tuesday, August 18, 2009, 9:13 PM
> Yee Man Chan wrote:
> > I think it is better to keep Bio-Tools-HMM within the Bio-Ext package
> > and then spin this whole Bio-Ext package out to CPAN. I am ok with
> > Robert's arrangement to move the related pm files under Bio/Tools/ to
> > the new Bio-Ext package.
>
> The long-term development plan is to factor *ALL* of Bioperl into
> individual distributions similar to Bio-Tools-HMM.
> It is actually much easier to maintain and release code in this
> "broken up" way.
>
> This means that the Bio-Ext package is going to go away, so it doesn't
> make sense to keep Bio-Tools-HMM in it. Chris, other core devs, do you
> agree with this?
>
> > I have a PAUSE already due to my other CPAN contributions. So there
> > is no need to create a new one. My PAUSE account is UMVUE.
> Oh good, the next step would just be to coordinate when to do the
> release in concert with Bioperl 1.6.1, right?
>
> Rob
>

From ymc at yahoo.com Wed Aug 19 00:49:18 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Tue, 18 Aug 2009 21:49:18 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A8B819D.9070309@cornell.edu>
Message-ID: <184595.94226.qm@web30407.mail.mud.yahoo.com>

Good. That makes sense then. Please update me when all is set.

Yee Man

--- On Tue, 8/18/09, Robert Buels wrote:

> From: Robert Buels
> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Chris Fields" , "BioPerl List"
> Date: Tuesday, August 18, 2009, 9:37 PM
> Yee Man Chan wrote:
> > Is it going to be an arrangement similar to bioconductor? If so, I
> > suppose then it makes sense. But you might want to develop scripts to
> > automatically download and install new modules to make it user
> > friendly.
> Yes, we are probably going to make a Task::BioPerl or something
> similar.
>
> > What do you mean by Bio-Ext is going away? I notice quite many people
> > using dpAlign. So if Bio-Ext is going away, then at least dpAlign
> > should become another spin off.
> By going away, I meant that everything in there is going to be spun
> off. Except modules that are no longer maintainable, if there are any
> in there.
>
> Rob
>

From ymc at yahoo.com Wed Aug 19 05:01:39 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Wed, 19 Aug 2009 02:01:39 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To: <6ADF16A9-3D14-45F3-B972-98134B0A0DB1@illinois.edu>
Message-ID: <884845.92813.qm@web30408.mail.mud.yahoo.com>

I tried that sample script that reportedly caused the dpAlign "bug" but I
can't reproduce it. All I get is a warning from LocatableSeq.

-------------------------------------------
[ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "-Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl

--------------------- WARNING ---------------------
MSG: In sequence ABC|9944760 residue count gives end value 101.
Overriding value [104] with value 101 for Bio::LocatableSeq::end().
TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA
---------------------------------------------------
Getting score for ABC|9944760 -> ABC|9986984
= 300
Getting score for ABC|9986984 -> ABC|9944760
= 303
------------------------------------------

Does the test script crash on your machine?

Yee Man

--- On Tue, 8/18/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Robert Buels"
> Cc: "Yee Man Chan" , "BioPerl List"
> Date: Tuesday, August 18, 2009, 10:28 PM
> On Aug 18, 2009, at 11:37 PM, Robert Buels wrote:
>
> > Yee Man Chan wrote:
> >> Is it going to be an arrangement similar to bioconductor? If so, I
> >> suppose then it makes sense. But you might want to develop scripts
> >> to automatically download and install new modules to make it user
> >> friendly.
> > Yes, we are probably going to make a Task::BioPerl or something
> > similar.
> >
> >> What do you mean by Bio-Ext is going away? I notice quite many
> >> people using dpAlign.
> >> So if Bio-Ext is going away, then at least dpAlign should become
> >> another spin off.
> > By going away, I meant that everything in there is going to be spun
> > off. Except modules that are no longer maintainable, if there are any
> > in there.
> >
> > Rob
>
> dpAlign could become another spinoff, yes, if it's used (and works
> fine). The problematic code dealt with pSW, alignment statistics, and
> staden io_lib support (the last of which is fairly bit-rotted now):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2668
> http://bugzilla.open-bio.org/show_bug.cgi?id=1857
> http://bugzilla.open-bio.org/show_bug.cgi?id=2069
> http://bugzilla.open-bio.org/show_bug.cgi?id=2074
> http://bugzilla.open-bio.org/show_bug.cgi?id=2329
>
> dpAlign has its own bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2384
>
> chris
>

From cjfields at illinois.edu Wed Aug 19 10:49:15 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 19 Aug 2009 09:49:15 -0500
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To: <884845.92813.qm@web30408.mail.mud.yahoo.com>
References: <884845.92813.qm@web30408.mail.mud.yahoo.com>
Message-ID:

I'll have a look. It's probably something that hasn't been updated to
deal with LocatableSeq's pathological end point checking.

chris

On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote:

>
> I tried that sample script that reportedly caused the dpAlign "bug"
> but I can't reproduce it. All I get is a warning from LocatableSeq.
> -------------------------------------------
> [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "-Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl
>
> --------------------- WARNING ---------------------
> MSG: In sequence ABC|9944760 residue count gives end value 101.
> Overriding value [104] with value 101 for Bio::LocatableSeq::end().
> TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT > -GGG-CCGGCCC-AA > --------------------------------------------------- > Getting score for ABC|9944760 -> ABC|9986984 > = 300 > Getting score for ABC|9986984 -> ABC|9944760 > = 303 > ------------------------------------------ > > Does the test script crash in your machine? > > Yee Man > > --- On Tue, 8/18/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] >> Problems with Bioperl-ext package on WinVista? >> To: "Robert Buels" >> Cc: "Yee Man Chan" , "BioPerl List" > > >> Date: Tuesday, August 18, 2009, 10:28 PM >> On Aug 18, 2009, at 11:37 PM, Robert >> Buels wrote: >> >>> Yee Man Chan wrote: >>>> Is it going to be an arrangement similar to >> bioconductor? If so, I suppose then it makes sense. But you >> might want to develop scripts to automatically download and >> install new modules to make it user friendly. >>> Yes, we are probably going to make a Task::BioPerl or >> something similar. >>> >>>> What do you mean by Bio-Ext is going away? I >> notice quite many people using dpAlign. So if Bio-Ext is >> going away, then at least dpAlign should become another spin >> off. >>> By going away, I meant that everything in there is >> going to be spinned off. Except modules that are no >> longer maintainable, if there are any in there. >>> >>> Rob >> >> dpAlign could become another spinoff, yes, if it's used >> (and works fine). 
>> The problematic code dealt with pSW, alignment statistics, and staden
>> io_lib support (the last of which is fairly bit-rotted now):
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2668
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1857
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2069
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2074
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2329
>>
>> dpAlign has its own bug:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2384
>>
>> chris
>>

From hlapp at gmx.net Wed Aug 19 18:19:25 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 19 Aug 2009 18:19:25 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net>
Message-ID: <4907C3F4-C503-4019-BBDA-153ED777276C@gmx.net>

Putting it into the ninth column is the equivalent of storing it in the
{seqfeature,bioentry}_qualifier_value tables in BioSQL.

-hilmar

On Aug 19, 2009, at 9:15 AM, Scott Cain wrote:

> Hilmar,
>
> The examples in that thread ought to go in the ninth column; using
> the Dbxref tag for references back to GenBank for example. The
> provenience stuff should go in the ninth column as well, though I
> don't know exactly how would be best.
>
> Scott
>
>
>
> On Aug 18, 2009, at 4:01 PM, Hilmar Lapp wrote:
>
>>
>> On Aug 18, 2009, at 2:28 PM, Scott Cain wrote:
>>
>>> Additionally, some applications (SynBrowse comes to mind) overload
>>> the source value and require them to conform to a certain syntax.
>>> >>> So, what I'm trying to say is, source should probably just stay a >>> simple string. >> >> >> I would rephrase that to source should probably retain the >> possibility of using made-up strings. >> >> You mention one example yourself, and there have been others in a >> recent thread on BioSQL [1], for why the option to have >> predictable, structured values with attached semantics could be >> very useful. >> >> -hilmar >> >> [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jason at bioperl.org Wed Aug 19 20:55:22 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 19 Aug 2009 20:55:22 -0400 Subject: [Bioperl-l] Hi In-Reply-To: <3bc6bb240908191147j1c707206r4bd290addd2cd2f@mail.gmail.com> References: <3bc6bb240908191147j1c707206r4bd290addd2cd2f@mail.gmail.com> Message-ID: Please ask on the mailing list for these things, I am not really sure what you mean by subtract all taxonomy -- I suspect you mean extract all IDs, I think you should take a look at the example like http://bioperl.org/wiki/Module:Bio::DB::Taxonomy I think the example is basically what you want to do, except replace the nodeid with 7742 instead of 33090 -jason On Aug 19, 2009, at 2:47 PM, 
JingtaoLiu(TSU) wrote: > Hi Sir, > > Thank you for reading this. > I am working for BioChem Dept Texastate university. > I encounter a problem. > I need subtract all taxonomy IDs from vertebrates(taxon id is 7742) > how I can get all the leaf node of these? > > I referenced Bio::DB::Taxonomy, > but i have no clue about it. > Very appreciate for your help. > > Jingtao Liu -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From yannick.wurm at unil.ch Wed Aug 19 15:25:11 2009 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Wed, 19 Aug 2009 21:25:11 +0200 Subject: [Bioperl-l] Programmer job in Lausanne Switzerland Message-ID: <1D1F031E-29F1-4AE4-A225-D9B434ACE070@unil.ch> Dear list, my apologies if this is inappropriate for the list, but I thought it would be a good way to reach the kind of people we're looking for. We have a job opening for assembly and annotation of ant genomes in Lausanne Switzerland. http://www.isb-sib.ch/about-sib/jobs/details/91-sib-bioinformatician-at-sib--unil.html http://fourmidable.unil.ch/BioinformaticsEngineerLausanneAnts.pdf Kind regards, Yannick http://yannick.poulet.org From sidd.basu at gmail.com Thu Aug 20 06:03:07 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 20 Aug 2009 05:03:07 -0500 Subject: [Bioperl-l] Re: code reuse with moose In-Reply-To: References: <20090812022753.GA815@Macintosh-74.local> <20090818110102.GA27010@seinfeld> Message-ID: <20090820100304.GA1884@seinfeld> On Tue, 18 Aug 2009, Chris Fields wrote: > > On Aug 18, 2009, at 6:01 AM, Siddhartha Basu wrote: > > > Putting it in the bioperl list, makes more sense here, > > > > On Wed, 12 Aug 2009, Chris Fields wrote: > > > >> (BTW, this is re: the reimplementation of major chunks of BioPerl > >> using > >> Moose, Biome: http://github.com/cjfields/biome/tree/) > >> > >> Locations should use a Role (specifically, Biome::Role::Range), so > >> start/end/strand should be attributes, not methods. 
> >> With attributes the best way to do this is probably with a builder,
> >> and lazily (start requires end, and vice versa). Factor out the
> >> common code as Tomas indicates. BTW, the $self->throw() is akin to
> >> BioPerl's $self->throw() exception handling; it simply catches any
> >> exceptions and passes them to the metaclass exception handling.
> >>
> >> I've been thinking about making the Range role abstract for this very
> >> reason (or defining very basic attributes); something like:
> >>
> >> ----------------------------
> >>
> >> package Bio::Role::Range;
> >>
> >> requires qw(_build_start _build_end _build_strand);
> >>
> >> # also require other methods which need to be defined in
> >> # implementation
> >>
> >> has 'start' => (
> >>     isa     => 'Int',
> >>     is      => 'rw',
> >>     builder => '_build_start',
> >>     lazy    => 1
> >> );
> >>
> >> # same for end, strand (except strand has a different isa via
> >> # MooseX::Types)
> >> ....
> >>
> >> package Bio::Location::Foo;
> >>
> >> with 'Bio::Role::Range';
> >>
> >> sub _build_start {
> >>     # for location-specific start
> >> }
> >>
> >> sub _build_end {
> >>     # for location-specific end
> >> }
> >>
> >> sub _build_strand {
> >>     # for location-specific strand
> >> }
> >>
> >> sub _common_build_method {
> >>     # factor out common code here, call from other builders
> >> }
> >>
> >> ----------------------------
> >
> > This plan makes things much clearer. Currently the
> > BioMe::Role::Location has a 'requires' keyword and the rest of the
> > location modules consume that role to have their own implementation.
> > At this point only BioMe::Location::Atomic has an attribute-based
> > 'start' and 'end' implementation. I got a bit confused because in
> > current bioperl 'Bio::Location::Simple' inherits from
> > 'Bio::Location::Atomic' and when I am trying to follow that path in
> > BioMe it has to override that method.
> > So, my question is: do all the location modules really need to
> > inherit from each other?
> > I am totally aware of the original design ideas but it would be
> > better to have a flattened hierarchy if possible.
>
> Flattening with roles is always a good idea, yes. I wouldn't worry as
> much about the way it was originally implemented as the general API (and
> ways in which we can simplify it).

Thanks for clarifying that.

>
> > One more thing, what about putting the 'start', 'end' and the other
> > common base attributes in BioMe::Role::Location instead of
> > BioMe::Role::Range. I am not sure which would be correct from the
> > bioperl point of view, just throwing out an idea.
>
> That's a possibility. To me Locations are just Ranges with different
> behavior (hence the below comment...)
>
> >> Also, I think the Coordinate-related stuff should be simplified down
> >> to a trait or an attribute; they bring in way too much overhead in
> >> bioperl w/o much added value.
> >
> > You mean instead of having a 'builder' method, having specialized
> > traits handling those. That sounds even better.
> >
> > -siddhartha
>
> Yes, that's essentially it. Location behavior could be changed by
> having CoordinatePolicy as a trait. Similarly, fuzziness for start/end
> could also be thought of as a trait. In essence, you could probably roll
> most behavior into attribute traits (which, in Moose, are just roles that
> are composed into the attribute meta class, Moose::Meta::Attribute). I
> had started up a Biome::Meta::Attribute class in case we were to go down
> this path, then we could start registering specific traits within that
> namespace.
>
> Just to note, it might be easier to try the simplest approach first and
> get tests passing, then layer in traits to see how they act
> performance-wise. My guess is they will speed things up, but you never
> know. Locations will be a performance bottleneck as they are used in
> generic Features.

That seems to be a saner approach. Will play around with the builder
approach and get the tests passing at least.
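As a rough illustration of the builder approach discussed above, here is a dependency-free sketch of the lazy-builder pattern in plain Perl. It is only a stand-in for what Moose's lazy => 1 and builder => '_build_start' provide, not the Biome or BioPerl API: the package names Sketch::Range and Sketch::Location::Simple, the 'coords' constructor argument, and the _min_max helper are all made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class standing in for a Range role: every attribute is
# computed on first read by a _build_* method that the consumer supplies.
package Sketch::Range;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Generate lazy read/write accessors for start, end and strand.
for my $attr (qw(start end strand)) {
    my $builder = "_build_$attr";
    no strict 'refs';
    *{ __PACKAGE__ . "::$attr" } = sub {
        my $self = shift;
        $self->{$attr} = shift if @_;          # explicit set
        $self->{$attr} = $self->$builder()     # lazy build on first read
            unless exists $self->{$attr};
        return $self->{$attr};
    };
}

# Hypothetical consumer: supplies the builders and shares common code.
package Sketch::Location::Simple;
our @ISA = ('Sketch::Range');

sub _build_start  { my $self = shift; return $self->_min_max->[0] }
sub _build_end    { my $self = shift; return $self->_min_max->[1] }
sub _build_strand { return 1 }

# Common code factored out of the individual builders.
sub _min_max {
    my $self   = shift;
    my @sorted = sort { $a <=> $b } @{ $self->{coords} };
    return [ $sorted[0], $sorted[-1] ];
}

package main;

my $loc = Sketch::Location::Simple->new(coords => [42, 7, 19]);
printf "start=%d end=%d strand=%d\n", $loc->start, $loc->end, $loc->strand;
# start=7 end=42 strand=1
```

The point of the sketch is the shape: the consumer only implements the _build_* methods, nothing is computed until an accessor is first read, and shared logic lives in one helper. In Moose the accessor generation and the "consumer must provide these methods" check would come from has and requires instead of the hand-written loop.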
thanks,
-siddhartha

>
> chris

From ymc at yahoo.com Wed Aug 19 23:01:28 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Wed, 19 Aug 2009 20:01:28 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To:
Message-ID: <191324.76414.qm@web30403.mail.mud.yahoo.com>

I noticed that the $qalseq is a LocatableSeq with gaps. I don't think my
program was written to support LocatableSeq with gaps. If I removed the
gaps, then I would have the scores agree with each other which should be
the desired outcome.

--------------------- WARNING ---------------------
MSG: In sequence ABC|9986984 residue count gives end value 104.
Overriding value [101] with value 104 for Bio::LocatableSeq::end().
TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTTCGGGTCCGGCCCGAA
---------------------------------------------------
Getting score for ABC|9944760 -> ABC|9986984
= 291
Getting score for ABC|9986984 -> ABC|9944760
= 291

Do you think I should check for this LocatableSeq type and give an error
or should I remove the gaps if this is a LocatableSeq?

Yee Man

--- On Wed, 8/19/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels" , "BioPerl List"
> Date: Wednesday, August 19, 2009, 7:49 AM
> I'll have a look. It's probably something that hasn't been updated to
> deal with LocatableSeq's pathological end point checking.
>
> chris
>
> On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote:
>
> >
> > I tried that sample script that reportedly caused the dpAlign "bug"
> > but I can't reproduce it. All I get is a warning from LocatableSeq.
> > -------------------------------------------
> > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "-Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl
> >
> > --------------------- WARNING ---------------------
> > MSG: In sequence ABC|9944760 residue count gives end value 101.
> > Overriding value [104] with value 101 for Bio::LocatableSeq::end().
> > TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA
> > ---------------------------------------------------
> > Getting score for ABC|9944760 -> ABC|9986984
> > = 300
> > Getting score for ABC|9986984 -> ABC|9944760
> > = 303
> > ------------------------------------------
> >
> > Does the test script crash in your machine?
> >
> > Yee Man
> >
> > --- On Tue, 8/18/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Robert Buels"
> >> Cc: "Yee Man Chan" , "BioPerl List"
> >> Date: Tuesday, August 18, 2009, 10:28 PM
> >> On Aug 18, 2009, at 11:37 PM, Robert Buels wrote:
> >>
> >>> Yee Man Chan wrote:
> >>>> Is it going to be an arrangement similar to bioconductor? If so,
> >>>> I suppose then it makes sense. But you might want to develop
> >>>> scripts to automatically download and install new modules to make
> >>>> it user friendly.
> >>> Yes, we are probably going to make a Task::BioPerl or something
> >>> similar.
> >>>
> >>>> What do you mean by Bio-Ext is going away? I notice quite many
> >>>> people using dpAlign. So if Bio-Ext is going away, then at least
> >>>> dpAlign should become another spin off.
> >>> By going away, I meant that everything in there is going to be
> >>> spinned off. Except modules that are no longer maintainable, if
> >>> there are any in there.
> >>>
> >>> Rob
> >>
> >> dpAlign could become another spinoff, yes, if it's used (and works
> >> fine).
The problematic code dealt > with pSW, > >> alignment statistics, and staden io_lib support > (the latter > >> which is fairly bit rotted now): > >> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329 > >> > >> dpAlign has it's own bug: > >> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384 > >> > >> chris > >> > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bernd.jagla at gmail.com Thu Aug 20 04:46:52 2009 From: bernd.jagla at gmail.com (Bernd Jagla) Date: Thu, 20 Aug 2009 10:46:52 +0200 Subject: [Bioperl-l] SCF installation Message-ID: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> Hi, I am trying to install SCF (a prerequisite to samtools). I installed libread and the compilation seems to be working, only test is failing: zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL Checking if your kit is complete... 
Looks good Writing Makefile for Bio::SCF zoppel:Bio-SCF-1.01 bernd$ make cp SCF.pm blib/lib/Bio/SCF.pm cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c Please specify prototyping behavior for SCF.xs (see perlxs manual) /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c Running Mkbootstrap for Bio::SCF () chmod 644 SCF.bs rm -f blib/arch/auto/Bio/SCF/SCF.bundle LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ -lread -lz \ chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs chmod 644 blib/arch/auto/Bio/SCF/SCF.bs Manifying blib/man3/Bio::SCF.3pm zoppel:Bio-SCF-1.01 bernd$ make test PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) Failed 18/18 subtests Test Summary Report ------------------- t/scf.t (Wstat: 512 Tests: 0 Failed: 0) Non-zero exit status: 2 Parse errors: Bad plan. You planned 18 tests but ran 0. Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr 0.01 csys = 0.11 CPU) Result: FAIL Failed 1/1 test programs. 0/0 subtests failed. make: *** [test_dynamic] Error 2 Any idea what might be going wrong? 
Please note that some files in the directory are empty:

ls -ltr
-rw-r--r--   1 bernd  staff  167468 23 sep  1999 test.scf
-rw-r--r--   1 bernd  staff    1131 31 jan  2006 DISCLAIMER
-rw-r--r--   1 bernd  staff     532 17 mai  2006 README
-rw-r--r--   1 bernd  staff     525 17 mai  2006 INSTALL
-rw-r--r--   1 bernd  staff     396 17 mai  2006 Makefile.PL
-rw-r--r--   1 bernd  staff    9308 17 mai  2006 SCF.xs
-rw-r--r--   1 bernd  staff   12438 17 mai  2006 SCF.pm
drwxr-xr-x   3 bernd  staff     102 17 mai  2006 t
drwxr-xr-x   6 bernd  staff     204 17 mai  2006 eg
drwxr-xr-x   3 bernd  staff     102 17 mai  2006 SCF
-rw-r--r--   1 bernd  staff     290 17 mai  2006 META.yml
-rw-r--r--   1 bernd  staff     255 17 mai  2006 MANIFEST
drwxr-xr-x   4 bernd  staff     136 20 aoû 10:12 ..
-rw-r--r--   1 bernd  staff   27915 20 aoû 10:13 Makefile.old
-rw-r--r--   1 bernd  staff   27915 20 aoû 10:16 Makefile
-rw-r--r--   1 bernd  staff       0 20 aoû 10:17 pm_to_blib
drwxr-xr-x   8 bernd  staff     272 20 aoû 10:17 blib
-rw-r--r--   1 bernd  staff       0 20 aoû 10:17 SCF.bs
-rw-r--r--   1 bernd  staff   14580 20 aoû 10:18 SCF.o
-rw-r--r--   1 bernd  staff   15125 20 aoû 10:18 SCF.c
drwxr-xr-x  21 bernd  staff     714 20 aoû 10:18 .

Thanks,

Bernd

From cain.cshl at gmail.com Thu Aug 20 10:30:33 2009
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 20 Aug 2009 10:30:33 -0400
Subject: [Bioperl-l] SCF installation
In-Reply-To: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
Message-ID:

Hi Bernd,

Bio::SCF isn't technically part of BioPerl, but I have installed it before so I'll take a shot: do you have the Staden io-lib installed? It is a prereq for Bio::SCF. If you did install it, is it in a normal library path, and did you run ldconfig (if appropriate for your system) after installing it?

io-lib can be obtained here:

http://staden.sourceforge.net/

If you do have all of those things in place, what version of io-lib are you using? I wonder if there is an incompatibility between Bio::SCF and your version.
The INSTALL doc for Bio::SCF indicates that you should have version 0.9, but io-lib is now at 1.11.5. That jump to a whole number may have broken an api call that Bio::SCF depends on. Scott On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote: > Hi, > > > > I am trying to install SCF (a prerequisite to samtools). > > I installed libread and the compilation seems to be working, only > test is > failing: > > > > zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL > > Checking if your kit is complete... > > Looks good > > Writing Makefile for Bio::SCF > > > > zoppel:Bio-SCF-1.01 bernd$ make > > cp SCF.pm blib/lib/Bio/SCF.pm > > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > > /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp - > typemap > /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv > SCF.xsc > SCF.c > > Please specify prototyping behavior for SCF.xs (see perlxs manual) > > /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include > -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include > -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" > "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN > SCF.c > > Running Mkbootstrap for Bio::SCF () > > chmod 644 SCF.bs > > rm -f blib/arch/auto/Bio/SCF/SCF.bundle > > LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 > /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup > -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ > > -lread -lz \ > > > > chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle > > cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs > > chmod 644 blib/arch/auto/Bio/SCF/SCF.bs > > Manifying blib/man3/Bio::SCF.3pm > > > > > > zoppel:Bio-SCF-1.01 bernd$ make test > > PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > > t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) > > t/scf.t .. 
Dubious, test returned 2 (wstat 512, 0x200) > > Failed 18/18 subtests > > > > Test Summary Report > > ------------------- > > t/scf.t (Wstat: 512 Tests: 0 Failed: 0) > > Non-zero exit status: 2 > > Parse errors: Bad plan. You planned 18 tests but ran 0. > > Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 > cusr 0.01 > csys = 0.11 CPU) > > Result: FAIL > > Failed 1/1 test programs. 0/0 subtests failed. > > make: *** [test_dynamic] Error 2 > > > > > > > > > > Any idea what might be going wrong? > > > > Please not that in the directory there are some file empty: > > > > ls -ltr > > -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf > > -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER > > -rw-r--r-- 1 bernd staff 532 17 mai 2006 README > > -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL > > -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL > > -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs > > -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm > > drwxr-xr-x 3 bernd staff 102 17 mai 2006 t > > drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg > > drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF > > -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml > > -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST > > drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. > > -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old > > -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile > > -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib > > drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib > > -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs > > -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o > > -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c > > drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . > > > > > > Thanks, > > > > Bernd > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From dan.bolser at gmail.com Thu Aug 20 11:00:41 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 20 Aug 2009 16:00:41 +0100 Subject: [Bioperl-l] Creating a MSA from a set of pairwise alignments with a common reference sequence? Message-ID: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com> Hi, Quick version: How do I get a column of Bio::SimpleAlign using ungapped 'reference' sequence coordinates? Longer version: I have a set of pairwise alignments that I would like to process into a 'multiple sequence alignment' (MSA). All the alignments are short sequence 'contigs' aligned to a 'reference' sequence, so one sequence in all the pairwise alignments is constant (making the resulting MSA unambiguous). I came up with the following pseudo-code to create a MSA (Bio::SimpleAlign) from the set of pairwise alignments... initialise: Create an 'empty' Bio::SimpleAlign from the REFERENCE sequence. for each pairwise alignment: Create a Bio::LocatableSeq from the given fragment of the REFERENCE sequence (using ungapped REFERENCE coordinates). for each gap in the REFERENCE sequence: Take the position of the gap (in ungapped REFERENCE coordinates) and look up the corresponding column of the MSA (in ungapped REFERENCE coordinates). for each sequence in the column: Check if there is a gap-character at this position. if any sequence has a non gap-character at this position: Stick a gap in the MSA just before this position. Create a Bio::LocatableSeq from the CONTIG sequence (using ungapped REFERENCE coordinates) and add it to the Bio::SimpleAlign. done. I would very much appreciate, 1) feedback on the correctness of the above algorithm (it could be horribly wrong), and 2) advice on how to get a column of the alignment using ungapped REFERENCE coordinates? Sorry if this is a solved problem (where is it solved?). 
If not, and if I can get it working, I'll try to write a generic function to merge two MSAs when they have a reference sequence in common. For your reference, the pairwise alignments come from the show-aligns command in the MUMmer sequence alignment package, and have the following format: my.reference.fasta my.contigs.multi.fasta ============================================================ -- Alignments between REFERENCE and CONTIG00012 -- BEGIN alignment [ +1 29237 - 45714 | +1 1 - 16441 ] 29237 aataacctctttaag.taatatttttctctggtcccaacttgcgccaat 1 aataa.ctctttaagataatatttttctctggtcccgacttgggccaat ^ ^ ^ ^ 29286 ggaaaaaaatcacttattcgataa.ataataagataaatatattttcta 49 ggaaaaaaatcactatttcgataagataataagata.atatattttcaa ^^ ^ ^ ^ 29335 aagacccctacataaatatatggtcccattaatattataaattaataat 97 aagacccctatataaatatatggtctcattaatattataaattaataat ^ ^ ... For further reference: This thread: http://bioperl.org/pipermail/bioperl-l/2009-July/030643.html http://www.bioperl.org/wiki/Align_Refactor http://www.bioperl.org/wiki/Alignment_object All the best, Dan. From lincoln.stein at gmail.com Thu Aug 20 12:07:16 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 20 Aug 2009 12:07:16 -0400 Subject: [Bioperl-l] SCF installation In-Reply-To: References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> Message-ID: <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com> It is all a bit confusing. On the download page for Staden, there is a release 1.12, but the home page hasn't been updated and still reads 1.11. If you download and install Staden 1.12, you'll get a library named libstaden-read rather than libread; Bio::SCF hasn't been updated for the name change, and so you will have to open up the Makefile.PL and change "-lread" to "-lstaden-read" in order for it to compile. This being said, your log indicates that Bio::SCF compiled and linked just fine, but the test failed, so it may be more of a problem than just getting the staden library installed. 
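
The Makefile.PL edit described above is easiest to make by hand in an editor. As a minimal sketch only, the substitution itself might look like this in Perl. The helper name patch_link_flags and the sample LIBS line are hypothetical (the actual contents of Bio::SCF's Makefile.PL are not shown in this thread); only the rename from "-lread" to "-lstaden-read" comes from the message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite the old Staden link flag to the new name.
sub patch_link_flags {
    my ($text) = @_;
    # The negative lookahead leaves longer flags such as -lreadline alone.
    $text =~ s/-lread(?![\w-])/-lstaden-read/g;
    return $text;
}

# Sample line only; the real Makefile.PL may be laid out differently.
my $sample = "LIBS => ['-lread -lz'],";
print patch_link_flags($sample), "\n";
```

Running the helper twice on the same text should be harmless, since "-lstaden-read" does not itself contain "-lread".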
Lincoln On Thu, Aug 20, 2009 at 10:30 AM, Scott Cain wrote: > Hi Bernd, > > Bio::SCF isn't technically part of BioPerl, but I have installed it before > so I'll take a shot: do you have the Staden io-lib installed? It is a > prereq for Bio::SCF. If you did install it, is it in a normal library path, > and did you run ldconfig (if appropriate for your system) after installing > it? > > io-lib can be obtained here: > > http://staden.sourceforge.net/ > > If you do have all of those things in place, what version of io-lib are you > using? I wonder if there is an incompatibility between Bio::SCF and your > version. The INSTALL doc for Bio::SCF indicates that you should have > version 0.9, but io-lib is now at 1.11.5. That jump to a whole number may > have broken an api call that Bio::SCF depends on. > > Scott > > > On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote: > > Hi, >> >> >> >> I am trying to install SCF (a prerequisite to samtools). >> >> I installed libread and the compilation seems to be working, only test is >> failing: >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL >> >> Checking if your kit is complete... 
>> >> Looks good >> >> Writing Makefile for Bio::SCF >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ make >> >> cp SCF.pm blib/lib/Bio/SCF.pm >> >> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm >> >> /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap >> /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv >> SCF.xsc >> SCF.c >> >> Please specify prototyping behavior for SCF.xs (see perlxs manual) >> >> /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include >> -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include >> -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" >> "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c >> >> Running Mkbootstrap for Bio::SCF () >> >> chmod 644 SCF.bs >> >> rm -f blib/arch/auto/Bio/SCF/SCF.bundle >> >> LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 >> /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup >> -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ >> >> -lread -lz \ >> >> >> >> chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle >> >> cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs >> >> chmod 644 blib/arch/auto/Bio/SCF/SCF.bs >> >> Manifying blib/man3/Bio::SCF.3pm >> >> >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ make test >> >> PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" >> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t >> >> t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) >> >> t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) >> >> Failed 18/18 subtests >> >> >> >> Test Summary Report >> >> ------------------- >> >> t/scf.t (Wstat: 512 Tests: 0 Failed: 0) >> >> Non-zero exit status: 2 >> >> Parse errors: Bad plan. You planned 18 tests but ran 0. >> >> Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr >> 0.01 >> csys = 0.11 CPU) >> >> Result: FAIL >> >> Failed 1/1 test programs. 0/0 subtests failed. 
>> >> make: *** [test_dynamic] Error 2 >> >> >> >> >> >> >> >> >> >> Any idea what might be going wrong? >> >> >> >> Please not that in the directory there are some file empty: >> >> >> >> ls -ltr >> >> -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf >> >> -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER >> >> -rw-r--r-- 1 bernd staff 532 17 mai 2006 README >> >> -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL >> >> -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL >> >> -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs >> >> -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm >> >> drwxr-xr-x 3 bernd staff 102 17 mai 2006 t >> >> drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg >> >> drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF >> >> -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml >> >> -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST >> >> drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. >> >> -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old >> >> -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile >> >> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib >> >> drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib >> >> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs >> >> -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o >> >> -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c >> >> drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . >> >> >> >> >> >> Thanks, >> >> >> >> Bernd >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From j_martin at lbl.gov Thu Aug 20 12:41:16 2009 From: j_martin at lbl.gov (Joel Martin) Date: Thu, 20 Aug 2009 09:41:16 -0700 Subject: [Bioperl-l] SCF installation In-Reply-To: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> Message-ID: <20090820164115.GA10681@eniac.jgi-psf.org> Hello, Bio::SCF isn't a pre-requisite of samtools or Bio::Samtools, and neither is actually related to Bioperl. samtools has a pretty active mailing list at sourceforge, you might try asking there. http://sourceforge.net/mailarchive/forum.php?forum_name=samtools-help I use samtools all the time w/o either of those modules. Joel On Thu, Aug 20, 2009 at 10:46:52AM +0200, Bernd Jagla wrote: > Hi, > > > > I am trying to install SCF (a prerequisite to samtools). > > I installed libread and the compilation seems to be working, only test is > failing: > > > > zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL > > Checking if your kit is complete... 
> > Looks good > > Writing Makefile for Bio::SCF > > > > zoppel:Bio-SCF-1.01 bernd$ make > > cp SCF.pm blib/lib/Bio/SCF.pm > > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > > /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap > /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc > SCF.c > > Please specify prototyping behavior for SCF.xs (see perlxs manual) > > /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include > -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include > -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" > "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c > > Running Mkbootstrap for Bio::SCF () > > chmod 644 SCF.bs > > rm -f blib/arch/auto/Bio/SCF/SCF.bundle > > LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 > /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup > -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ > > -lread -lz \ > > > > chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle > > cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs > > chmod 644 blib/arch/auto/Bio/SCF/SCF.bs > > Manifying blib/man3/Bio::SCF.3pm > > > > > > zoppel:Bio-SCF-1.01 bernd$ make test > > PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > > t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) > > t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) > > Failed 18/18 subtests > > > > Test Summary Report > > ------------------- > > t/scf.t (Wstat: 512 Tests: 0 Failed: 0) > > Non-zero exit status: 2 > > Parse errors: Bad plan. You planned 18 tests but ran 0. > > Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr 0.01 > csys = 0.11 CPU) > > Result: FAIL > > Failed 1/1 test programs. 0/0 subtests failed. > > make: *** [test_dynamic] Error 2 > > > > > > > > > > Any idea what might be going wrong? 
> > > > Please not that in the directory there are some file empty: > > > > ls -ltr > > -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf > > -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER > > -rw-r--r-- 1 bernd staff 532 17 mai 2006 README > > -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL > > -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL > > -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs > > -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm > > drwxr-xr-x 3 bernd staff 102 17 mai 2006 t > > drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg > > drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF > > -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml > > -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST > > drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. > > -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old > > -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile > > -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib > > drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib > > -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs > > -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o > > -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c > > drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . > > > > > > Thanks, > > > > Bernd > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Thu Aug 20 12:42:23 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Thu, 20 Aug 2009 17:42:23 +0100 Subject: [Bioperl-l] Creating a MSA from a set of pairwise alignments with a common reference sequence? In-Reply-To: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com> References: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com> Message-ID: <4A8D7CEF.4080002@gmail.com> Hi Dan, I think you want the Bio::LocatableSeq method "column_from_residue_number". 
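
To illustrate what mapping an ungapped residue number to an alignment column involves, here is a minimal sketch in plain Perl. It is not BioPerl's implementation, and the function name column_for_residue is hypothetical; it assumes gaps are written as '.' or '-', as in the show-aligns excerpt quoted in this thread. For real use, the Bio::LocatableSeq method named above is the thing to call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map an ungapped residue number to a 1-based alignment column.
#   $gapped : aligned sequence string, gaps written as '.' or '-'
#   $start  : residue number of the first non-gap character
#   $resnum : residue number to look up
# Returns undef if the residue does not occur in this sequence.
sub column_for_residue {
    my ($gapped, $start, $resnum) = @_;
    my $current = $start - 1;    # number of the last residue seen so far
    my $column  = 0;
    for my $char (split //, $gapped) {
        $column++;
        next if $char eq '.' or $char eq '-';    # gaps consume no residue
        $current++;
        return $column if $current == $resnum;
    }
    return;
}

# First REFERENCE row of the show-aligns excerpt; it starts at 29237.
my $ref = 'aataacctctttaag.taatatttttctctggtcccaacttgcgccaat';

# Residue 29252 is the first base after the gap, so it sits in column 17.
print column_for_residue($ref, 29237, 29252), "\n";
```

The same walk run in reverse (count residues up to a given column) gives the residue number for a column, which is the other half of what the merge pseudo-code in this thread needs.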
You might also try combining your pairwise alignments using the profile alignment option in ClustalW. Cheers. Roy. Dan Bolser wrote: > Hi, > > Quick version: How do I get a column of Bio::SimpleAlign using > ungapped 'reference' sequence coordinates? > > > > Longer version: > > I have a set of pairwise alignments that I would like to process into > a 'multiple sequence alignment' (MSA). All the alignments are short > sequence 'contigs' aligned to a 'reference' sequence, so one sequence > in all the pairwise alignments is constant (making the resulting MSA > unambiguous). > > I came up with the following pseudo-code to create a MSA > (Bio::SimpleAlign) from the set of pairwise alignments... > > initialise: > Create an 'empty' Bio::SimpleAlign from the REFERENCE sequence. > > for each pairwise alignment: > Create a Bio::LocatableSeq from the given fragment of the > REFERENCE sequence (using ungapped REFERENCE coordinates). > > for each gap in the REFERENCE sequence: > Take the position of the gap (in ungapped REFERENCE > coordinates) and look up the corresponding column of the MSA > (in ungapped REFERENCE coordinates). > > for each sequence in the column: > Check if there is a gap-character at this position. > > if any sequence has a non gap-character at this position: > Stick a gap in the MSA just before this position. > > Create a Bio::LocatableSeq from the CONTIG sequence (using > ungapped REFERENCE coordinates) and add it to the > Bio::SimpleAlign. > > done. > > > I would very much appreciate, 1) feedback on the correctness of the > above algorithm (it could be horribly wrong), and 2) advice on how to > get a column of the alignment using ungapped REFERENCE coordinates? > > > Sorry if this is a solved problem (where is it solved?). If not, and > if I can get it working, I'll try to write a generic function to merge > two MSAs when they have a reference sequence in common. 
> > > For your reference, the pairwise alignments come from the show-aligns > command in the MUMmer sequence alignment package, and have the > following format: > > my.reference.fasta my.contigs.multi.fasta > > ============================================================ > -- Alignments between REFERENCE and CONTIG00012 > > -- BEGIN alignment [ +1 29237 - 45714 | +1 1 - 16441 ] > > > 29237 aataacctctttaag.taatatttttctctggtcccaacttgcgccaat > 1 aataa.ctctttaagataatatttttctctggtcccgacttgggccaat > ^ ^ ^ ^ > > 29286 ggaaaaaaatcacttattcgataa.ataataagataaatatattttcta > 49 ggaaaaaaatcactatttcgataagataataagata.atatattttcaa > ^^ ^ ^ ^ > > 29335 aagacccctacataaatatatggtcccattaatattataaattaataat > 97 aagacccctatataaatatatggtctcattaatattataaattaataat > ^ ^ > > ... > > > For further reference: > > This thread: > http://bioperl.org/pipermail/bioperl-l/2009-July/030643.html > > http://www.bioperl.org/wiki/Align_Refactor > > http://www.bioperl.org/wiki/Alignment_object > > > > All the best, > Dan. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lsbrath at gmail.com Thu Aug 20 16:31:20 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 20 Aug 2009 16:31:20 -0400 Subject: [Bioperl-l] genbank to fasta conversion Message-ID: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com> Hello, I have previously converted multiple genbank files to fasta. For some reason I am having trouble with this simple script. 
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

open (my $inFile, "C:/Documents and Settings/mydir/Desktop/TARGETING.gb");
open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/TARGET.fa");
my $in = Bio::SeqIO->new('-file' => "$inFile" ,
                         '-format' => 'GenBank');
my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' => 'Fasta');
print $out $_ while <$in>;

I keep getting the error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open GLOB(0x36a214): No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm:310
STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454
STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\genbank.pm:202
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351
STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377
-----------------------------------------------------------

I am probably missing something simple, but would appreciate any help.

M

From cjfields at illinois.edu Thu Aug 20 16:38:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 Aug 2009 15:38:03 -0500
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID: <7868B105-53AD-4C87-8B21-2E4D4A7781B5@illinois.edu>

You are passing filehandles in, not file names. Switch the '-file' parameter to '-fh'.

chris

On Aug 20, 2009, at 3:31 PM, Mgavi Brathwaite wrote:

> Hello,
>
> I have previously converted multiple genbank files to fasta. For
> some reason
> I am having trouble with this simple script.
> #!/usr/bin/perl -w
> use strict;
> use Bio::SeqIO;
>
> open (my $inFile, "C:/Documents and Settings/mydir/Desktop/
> TARGETING.gb");
> open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/
> TARGET.fa");
> my $in = Bio::SeqIO->new('-file' => "$inFile" ,
> '-format' => 'GenBank');
> my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' =>
> 'Fasta');
> print $out $_ while <$in>;
>
> I keep getting the error:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not open GLOB(0x36a214): No such file or directory
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm:
> 310
> STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454
> STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\
> genbank.pm:202
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351
> STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377
> -----------------------------------------------------------
>
> I am probably missing something simple, but would appreciate any help.
>
> M
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From rmb32 at cornell.edu Thu Aug 20 16:43:06 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 20 Aug 2009 13:43:06 -0700
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID: <4A8DB55A.6060605@cornell.edu>

The error is that you are opening filehandles ($inFile and $outfile) and then stringifying them (each resulting in a string like "GLOB(...)"), and telling Bio::SeqIO to open a file named "GLOB(...)", which it can't.
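
The stringification described above is easy to see in isolation. The following is a minimal sketch in plain Perl with no BioPerl involved; it uses an in-memory filehandle (an assumption made so that nothing on disk is needed) and the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open a lexical filehandle on an in-memory string so the sketch
# needs no file on disk.
my $data = "LOCUS example\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";

# Interpolating the handle does not give back a file name; it gives
# Perl's description of the reference, something like GLOB(0x36a214).
my $as_string = "$fh";
print "Stringified handle: $as_string\n";

close($fh);
```

Any code that then treats that string as a path will look for a file literally named GLOB(0x...), which matches the "Could not open GLOB(0x36a214)" message in the exception.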

You probably want to use the -fh arguments for your two uses of Bio::SeqIO, either that, or remove your open() calls and pass the filenames to the SeqIO objects directly, like:

my $in = Bio::SeqIO->new(
    '-file'   => "C:/Documents and Settings/mydir/Desktop/TARGETING.gb",
    '-format' => 'GenBank',
);
my $out = Bio::SeqIO->new(
    '-file'   => ">C:/Documents and Settings/mydir/Desktop/TARGET.fa",
    '-format' => 'fasta',
);

Rob

Mgavi Brathwaite wrote:
> Hello,
>
> I have previously converted multiple genbank files to fasta. For some reason
> I am having trouble with this simple script.
> #!/usr/bin/perl -w
> use strict;
> use Bio::SeqIO;
>
> open (my $inFile, "C:/Documents and Settings/mydir/Desktop/TARGETING.gb");
> open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/TARGET.fa");
> my $in = Bio::SeqIO->new('-file' => "$inFile" ,
> '-format' => 'GenBank');
> my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' => 'Fasta');
> print $out $_ while <$in>;
>
> I keep getting the error:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not open GLOB(0x36a214): No such file or directory
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm:310
> STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454
> STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\
> genbank.pm:202
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351
> STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377
> -----------------------------------------------------------
>
> I am probably missing something simple, but would appreciate any help.
>
> M
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From sharpton at berkeley.edu Thu Aug 20 16:40:34 2009
From: sharpton at berkeley.edu (Thomas Sharpton)
Date: Thu, 20 Aug 2009 13:40:34 -0700
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID:

This is a problem I think I can solve, so I'm chiming in for once.

Looks to me like you're trying to pass a file handle to the -file setting in your SeqIO object. One of the excellent things about using SeqIO is that you don't need to worry about file handles; it's all taken care of under the hood. Try the following adaptation of your script:

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $inFile  = "C:/Documents and Settings/mydir/Desktop/TARGETING.gb";
my $outfile = "C:/Documents and Settings/mydir/Desktop/TARGET.fa";

#OPEN A SEQUENCE FILE OF INTEREST ($inFile) AND CREATE A SEQUENCE STREAM ($in)
my $in = Bio::SeqIO->new(-file => "$inFile", '-format' => 'GenBank');

#OPEN AN OUTPUT FILE OF INTEREST ($outfile) AND CREATE AN OUTPUT SEQUENCE STREAM ($out)
#NOTICE HOW WE SET -file FOR OUTPUT WITH THE > SYMBOL HERE:
my $out = Bio::SeqIO->new(-file => ">$outfile", '-format' => 'Fasta');

#NOW LET'S DO THE CONVERSION AND DUMP THE OUTPUT
#INSTEAD OF DOING THIS
#print $out $_ while <$in>;
#TRY THIS
while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}

The above is pretty much what you'll find here:

http://www.bioperl.org/wiki/HOWTO:SeqIO

which you should definitely look over to better understand what's happening with the SeqIO object.

Good luck!
Tom On Aug 20, 2009, at 1:31 PM, Mgavi Brathwaite wrote: > Hello, > > I have previously converted multiple genbank files to fasta. For > some reason > I am having trouble with this simple script. > #!/usr/bin/perl -w > use strict; > use Bio::SeqIO; > > open (my $inFile, "C:/Documents and Settings/mydir/Desktop/ > TARGETING.gb"); > open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/ > TARGET.fa"); > my $in = Bio::SeqIO->new('-file' => "$inFile" , > '-format' => 'GenBank'); > my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' => > 'Fasta'); > print $out $_ while <$in>; > > I keep getting the error: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Could not open GLOB(0x36a214): No such file or directory > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm: > 310 > STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454 > STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\ > genbank.pm:202 > STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351 > STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377 > ----------------------------------------------------------- > > I am probably missing something simple, but would appreciate any help. > > M > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ghai.rohit at gmail.com Fri Aug 21 07:34:49 2009 From: ghai.rohit at gmail.com (Rohit Ghai) Date: Fri, 21 Aug 2009 13:34:49 +0200 Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotide database Message-ID: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com> Hello all I would like to download the wgs sequences of the unfinished genomes from ncbi. 
(genomes in progress) from http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

Here's an example accession:

NZ_ACVD00000000

and here's the link to the accession at GenBank:

http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000

This record lists the accessions that belong to it in the following line of the GenBank output:

WGS         NZ_ACVD01000001-NZ_ACVD01000139

NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession numbers covered by this record. Here's a link:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC]

The bioperl related question is...

Since these are unassembled genomes, there are several contigs for each one, and they are all available in this record.

Is it possible to download a range without trying to recreate each accession number?

On the other hand, it is possible to download each one individually. This would mean generating the following

NZ_ACVD01000001
NZ_ACVD01000002
NZ_ACVD01000003
.
.
.
NZ_ACVD01000139

from NZ_ACVD01000001-NZ_ACVD01000139

I can recreate these numbers and download each one separately. However, sometimes I get a timeout exception and the whole thing stops.

The code (copied shamelessly from the bioperl website, works great to get single accessions):

my $id = "NZ_ACVD00000000";
my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -id      => $id,
                                       -rettype => 'gbwithparts');

$factory->get_Response(-file => 'fullcontig.gb');

I did try to catch the exceptions from get_Response, but it's not working as expected... maybe someone can point out what I'm doing wrong here. For some reason, the code never seems to reach any print statement in the catch construct...

$ele = "somecontig id";

try {
    print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n";
    $factory->get_Response(-file => "$genbank_file");

} catch Bio::Root::Exception with {
    my $err = shift;
    if (! defined $err) {
        print "MAY HAVE DOWNLOADED $ele..\n";
    } else {
        print "PROBABLE TIMEOUT ERROR\n";
        print "$err\n";
    }
};

Or is it possible to somehow increase the timeout for the get_Response method?

thanks in advance!

regards

Rohit

From bernd.jagla at gmail.com Fri Aug 21 05:30:27 2009
From: bernd.jagla at gmail.com (Bernd Jagla)
Date: Fri, 21 Aug 2009 11:30:27 +0200
Subject: [Bioperl-l] SCF installation
In-Reply-To: <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID:

Hi,

I have installed io_lib-1.9.0. This produces libread.a. I am working on Mac OS X 10.5.7. I just recompiled io-lib and didn't see any error message. I don't really know how to test that it is working.

I am trying to install Bio-SCF-1.01.

It seems that the test.scf file cannot be read. Is there another way, using some other tools, to see if that is working?

(Sorry for misrepresenting samtools. I was actually trying to install Bio-Graphics, which was asking for Bio::SCF).

Thanks,

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
Sent: Thursday, August 20, 2009 6:07 PM
To: scott at scottcain.net
Cc: bioperl-l at lists.open-bio.org; Bernd Jagla
Subject: Re: [Bioperl-l] SCF installation

It is all a bit confusing. On the download page for Staden, there is a release 1.12, but the home page hasn't been updated and still reads 1.11. If you download and install Staden 1.12, you'll get a library named libstaden-read rather than libread; Bio::SCF hasn't been updated for the name change, and so you will have to open up the Makefile.PL and change "-lread" to "-lstaden-read" in order for it to compile.
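The Makefile.PL change described above could also be scripted. A minimal sketch, assuming the library flag appears literally as -lread somewhere in Makefile.PL (the actual contents of that file are not shown in this thread, and the sub name patch_libs and the script name in the usage comment are made up here):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: swap the old Staden library flag for the new one.
# The \b keeps flags such as -lreadline untouched.
sub patch_libs {
    my ($text) = @_;
    $text =~ s/-lread\b/-lstaden-read/g;
    return $text;
}

# Usage: perl patch_makefile.pl Makefile.PL
# Writes a patched copy next to the original rather than overwriting it.
if (@ARGV) {
    my $path = shift @ARGV;
    open( my $in, '<', $path ) or die "Cannot read $path: $!";
    my $content = do { local $/; <$in> };
    close $in;

    open( my $out, '>', "$path.new" ) or die "Cannot write $path.new: $!";
    print {$out} patch_libs($content);
    close $out or die "Cannot close $path.new: $!";
    print "Wrote $path.new\n";
}
```

Writing to a separate file is a deliberate choice for the sketch, so the original Makefile.PL can be compared before replacing it.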
This being said, your log indicates that Bio::SCF compiled and linked just fine, but the test failed, so it may be more of a problem than just getting the staden library installed. Lincoln On Thu, Aug 20, 2009 at 10:30 AM, Scott Cain wrote: > Hi Bernd, > > Bio::SCF isn't technically part of BioPerl, but I have installed it before > so I'll take a shot: do you have the Staden io-lib installed? It is a > prereq for Bio::SCF. If you did install it, is it in a normal library path, > and did you run ldconfig (if appropriate for your system) after installing > it? > > io-lib can be obtained here: > > http://staden.sourceforge.net/ > > If you do have all of those things in place, what version of io-lib are you > using? I wonder if there is an incompatibility between Bio::SCF and your > version. The INSTALL doc for Bio::SCF indicates that you should have > version 0.9, but io-lib is now at 1.11.5. That jump to a whole number may > have broken an api call that Bio::SCF depends on. > > Scott > > > On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote: > > Hi, >> >> >> >> I am trying to install SCF (a prerequisite to samtools). >> >> I installed libread and the compilation seems to be working, only test is >> failing: >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL >> >> Checking if your kit is complete... 
>> >> Looks good >> >> Writing Makefile for Bio::SCF >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ make >> >> cp SCF.pm blib/lib/Bio/SCF.pm >> >> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm >> >> /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap >> /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv >> SCF.xsc >> SCF.c >> >> Please specify prototyping behavior for SCF.xs (see perlxs manual) >> >> /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include >> -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include >> -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" >> "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c >> >> Running Mkbootstrap for Bio::SCF () >> >> chmod 644 SCF.bs >> >> rm -f blib/arch/auto/Bio/SCF/SCF.bundle >> >> LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 >> /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup >> -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ >> >> -lread -lz \ >> >> >> >> chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle >> >> cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs >> >> chmod 644 blib/arch/auto/Bio/SCF/SCF.bs >> >> Manifying blib/man3/Bio::SCF.3pm >> >> >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ make test >> >> PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" >> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t >> >> t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) >> >> t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) >> >> Failed 18/18 subtests >> >> >> >> Test Summary Report >> >> ------------------- >> >> t/scf.t (Wstat: 512 Tests: 0 Failed: 0) >> >> Non-zero exit status: 2 >> >> Parse errors: Bad plan. You planned 18 tests but ran 0. >> >> Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr >> 0.01 >> csys = 0.11 CPU) >> >> Result: FAIL >> >> Failed 1/1 test programs. 0/0 subtests failed. 
>> >> make: *** [test_dynamic] Error 2 >> >> >> >> >> >> >> >> >> >> Any idea what might be going wrong? >> >> >> >> Please not that in the directory there are some file empty: >> >> >> >> ls -ltr >> >> -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf >> >> -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER >> >> -rw-r--r-- 1 bernd staff 532 17 mai 2006 README >> >> -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL >> >> -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL >> >> -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs >> >> -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm >> >> drwxr-xr-x 3 bernd staff 102 17 mai 2006 t >> >> drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg >> >> drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF >> >> -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml >> >> -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST >> >> drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. >> >> -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old >> >> -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile >> >> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib >> >> drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib >> >> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs >> >> -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o >> >> -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c >> >> drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . >> >> >> >> >> >> Thanks, >> >> >> >> Bernd >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Fri Aug 21 09:05:25 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 21 Aug 2009 09:05:25 -0400
Subject: [Bioperl-l] SCF installation
In-Reply-To:
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID:

Hi Bernd,

Just so you know, you don't need Bio::SCF for Bio::Graphics either, unless you want to display ABI trace glyphs. It is a suggested ("recommends" in Module::Build parlance) module.

Scott

On Aug 21, 2009, at 5:30 AM, Bernd Jagla wrote:

> Hi,
>
> I have installed io_lib-1.9.0. This produces libread.a. I am working on a
> Mac OSX 10.5.7. I just recompiled io-lib and didn't see any error message. I
> don't really know how to test that it is working.
>
> I am trying to install Bio-SCF-1.01.
>
> It seems that the test.scf file cannot be read. Is there another way using
> some other tools to see if that is working?
>
> (Sorry for misrepresenting samtools. I was actually trying to install
> Bio-Graphics, which was asking for Bio::SCF).
>
> Thanks,
>
> Bernd

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From maj at fortinbras.us Fri Aug 21 08:50:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 21 Aug 2009 08:50:08 -0400
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase
In-Reply-To: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com>
References: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com>
Message-ID: <71B4268E5B524F719D24088483568870@NewLife>

Hi Rohit-
Re: timeout, you could try
$factory->ua->timeout($number_greater_than_180_sec)
before issuing the request.
cheers MAJ

----- Original Message -----
From: "Rohit Ghai"
To:
Sent: Friday, August 21, 2009 7:34 AM
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase

> Hello all
>
> I would like to download the wgs sequences of the unfinished genomes from
> ncbi.
> (genomes in progress) from http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi > > here's an example accession > > NZ_ACVD00000000 > > and here's the link to the accession at genbank > > http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000 > > This record contains the accessions that belong to this record in the > following line in the genbank output > > WGS NZ_ACVD01000001-NZ_ACVD01000139 > > The NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession > numbers that are > > are specified by this range. > > here's a link > > http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC] > > > The bioperl related question is... > > Since these are unassembled genomes, there are several contigs for each one, > and they all available in this record. > > Is it possible to download a range without trying to recreate each accession > number? > > on the other hand, it is possible to download each individually , this would > mean making the following > > NZ_ACVD01000001 > NZ_ACVD01000002 > NZ_ACVD01000003 > . > . > . > NZ_ACVD01000139 > > from NZ_ACVD01000001-NZ_ACVD01000139 > > > I can recreate these numbers and download each one separately. However, > sometimes I get a timeout exception > and the whole thing stops. > > the code ( copied shamelessly from the bioperl website, works great to get > single accessions) > > my $id = "NZ_ACVD00000000"; > my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => > 'nucleotide', > -id => > $id, > -rettype > => 'gbwithparts'); > > $factory->get_Response(-file => 'fullcontig.gb'); > > > I did try and catch the exceptions from the get_Response..but its not > working as expected... maybe someone can point out what I'm doing wrong > here. For some reason, the code never seems to go any print statement in the > catch construct... 
>
> $ele = "somecontig id";
>
> try {
> print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n";
> $factory->get_Response(-file => "$genbank_file");
>
> } catch Bio::Root::Exception with {
> my $err = shift;
> if (! defined $err) {
> print "MAY HAVE DOWNLOADED $ele..\n";
> } else {
> print "PROBABLE TIMEOUT ERROR\n";
> print "$err\n";
> }
> };
>
>
> Or is it possible to somehow increase the timeout time for the get_Response
> method?
>
> thanks in advance!
>
>
> regards
>
> Rohit
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bernd.jagla at pasteur.fr Fri Aug 21 09:30:38 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Fri, 21 Aug 2009 15:30:38 +0200
Subject: [Bioperl-l] SCF installation
In-Reply-To:
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina><6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID: <0D219C72BC5F432BA5CDBBCFCE94AA02@zillumina>

Thanks,

I was confused by the error message of Bio::Graphics. Now I tried make, make test and was able to install...

Thanks,

Let's forget about the rest then since I don't believe I will need that...

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Scott Cain
Sent: Friday, August 21, 2009 3:05 PM
To: Bernd Jagla
Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] SCF installation

Hi Bernd,

Just so you know, you don't need Bio::SCF for Bio::Graphics either, unless you want to display ABI trace glyphs. It is a suggested ("recommends" in Module::Build parlance) module.

Scott

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From ghai.rohit at gmail.com Fri Aug 21 09:40:02 2009
From: ghai.rohit at gmail.com (Rohit Ghai)
Date: Fri, 21 Aug 2009 15:40:02 +0200
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase
In-Reply-To: <71B4268E5B524F719D24088483568870@NewLife>
References: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com> <71B4268E5B524F719D24088483568870@NewLife>
Message-ID: <94c73820908210640h3b5854fbxe19c259c66cf9ee4@mail.gmail.com>

Thanks! I have made the change... no error yet.. so keeping my fingers crossed

cheers
Rohit

On Fri, Aug 21, 2009 at 2:50 PM, Mark A.
Jensen wrote: > Hi Rohit- > Re: timeout, you could try > $factory->ua->timeout($number_greater_than_180_sec) > before issuing the request. > cheers MAJ > ----- Original Message ----- From: "Rohit Ghai" > To: > Sent: Friday, August 21, 2009 7:34 AM > Subject: [Bioperl-l] downloading multiple contigs from ncbi > nucleotidedatabase > > > Hello all >> >> I would like to download the wgs sequences of the unfinished genomes from >> ncbi. >> (genomes in progress) from http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi >> >> here's an example accession >> >> NZ_ACVD00000000 >> >> and here's the link to the accession at genbank >> >> http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000 >> >> This record contains the accessions that belong to this record in the >> following line in the genbank output >> >> WGS NZ_ACVD01000001-NZ_ACVD01000139 >> >> The NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession >> numbers that are >> >> are specified by this range. >> >> here's a link >> >> >> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC] >> >> >> The bioperl related question is... >> >> Since these are unassembled genomes, there are several contigs for each >> one, >> and they all available in this record. >> >> Is it possible to download a range without trying to recreate each >> accession >> number? >> >> on the other hand, it is possible to download each individually , this >> would >> mean making the following >> >> NZ_ACVD01000001 >> NZ_ACVD01000002 >> NZ_ACVD01000003 >> . >> . >> . >> NZ_ACVD01000139 >> >> from NZ_ACVD01000001-NZ_ACVD01000139 >> >> >> I can recreate these numbers and download each one separately. However, >> sometimes I get a timeout exception >> and the whole thing stops. 
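The two pieces mentioned here, recreating the accession numbers from the range and not letting one timeout stop the whole run, could be sketched roughly as below. This is only an illustration under stated assumptions: expand_accession_range and with_retries are names invented for the sketch, plain eval is used in place of the try/catch shown in the original message, and the Bio::DB::EUtilities parameters are simply the ones quoted in this thread. The download part runs only when a range is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn "NZ_ACVD01000001-NZ_ACVD01000139" into a list
# of accessions, keeping the zero padding of the numeric part.
sub expand_accession_range {
    my ($range) = @_;
    my ( $first, $last ) = split /-/, $range, 2;
    die "Not a range: $range\n" unless defined $last;
    my ( $prefix, $from ) = $first =~ /^(.*\D)(\d+)$/
        or die "Cannot parse $first\n";
    my ( $prefix2, $to ) = $last =~ /^(.*\D)(\d+)$/
        or die "Cannot parse $last\n";
    die "Prefixes differ in $range\n" unless $prefix eq $prefix2;
    my $width = length $from;
    my ( $lo, $hi ) = ( 0 + $from, 0 + $to );    # force numeric range
    return map { sprintf '%s%0*d', $prefix, $width, $_ } $lo .. $hi;
}

# Hypothetical helper: run $action up to $max_tries times. Returns the
# attempt number that succeeded, or 0 if every attempt died.
sub with_retries {
    my ( $max_tries, $action ) = @_;
    for my $try ( 1 .. $max_tries ) {
        return $try if eval { $action->(); 1 };
        warn "Attempt $try failed: $@";
    }
    return 0;
}

# Usage: perl fetch_range.pl NZ_ACVD01000001-NZ_ACVD01000139
if (@ARGV) {
    require Bio::DB::EUtilities;
    for my $acc ( expand_accession_range( $ARGV[0] ) ) {
        my $ok = with_retries( 3, sub {
            # Same parameters as in the thread, one accession at a time.
            my $factory = Bio::DB::EUtilities->new(
                -eutil   => 'efetch',
                -db      => 'nucleotide',
                -id      => $acc,
                -rettype => 'gbwithparts',
            );
            $factory->get_Response( -file => "$acc.gb" );
        } );
        warn "Giving up on $acc\n" unless $ok;
    }
}
```

The retry count of 3 and the one-file-per-accession naming are arbitrary choices for the sketch.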
>> >> the code ( copied shamelessly from the bioperl website, works great to get >> single accessions) >> >> my $id = "NZ_ACVD00000000"; >> my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', >> -db => >> 'nucleotide', >> -id => >> $id, >> -rettype >> => 'gbwithparts'); >> >> $factory->get_Response(-file => 'fullcontig.gb'); >> >> >> I did try and catch the exceptions from the get_Response..but its not >> working as expected... maybe someone can point out what I'm doing wrong >> here. For some reason, the code never seems to go any print statement in >> the >> catch construct... >> >> $ele = "somecontig id"; >> >> try { >> print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n"; >> $factory->get_Response(-file => "$genbank_file"); >> >> } catch Bio::Root::Exception with { >> my $err = shift; >> if (! defined $err) { >> print "MAY HAVE DOWNLOADED $ele..\n"; >> } else { >> print "PROBABLE TIMEOUT ERROR\n"; >> print "$err\n"; >> } >> }; >> >> >> Or is it possible to somehow increase the timeout time for the >> get_Response >> method? >> >> thanks in advance! >> >> >> regards >> >> Rohit >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > From rmb32 at cornell.edu Fri Aug 21 15:39:31 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 Aug 2009 12:39:31 -0700 Subject: [Bioperl-l] added a perltidy profile file Message-ID: <4A8EF7F3.0@cornell.edu> This one is copied from the parrot project. I added it in maintenance/perltidy.conf. Have a look, tweak as you see fit. The idea with perltidy profile files is to use them to enforce coding style rules. So this perltidy profile file would be the place to codify the BioPerl coding standards, such as indentation, use of cuddled elses, etc. So here is one, let's customize it for our needs. 
The way I usually run perltidy is with -b to modify a file in-place, and with the '-pro=' option to specify a profile file. Example: perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Fri Aug 21 17:03:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 Aug 2009 16:03:07 -0500 Subject: [Bioperl-l] bioperl capability In-Reply-To: <25037707.post@talk.nabble.com> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> <25037707.post@talk.nabble.com> Message-ID: On Aug 18, 2009, at 11:39 PM, deequan wrote: > > Howdy there, > > Yes, quite right. I apologize for the double posting. > Moreover, I > appreciate your assistance in trying to sort out what can and cannot > be done > with bioperl. 
To address the problem previously stated, I put > together a > remarkably misbehaving script that has the following parts: > > #Some parsing: > $q_start = $hsp->query->start; > $q_end = $hsp->query->end; > $h_start = $hsp->hit->start; > $h_end = $hsp->hit->end; > $length = $hsp->query->seqlength(); > $id = $hit->accession; > > print OUT "$id\t"; > my $seq; > if($h_start<$h_end){ > > #the bit per your recommendation > my $begin = $h_start-$q_start+1; > my $cease = ($length - $q_end) + $h_end; > my $strand = 1; > my $factory = Bio::DB::GenBank->new(-format=> 'genbank', > -seq_start =>$begin, > -seq_stop =>$cease, > -strand => $strand, #1 = plus, 2 = minus > ); > $seq = $factory->get_Seq_by_acc($id); > }else{#else assume backward, code not shown} > [ > #and some stuff to retrieve the sequence > > my $len = $seq->length(); > my $string = $seq->subseq(1, $len); > print OUT "length = $len\t"; > print OUT "seq = $string\n"; ] Not sure what you are doing with the above sequence. The abve > In your previous reply, you said the code accessing the seq object > created > by get_Seq_by_acc would have to pass that obj (here $seq) to a seqIO > for > basic IO purposes. # create an output seq stream somewhere my $out = Bio::SeqIO->new(-file => '>sequences.gb', -format => 'genbank'); .... # take seq object ($seq), write to the stream $out->write_seq($seq); > Not seeing exactly how to go about that, I tried some > other functions in combination that seemed as though they should work > (length() and subseq()). 
> Unfortunately, the program does not even run to
> that point, as the script throws an exception:
>
> ------------- EXCEPTION -------------
> MSG: acc CP000948 does not exist
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:182
> STACK toplevel test.pl:36
> -------------------------------------
>
> Oddly, the record corresponding to this accession number can be
> found here:
> http://www.ncbi.nlm.nih.gov/nuccore/169887498

That's probably something to do with NCBI unfortunately; I'll have to look into it. The best alternative is if you have BLAST reports that include the GI (or UID). That's the most reliable number (using that in coordination with get_Seq_by_id), but it's not on by default; you have to request its inclusion. More recent versions of Bio::SearchIO::blast parse out the GI from the descriptor if it's present.

> Perhaps you'd be willing to offer another hint. Thank you for your
> assistance thus far. And on behalf of all posters, thank you for
> sharing your knowledge. 'Preciate.
>
> David Q.

No problem.

chris

From dan.bolser at gmail.com Fri Aug 21 17:55:37 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 21 Aug 2009 22:55:37 +0100
Subject: [Bioperl-l] added a perltidy profile file
In-Reply-To: <4A8EF7F3.0@cornell.edu>
References: <4A8EF7F3.0@cornell.edu>
Message-ID: <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com>

Cheers Rob,

Whatever objections may arise from style x or style y, I think it's a great idea to at least have one style or another recognized as being 'standard'. I know TMTOWTDI, but on a project like this, with so many contributors and users, it's essential to at least have a recommendation. I'll try to use this on any contribs.

As you pointed out [1], it's probably best to provide two patches for any change involving a formatting clean up: one to change the format to the standard and one to commit the actual code changes.

All the best,
Dan.
[1] irc://irc.freenode.net/#bioperl

2009/8/21 Robert Buels :
> This one is copied from the parrot project. I added it in
> maintenance/perltidy.conf.
> Have a look, tweak as you see fit.
>
> The idea with perltidy profile files is to use them to enforce coding style
> rules. So this perltidy profile file would be the place to codify the
> BioPerl coding standards, such as indentation, use of cuddled elses, etc.
>
> So here is one, let's customize it for our needs. The way I usually run
> perltidy is with -b to modify a file in-place, and with the '-pro=' option
> to specify a profile file.
>
> Example:
>   perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm
>
> Rob
>
> --
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY 14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From maj at fortinbras.us Fri Aug 21 23:12:55 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 21 Aug 2009 23:12:55 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife>
References: <1F899AA92F94415186CB0B25306F1114@NewLife>
Message-ID: <86486D3736614E6A81AF9521B5BB796A@NewLife>

Thanks to all (six, seven including Rob and his perltidy) who responded to this thread. (Lurkers, you are not volunteering by responding, honest.) I'm preparing a wiki page (of course) with the major points, some further comments, and an action plan for your consideration. Watch this space.

cheers,
MAJ

----- Original Message ----- From: "Mark A.
Jensen" To: "BioPerl List" Cc: "Chris Fields" Sent: Friday, August 14, 2009 10:32 PM Subject: [Bioperl-l] on BP documentation > Hi All -- > > Off-list, an old colleague of mine had this insightful, if damning, > comment: > >>I guess that from my perspective, after doing this stuff for >>about 10 years, I personally would prefer to see a "summer of >>documentation" for the bio* languages (or at least bioperl, as that is >>the only one I ever look at). From my own experiences, and from those >>of many colleagues, the documentation for bioperl has gone from >>mediocre to quite poor in the last few years. I largely think the >>wikification of the docs are to blame for this. Even SeqIO is hard >>to figure out now--it took me an hour the other day to figure out that >>"desc" returns the full Fasta header, and I had to get that from the >>module code + trial-and-error, instead of the online docs. There is >>far too much inside baseball going on in the documentation scheme. > >>So I worry more about the constant adding of features at the expense >>of documenting what is already there. This is just my 2 cents, and it >>is disappointing to see a downward trend for bioperl in this regard. > > I would be really interested in all responses from the list users. I must > agree > that BP docs are rather a rat's nest and of varying quality, but taken in > toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount > of useful and sophisticated information available. I think there are > approaches we can take to reorganize and standardize the accession > of it to make it more useful and inviting. I disagree with my pal about the > wikification, but I wager that the power of the wiki could be leveraged > to greater advantage (right, Dan?). > > I think that what we all as developers love is to code, and detest is to > document. 
Since BP is all-volunteer, and volunteers tend to do what > they like -- the beauty of open source, btw -- documentation reorg > and cleanup probably must devolve to the Core. I am willing to lead > such an effort, which will take some time, and more time the fewer > volunteers there are. First let's hear some thoughts, and 'let it all hang > out', > as they said in my mom's era. > > cheers > Mark > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sat Aug 22 00:11:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 Aug 2009 23:11:42 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <86486D3736614E6A81AF9521B5BB796A@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <86486D3736614E6A81AF9521B5BB796A@NewLife> Message-ID: <594EBBA3-5043-4DDF-9157-65195747266D@illinois.edu> Mark, One suggestion that i agree with: we need to add API-specific module documentation to the site somehow (not just links to CPAN/PDOC). There are a few ways to do so; a quick way may be to install something like the Mediawiki SecureHTML extension and create a protected template (this would be for pdoc, cpan, or both). Another one is to write up a pod2wiki converter and create API- specific pages, then have a bot automate the pages. A POD extension also exists, but we would still need to embed code. I much prefer the extensions than anything else. chris On Aug 21, 2009, at 10:12 PM, Mark A. Jensen wrote: > Thanks to all (six, seven including Rob and his perltidy) who > responded to this thread. (Lurkers, you are not volunteering > by responding, honest.) I'm preparing a wiki page (of course) > with the major points, some further comments, and an action > plan for your consideration. Watch this space. > cheers, > MAJ > ----- Original Message ----- From: "Mark A. 
Jensen" > > To: "BioPerl List" > Cc: "Chris Fields" > Sent: Friday, August 14, 2009 10:32 PM > Subject: [Bioperl-l] on BP documentation > > >> Hi All -- >> >> Off-list, an old colleague of mine had this insightful, if damning, >> comment: >> >>> I guess that from my perspective, after doing this stuff for >>> about 10 years, I personally would prefer to see a "summer of >>> documentation" for the bio* languages (or at least bioperl, as >>> that is >>> the only one I ever look at). From my own experiences, and from >>> those >>> of many colleagues, the documentation for bioperl has gone from >>> mediocre to quite poor in the last few years. I largely think the >>> wikification of the docs are to blame for this. Even SeqIO is hard >>> to figure out now--it took me an hour the other day to figure out >>> that >>> "desc" returns the full Fasta header, and I had to get that from the >>> module code + trial-and-error, instead of the online docs. There is >>> far too much inside baseball going on in the documentation scheme. >> >>> So I worry more about the constant adding of features at the expense >>> of documenting what is already there. This is just my 2 cents, >>> and it >>> is disappointing to see a downward trend for bioperl in this regard. >> >> I would be really interested in all responses from the list users. >> I must agree >> that BP docs are rather a rat's nest and of varying quality, but >> taken in >> toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount >> of useful and sophisticated information available. I think there are >> approaches we can take to reorganize and standardize the accession >> of it to make it more useful and inviting. I disagree with my pal >> about the >> wikification, but I wager that the power of the wiki could be >> leveraged >> to greater advantage (right, Dan?). >> >> I think that what we all as developers love is to code, and detest >> is to >> document. 
Since BP is all-volunteer, and volunteers tend to do what >> they like -- the beauty of open source, btw -- documentation reorg >> and cleanup probably must devolve to the Core. I am willing to lead >> such an effort, which will take some time, and more time the fewer >> volunteers there are. First let's hear some thoughts, and 'let it >> all hang out', >> as they said in my mom's era. >> >> cheers >> Mark >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.osimo at gmail.com Sat Aug 22 10:55:06 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Sat, 22 Aug 2009 16:55:06 +0200 Subject: [Bioperl-l] Getting genomic coordinates for a list of SNPs Message-ID: <2ac05d0f0908220755y59b029f2u82eede5b29836a1d@mail.gmail.com> Dear list, I'm searching for a script like this http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates to get the genomic position of a SNP, not a Gene. Does it exist? Thanks a lot Emanuele From cjfields at illinois.edu Sat Aug 22 16:17:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 22 Aug 2009 15:17:46 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> Message-ID: <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> Anand, You should always post emails to the bioperl-l mailing list, never to individual developers (you'll get an answer much faster). Keep responses on the list as well. Though I use bioperl-db some, I'm probably not the best person to ask. Does anyone know what's going on with this? Does this have to do with the Species/Taxon refactoring? chris Begin forwarded message: > From: "Anand C. 
Patel" > Date: August 22, 2009 2:57:42 PM CDT > To: cjfields at illinois.edu > Subject: problem with bioperl (where's the Mus?) > > Dr. Fields, > > I'm struggling with what seems to be a strange quirk in Bioperl +/- > Bioperl-db/BioSQL. > > I've successfully loaded in genbank sequences into a biosql database. > > When I try to write a genbank sequence back out, a curious thing > happens -- the Genus is missing from the SOURCE and ORGANISM areas. > > Despite reporting: > primary tag: source > tag: chromosome > value: 3 > > tag: db_xref > value: taxon:10090 > > tag: map > value: 3 74.5 cM > > tag: mol_type > value: mRNA > > tag: organism > value: Mus musculus > The sequence when printed out via SeqIO looks like this: > LOCUS NM_017474 2935 bp dna linear ROD > 13-AUG-2009 > DEFINITION Mus musculus chloride channel calcium activated 3 > (Clca3), mRNA. > ACCESSION NM_017474 XM_978159 > VERSION NM_017474.2 GI:255918210 > KEYWORDS . > SOURCE musculus > ORGANISM musculus > Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; > Bilateria; > Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; > Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; > Tetrapoda; > Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; > Glires; > Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. > Confession -- I have a final project due Monday wherein I boldly > elected to interface Bioperl, MySQL, Perl, and CGI. > (I'm an MD getting my MS in Bioinformatics.) > After many misadventures, I'm getting to the point where I could > actually complete the objectives, but this is bug is rather > problematic. > Thanks, > Anand > Anand C. Patel, MD > Assistant Professor of Pediatrics > Division of Allergy/Pulmonary Medicine > Department of Pediatrics > Washington University School of Medicine > 660 South Euclid Ave, Campus Box 8052 > St. 
Louis, MO 63110 > acpatel at wustl.edu > acpatel at gmail.com > acpatel at jhu.edu > From hlapp at gmx.net Sat Aug 22 17:36:42 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 17:36:42 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> Message-ID: <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> That's a pretty strange bug. Anand, which version of BioPerl and Bioperl-db are you running? Note that the genus *is* actually there in the lineage (and hence does get retrieved from the database). Apparently the Species object fails to pull it out correctly, though? Anand - I suspect there have been some warnings printed to the terminal - can you post these, and otherwise confirm that there haven't been any? -hilmar On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: > Anand, > > You should always post emails to the bioperl-l mailing list, never > to individual developers (you'll get an answer much faster). Keep > responses on the list as well. > > Though I use bioperl-db some, I'm probably not the best person to > ask. Does anyone know what's going on with this? Does this have to > do with the Species/Taxon refactoring? > > chris > > Begin forwarded message: > >> From: "Anand C. Patel" >> Date: August 22, 2009 2:57:42 PM CDT >> To: cjfields at illinois.edu >> Subject: problem with bioperl (where's the Mus?) >> >> Dr. Fields, >> >> I'm struggling with what seems to be a strange quirk in Bioperl +/- >> Bioperl-db/BioSQL. >> >> I've successfully loaded in genbank sequences into a biosql database. >> >> When I try to write a genbank sequence back out, a curious thing >> happens -- the Genus is missing from the SOURCE and ORGANISM areas. 
>> >> Despite reporting: >> primary tag: source >> tag: chromosome >> value: 3 >> >> tag: db_xref >> value: taxon:10090 >> >> tag: map >> value: 3 74.5 cM >> >> tag: mol_type >> value: mRNA >> >> tag: organism >> value: Mus musculus >> The sequence when printed out via SeqIO looks like this: >> LOCUS NM_017474 2935 bp dna linear ROD >> 13-AUG-2009 >> DEFINITION Mus musculus chloride channel calcium activated 3 >> (Clca3), mRNA. >> ACCESSION NM_017474 XM_978159 >> VERSION NM_017474.2 GI:255918210 >> KEYWORDS . >> SOURCE musculus >> ORGANISM musculus >> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >> Bilateria; >> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >> Tetrapoda; >> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >> Glires; >> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >> Confession -- I have a final project due Monday wherein I boldly >> elected to interface Bioperl, MySQL, Perl, and CGI. >> (I'm an MD getting my MS in Bioinformatics.) >> After many misadventures, I'm getting to the point where I could >> actually complete the objectives, but this is bug is rather >> problematic. >> Thanks, >> Anand >> Anand C. Patel, MD >> Assistant Professor of Pediatrics >> Division of Allergy/Pulmonary Medicine >> Department of Pediatrics >> Washington University School of Medicine >> 660 South Euclid Ave, Campus Box 8052 >> St. 
Louis, MO 63110 >> acpatel at wustl.edu >> acpatel at gmail.com >> acpatel at jhu.edu >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 22 17:42:32 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 17:42:32 -0400 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: Consistent coding style is in principle a good thing. It's also worth to keep in mind one of the old BioPerl principles - don't change working code purely to change style. In my interpretation of the rule, however, this has always applied to code writing style, and not code formatting style. I'm assuming the goal here is only to make the formatting consistent. -hilmar On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote: > Cheers Rob, > > Whatever objectons may arise from style x or style y, I think it's a > great idea to at least have one style or another recognized as being > 'standard'. I know TMTOWTDI, but on a project like this, with so many > contributors and users, it's essential to at least have a > recommendation. I'll try to use this on any contribs. > > As you pointed out [1], its probably best to provide two patches for > any change involving a formating clean up: one to change the fomat to > the standard and one to commit the actual code changes. > > > All the best, > Dan. > > [1] irc://irc.freenode.net/#bioperl > > > 2009/8/21 Robert Buels : >> This one is copied from the parrot project. I added it in >> maintenance/perltidy.conf. >> Have a look, tweak as you see fit. 
>> >> The idea with perltidy profile files is to use them to enforce >> coding style >> rules. So this perltidy profile file would be the place to codify >> the >> BioPerl coding standards, such as indentation, use of cuddled >> elses, etc. >> >> So here is one, let's customize it for our needs. The way I >> usually run >> perltidy is with -b to modify a file in-place, and with the '-pro=' >> option >> to specify a profile file. >> >> Example: >> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >> >> Rob >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 22 19:21:48 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 19:21:48 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > [...] > I think I know what's broken. Using load_seqdatabases.pl, I'd put a > set of sequences from genbank into a biosql db in mysql. 
> > I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl > script from biosql. Did you load the NCBI taxonomy first, or afterwards? > > When I searched for house (as in house mouse), I found that the name > of the type of taxon class was "genbank common name". > > When I searched for musculus, it does appear as a type of > "scientific name". It is the 'scientific name' class names that Bioperl-db will onto the lineage array. > [...] > I'm not just getting warnings. I'm getting errors. Tons of them. > It's a wonder it's working at all. I'm not sure what you're referring to, but what you pasted into your email were neither errors nor warnings but a debugging log (and what it prints looks like it's working fine). You triggered that by setting -verbose to a value greater than 0. If you don't want debugging output, then you can just leave off that argument (no debugging output is the default). > > I started with the getentry.cgi script in the cgi-bin folder, and > stripped most of it away. I see - which reminds me that I need to look at that script; I'm afraid it hasn't been updated for a long time (that doesn't mean though that it can't work - the core API has been stable for years). > > Code: > #!/usr/bin/perl > > [...] > if( $@ || !defined $seq) { > print "Got fetch exception of...\n
$@\n
"; > exit(0); > } Wouldn't you want to put that right after the eval() clause? -hilmar > > >> >> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >> >>> Anand, >>> >>> You should always post emails to the bioperl-l mailing list, never >>> to individual developers (you'll get an answer much faster). Keep >>> responses on the list as well. >>> >>> Though I use bioperl-db some, I'm probably not the best person to >>> ask. Does anyone know what's going on with this? Does this have >>> to do with the Species/Taxon refactoring? >>> >>> chris >>> >>> Begin forwarded message: >>> >>>> From: "Anand C. Patel" >>>> Date: August 22, 2009 2:57:42 PM CDT >>>> To: cjfields at illinois.edu >>>> Subject: problem with bioperl (where's the Mus?) >>>> >>>> Dr. Fields, >>>> >>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>> +/- Bioperl-db/BioSQL. >>>> >>>> I've successfully loaded in genbank sequences into a biosql >>>> database. >>>> >>>> When I try to write a genbank sequence back out, a curious thing >>>> happens -- the Genus is missing from the SOURCE and ORGANISM areas. >>>> >>>> Despite reporting: >>>> primary tag: source >>>> tag: chromosome >>>> value: 3 >>>> >>>> tag: db_xref >>>> value: taxon:10090 >>>> >>>> tag: map >>>> value: 3 74.5 cM >>>> >>>> tag: mol_type >>>> value: mRNA >>>> >>>> tag: organism >>>> value: Mus musculus >>>> The sequence when printed out via SeqIO looks like this: >>>> LOCUS NM_017474 2935 bp dna linear >>>> ROD 13-AUG-2009 >>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>> (Clca3), mRNA. >>>> ACCESSION NM_017474 XM_978159 >>>> VERSION NM_017474.2 GI:255918210 >>>> KEYWORDS . 
>>>> SOURCE musculus >>>> ORGANISM musculus >>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>> Bilateria; >>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>> Tetrapoda; >>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>> Glires; >>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>> Confession -- I have a final project due Monday wherein I boldly >>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>> (I'm an MD getting my MS in Bioinformatics.) >>>> After many misadventures, I'm getting to the point where I could >>>> actually complete the objectives, but this is bug is rather >>>> problematic. >>>> Thanks, >>>> Anand >>>> Anand C. Patel, MD >>>> Assistant Professor of Pediatrics >>>> Division of Allergy/Pulmonary Medicine >>>> Department of Pediatrics >>>> Washington University School of Medicine >>>> 660 South Euclid Ave, Campus Box 8052 >>>> St. Louis, MO 63110 >>>> acpatel at wustl.edu >>>> acpatel at gmail.com >>>> acpatel at jhu.edu >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 23 10:38:48 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Aug 2009 10:38:48 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net>
References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net>
Message-ID:

On Aug 22, 2009, at 9:13 PM, Anand C. Patel wrote:

> Turns out that using the default namespace bioperl doesn't change
> anything.

No it shouldn't, so long as you are consistent about it. (And if you're not, all that should happen is that you don't find your sequences any more.)

>
> Common name -- still "genbank common name" in name_class in the
> taxon_name table for "house mouse", which I think the module is
> looking for as "common name".

If you are loading the NCBI taxonomy first, this is coming from NCBI, not one of the scripts or BioPerl, and hence we have no control over it. Are you saying that there is no designated name of class 'common name' for Mus musculus in the NCBI taxonomy dump?

Also, the common name being present or not should have no bearing on the lineage array, where the actual problem is, so I don't understand right now how this would be connected to the problem you are seeing.

>
> It's not behaving differently despite reloading the sequences.
>
> I've created a horrible munge that fixes it for cosmetic purposes:
> my $species = $seq->species;
> my $justspecies = $species->scientific_name();
> my $binspecies = $species->binomial();
>
> my $gbstring2 = $gbstring;
>
> $gbstring2 =~ s/$binspecies/$justspecies/g;
> $gbstring2 =~ s/$justspecies/$binspecies/g;

I don't understand what you are trying to achieve here - it seems like you are making a substitution and then reverting it? Also, $species->scientific_name() and $species->binomial() should be identical for Mus musculus - are you finding different values being returned? So in essence, I wouldn't expect your above code snippet to have any effect, for both of these reasons.
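
One way to check this outside of BioPerl is a small pure-Perl sketch like the one below. It is only an illustration under stated assumptions: `munge_species` is a hypothetical helper (not a BioPerl function), the record text is made up to resemble the output quoted earlier in the thread, and the two name strings are passed in by hand rather than read from a species object. The `\Q...\E` quoting is an addition that the quoted snippet does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): apply the same two
# substitutions as the quoted snippet to a flat-file string.
sub munge_species {
    my ($text, $binomial, $scientific) = @_;
    # Step 1: collapse every full binomial to the shorter name.
    $text =~ s/\Q$binomial\E/$scientific/g;
    # Step 2: expand every occurrence of the shorter name to the binomial.
    $text =~ s/\Q$scientific\E/$binomial/g;
    return $text;
}

# Made-up record fragment resembling the output shown in this thread.
my $record = "DEFINITION  Mus musculus chloride channel\n"
           . "SOURCE      musculus\n";

# Case 1 (assumption: both names are the same string): nothing changes.
my $same = munge_species($record, 'Mus musculus', 'Mus musculus');
print "identical names: ", ($same eq $record ? "unchanged" : "changed"), "\n";

# Case 2 (assumption: the names differ, 'musculus' vs 'Mus musculus'):
# the bare 'musculus' on the SOURCE line is expanded.
my $diff = munge_species($record, 'Mus musculus', 'musculus');
print $diff;
```

If the two names really are identical, the pair of substitutions is a no-op, as expected above. If they differ in the way assumed in case 2, the two steps are not a simple revert: the first normalizes everything to the short name and the second expands every occurrence, which would explain a cosmetic "fix". Printing both values from the species object would show which case applies.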
How do you find $gbstring2 to be different from $gbstring at the end of this block of code? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 23 10:42:58 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Aug 2009 10:42:58 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> Message-ID: <119BC08A-6D3A-4D03-B0D5-7619EDE682AE@gmx.net> On Aug 22, 2009, at 8:13 PM, Anand C. Patel wrote: > Do I need to load ontology before loading sequences? You don't. Especially if you load genbank sequences as they come. Loading ontologies that are used for sequence annotation is useful as it will get your features (or sequences) linked to fully populated (description, synonyms, relationships, etc) terms rather than skeleton term records created on the fly. However, in GenBank format ontology terms are part of the feature table, and require a post-processing (using, e.g., a SeqProcessor class) step to be identified and turned into Bio::Annotation::OntologyTerm objects. 
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From jorismeys at gmail.com Sun Aug 23 11:08:47 2009
From: jorismeys at gmail.com (joris meys)
Date: Sun, 23 Aug 2009 17:08:47 +0200
Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree
Message-ID:

Hi,

I'm currently exploring the phylogenetic parts of BioPerl, but I can't seem to find a quick solution to the following problem:

Say you have a tree obtained by a certain method. From this tree, you want to have the evolutionary distances between species, defined as the sum of the branch lengths between any 2 species. There is as far as I know no function for doing that. But is there a possibility to get a list of some sort of "shortest paths" from one species to another, allowing that matrix to be calculated easily? From the phylip package, I get the following data if I run the neighbor or fitch program. From there I can easily get an algorithm to calculate the distances I need. But I also need to do that for maximum likelihood trees and the like. Is there a way to get this information in BioPerl?

  From   to     dist
  node1  sp1    xxxxx
  node2  sp3    xxxxxx
  node1  node2  xxxxx
  node1  sp2    xxxxx

Kind regards
Joris

From heikki.lehvaslaiho at gmail.com Mon Aug 24 01:59:22 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 24 Aug 2009 08:59:22 +0300
Subject: [Bioperl-l] added a perltidy profile file
In-Reply-To:
References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com>
Message-ID:

De facto coding style standard for BioPerl has been emacs using cperl mode and bioperl.list file. As long as this configuration does not change the conventions used, I see this as a great way of helping to format code from other editors.

-Heikki

2009/8/23 Hilmar Lapp :
> Consistent coding style is in principle a good thing.
>
> It's also worth to keep in mind one of the old BioPerl principles - don't
> change working code purely to change style. In my interpretation of the
> rule, however, this has always applied to code writing style, and not code
> formatting style. I'm assuming the goal here is only to make the formatting
> consistent.
>
>        -hilmar
>
> On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote:
>
>> Cheers Rob,
>>
>> Whatever objectons may arise from style x or style y, I think it's a
>> great idea to at least have one style or another recognized as being
>> 'standard'. I know TMTOWTDI, but on a project like this, with so many
>> contributors and users, it's essential to at least have a
>> recommendation. I'll try to use this on any contribs.
>>
>> As you pointed out [1], its probably best to provide two patches for
>> any change involving a formating clean up: one to change the fomat to
>> the standard and one to commit the actual code changes.
>>
>>
>> All the best,
>> Dan.
>>
>> [1] irc://irc.freenode.net/#bioperl
>>
>>
>> 2009/8/21 Robert Buels :
>>>
>>> This one is copied from the parrot project. I added it in
>>> maintenance/perltidy.conf.
>>> Have a look, tweak as you see fit.
>>>
>>> The idea with perltidy profile files is to use them to enforce coding
>>> style
>>> rules. So this perltidy profile file would be the place to codify the
>>> BioPerl coding standards, such as indentation, use of cuddled elses, etc.
>>>
>>> So here is one, let's customize it for our needs. The way I usually run
>>> perltidy is with -b to modify a file in-place, and with the '-pro='
>>> option
>>> to specify a profile file.
>>>
>>> Example:
>>>  perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm
>>>
>>> Rob
>>>
>>> --
>>> Robert Buels
>>> Bioinformatics Analyst, Sol Genomics Network
>>> Boyce Thompson Institute for Plant Research
>>> Tower Rd
>>> Ithaca, NY 14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
-Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429
Building #2, Office #4216
Computational Bioscience Research Centre (CBRC)
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

From geoeco at rambler.ru Mon Aug 24 05:20:13 2009
From: geoeco at rambler.ru (Anna Kostikova)
Date: Mon, 24 Aug 2009 13:20:13 +0400
Subject: [Bioperl-l] extracting ORGANISM line from genbank file
Message-ID: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>

Dear all,

I am trying to extract the species taxonomy from the ORGANISM line. In fact I only need the first line under the ORGANISM tag (i.e. genus + species). I thought that it would be possible to do this with the SeqBuilder object by stating

$builder->add_wanted_slot('display_id','species');

the problem is, however, that I've got an empty file as a result. What might be wrong with the script (see below)?
Thanks a lot in advance for any ideas, ------------------------------------------- #!/usr/bin/perl use strict; use Bio::SeqIO; use Bio::Seq::SeqBuilder; my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; my $infile = shift or die $usage; my $infileformat = 'Genbank' ; my $outfile = shift or die $usage; my $outfileformat = 'raw'; my $i = 0; my $seq_in = Bio::SeqIO->new('-file' => "<$infile", '-format' => $infileformat); my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", '-format' => $outfileformat); my $builder = $seq_in->sequence_builder(); $builder->want_none(); $builder->add_wanted_slot('display_id','species'); while(my $seq = $seq_in->next_seq()) { $seq_out->write_seq($seq); } exit; ---------------------------------------------------- Anna From maj at fortinbras.us Mon Aug 24 07:30:27 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 24 Aug 2009 07:30:27 -0400 Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree In-Reply-To: References: Message-ID: Hi Joris, AFAIK, there is only one path between any two nodes in a typical phylogenetic tree, the one passing through the most recent common ancestor of the nodes. The distance() method in Bio::Tree::TreeFunctionsI will give you what I think you want: use Bio::TreeIO; use Bio::Tree::TreeFunctionsI; $t = Bio::TreeIO->new(-file=>'t/data/urease.tre.nexus', -format=>'nexus')->next_tree; $n1 = $t->find_node('Anidulans'); $n2 = $t->find_node('Ncrassa'); $dist = $t->distance(-nodes => [$n1, $n2] ); print $dist; Use the Bio::TreeIO package to read in the tree in your favorite format; it will handle many. cheers, MAJ ----- Original Message ----- From: "joris meys" To: Sent: Sunday, August 23, 2009 11:08 AM Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree > Hi, > > I'm currently exploring the phylogenetic parts of Bio Perl, but I > can't seem to find a quick solution to following problem : > Say you have a tree obtained by a certain method. 
From this tree, you > want to have the evolutionary distances between species, defined as > the sum of the branch lengths between any 2 species. There is as far > as I know no function for doing that. But is there a possibility to > get a list of some sort of "shortest paths" from one species to > another, allowing to easily calculate that matrix? > >>From the phylip package, I get following data if I run the neighbor or > fitch program. From there I can easily get an algorithm to calculate > the distances I need. But I also need to do that for maximum > likelihood trees and the like. Is there a way to get this information > in Bio Perl? >>From to dist > node1 sp1 xxxxx > node2 sp3 xxxxxx > node1 node2 xxxxx > node 1 sp2 xxxxx > > Kind regards > Joris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From dan.bolser at gmail.com Mon Aug 24 08:26:13 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 13:26:13 +0100 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: <2c8757af0908240526j1cb0a455x53f7f3dccaceda86@mail.gmail.com> 2009/8/24 Heikki Lehvaslaiho : > De facto coding style standard for BioPerl has been emacs using cperl > mode and bioperl.list file. As long as this configuration does not > change the conventions used, I see this as great way in helping to > format code from other editors. 'bioperl.list' file? I guess you made a typo and you mean bioperl.lisp http://www.bioperl.org/wiki/Emacs_template > 2009/8/23 Hilmar Lapp : >> Consistent coding style is in principle a good thing. >> >> It's also worth to keep in mind one of the old BioPerl principles - don't >> change working code purely to change style. 
In my interpretation of the
>> rule, however, this has always applied to code writing style, and not code
>> formatting style. I'm assuming the goal here is only to make the formatting
>> consistent.

I have changed coding style in the past. IIRC this was in the Quality.pm file. I made the changes because two different styles were being used to do (roughly) the same thing at different points in the script. The two styles were being used interchangeably (at random?). As a noob, the use of two different styles was very confusing, because I didn't know if the difference was significant or what the significance of the difference might be.

I resolved the issue by writing a set of additional tests and then slowly harmonizing the coding style while confirming that the tests were still running OK. In this case I think it was reasonable to try to have a consistent style at least within the module. Or should I have left the style as it was?

Cheers,
Dan.

From dan.bolser at gmail.com Mon Aug 24 08:50:46 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 24 Aug 2009 13:50:46 +0100
Subject: [Bioperl-l] Bio::SimpleAlign constructor?
In-Reply-To: <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife>
References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife>
Message-ID: <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com>

I just ran into the same problem described here. Here is my code to demonstrate what I expected:

#!/usr/bin/perl -w

use strict;

use Bio::SimpleAlign;
use Bio::LocatableSeq;
use Bio::AlignIO;

my $CLUDGE = 0;

## REF  tacattaaagacccg
## SEQ1 taca.taaa......
## SEQ2 .....taaaga.ccg

my $aln = Bio::SimpleAlign->new();

$aln->gap_char('.');

my $r  = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' );
my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' );
my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' );

$aln->add_seq( $r );
$aln->add_seq( $s1 );
$aln->add_seq( $s2 );

if($CLUDGE){
  foreach(($r, $s1, $s2)){
    $_->seq( '.' x ($_->start - 1) . $_->seq )
  }
}

## Prepare an 'output stream' for the alignment:
my $aliWriter = Bio::AlignIO->
  new( -fh => \*STDOUT,
       -format => 'clustalw',
     );

warn "\nOUTPUT:\n";
$aliWriter->write_aln($aln);

I was calling the "fill in the gaps yourself" step a CLUDGE because I had expected the alignment object to take care of this for me. Is there any reason that it couldn't do this 'CLUDGE' automatically? It seems strange that it insists on being passed locatable sequence objects, but then largely ignores the given location.

Would it not be possible to have this happen when the sequences are written out from the alignment? I think it should still be possible to index the column number via the (gapless) sequence number... or did I get confused? There are two levels of confusion here (on my part), 1) the concepts behind the objects and 2) the implementation details.

Thanks for any hints on how to understand or potentially how to fix these problems.

Cheers,
Dan.

2009/7/22 Mark A. Jensen :
> Hi Paolo,
> I think I see what you want to do, however, it doesn't quite work
> this way.
I'm supposing you want to specify something like > > s1/3-6 attc > s2/7-10 gaag > > and obtain output like > > s1 --attc---- > s2 ------gaag > > But (and this is why LocatableSeqs are "locatable"), the alignment described > by the former data is always going to be > > s1 attc > s2 gaag > > so that I can query the alignment *column* number 1 and obtain > the residue coordinates of the original sequences in that column: > > $loc = $aln->get_seq_by_pos(1)->location_from_column(1); # 3 > > or vice-versa > > $col = $aln->column_from_residue_number( 's1', 3); # 1 > > As far as I know, you have to fill in the gaps yourself; a good > exercise, since you already have all the information you need, in having set > up the start and end coordinates (which are really > the column coordinates in this model). > If this wasn't what you had in mind, I apologize. > cheers, Mark > > > ----- Original Message ----- From: "Paolo Pavan" > To: > Sent: Thursday, July 16, 2009 6:17 AM > Subject: [Bioperl-l] Bio::SimpleAlign constructor? > > >> Hi, >> I have a brief question: I would like to know if there is a method to >> obtain a valid formatted and flush Bio::SimpleAlign object (i.e. >> properly filled with gaps on the right and on the left side of each >> sequence) given a bounch of Bio::LocatableSeq objects in which I have >> specified the -start and -end properties. >> Can anyone help me? 
Thank you very much, >> >> Paolo >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ghai.rohit at gmail.com Mon Aug 24 08:53:03 2009 From: ghai.rohit at gmail.com (Rohit Ghai) Date: Mon, 24 Aug 2009 14:53:03 +0200 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Message-ID: <94c73820908240553m72540519pd86bf78e29041462@mail.gmail.com> hi I think you forgot to add the "seq" in the builder.. thats why the file is empty. Also, the species name, though being parsed, is nowhere in the output. Here's a version using fasta output that you can probably customize further. This also takes the full name of the organism and adds to the description line in the output. use strict; use Bio::SeqIO; use Bio::Seq::SeqBuilder; my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; my $infile = shift or die $usage; my $infileformat = 'Genbank' ; my $outfile = shift or die $usage; my $outfileformat = 'fasta'; my $i = 0; my $seq_in = Bio::SeqIO->new('-file' => "<$infile", '-format' => $infileformat); my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", '-format' => $outfileformat); my $builder = $seq_in->sequence_builder(); $builder->want_none(); $builder->add_wanted_slot('display_id','species','seq','description'); while(my $seq = $seq_in->next_seq()) { my $desc = $seq->description(); my $species_string = $seq->species()->binomial('FULL'); $desc = $desc . 
" [$species_string]"; $seq->description($desc); $seq_out->write_seq($seq); } exit; On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova wrote: > > Dear all, > > I am trying to extract species taxonomy from ORGANISM line. In fact I only > need a first line under ORGANISM tag (e.i. genus + species). I though that > it would be possible to do with the SeqBuilder object by stating > > $builder->add_wanted_slot('display_id','species'); > > the problem is, however, that I've got an empty file as a result. > What might be wrong with the script (see below)? > Thanks a lot in advance for any ideas, > > ------------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'raw'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > $builder->add_wanted_slot('display_id','species'); > > while(my $seq = $seq_in->next_seq()) { > $seq_out->write_seq($seq); > } > > exit; > > ---------------------------------------------------- > > Anna > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Aug 24 08:55:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 07:55:56 -0500 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Message-ID: <6B4871D9-5DB0-4762-A613-3561B40CE099@illinois.edu> Anna, It's stored in the 
Bio::Species object. I have to say, though, I think you're using a stick of dynamite for a scalpel here; if you only need ORGANISM parse it out directly (it's much faster). Or am I missing something? chris On Aug 24, 2009, at 4:20 AM, Anna Kostikova wrote: > Dear all, > > I am trying to extract species taxonomy from ORGANISM line. In fact > I only need a first line under ORGANISM tag (e.i. genus + species). > I though that it would be possible to do with the SeqBuilder object > by stating > > $builder->add_wanted_slot('display_id','species'); > > the problem is, however, that I've got an empty file as a result. > What might be wrong with the script (see below)? > Thanks a lot in advance for any ideas, > > ------------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'raw'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > $builder->add_wanted_slot('display_id','species'); > > while(my $seq = $seq_in->next_seq()) { > $seq_out->write_seq($seq); > } > > exit; > > ---------------------------------------------------- > > Anna > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 08:56:02 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 07:56:02 -0500 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: 
<1E5347D2-A60F-49CB-8F3B-C5E06342417E@illinois.edu> Heikki, perltidy has become the most common way to standardize perl coding style (in a non-text-editor-dependent way). A number of projects have started using it as a means for checking and cleaning up modules prior to release. I think Perl Best Practices reinforced that. chris On Aug 24, 2009, at 12:59 AM, Heikki Lehvaslaiho wrote: > De facto coding style standard for BioPerl has been emacs using cperl > mode and bioperl.list file. As long as this configuration does not > change the conventions used, I see this as great way in helping to > format code from other editors. > > > -Heikki > > 2009/8/23 Hilmar Lapp : >> Consistent coding style is in principle a good thing. >> >> It's also worth to keep in mind one of the old BioPerl principles - >> don't >> change working code purely to change style. In my interpretation of >> the >> rule, however, this has always applied to code writing style, and >> not code >> formatting style. I'm assuming the goal here is only to make the >> formatting >> consistent. >> >> -hilmar >> >> On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote: >> >>> Cheers Rob, >>> >>> Whatever objectons may arise from style x or style y, I think it's a >>> great idea to at least have one style or another recognized as being >>> 'standard'. I know TMTOWTDI, but on a project like this, with so >>> many >>> contributors and users, it's essential to at least have a >>> recommendation. I'll try to use this on any contribs. >>> >>> As you pointed out [1], its probably best to provide two patches for >>> any change involving a formating clean up: one to change the fomat >>> to >>> the standard and one to commit the actual code changes. >>> >>> >>> All the best, >>> Dan. >>> >>> [1] irc://irc.freenode.net/#bioperl >>> >>> >>> 2009/8/21 Robert Buels : >>>> >>>> This one is copied from the parrot project. I added it in >>>> maintenance/perltidy.conf. >>>> Have a look, tweak as you see fit. 
>>>> >>>> The idea with perltidy profile files is to use them to enforce >>>> coding >>>> style >>>> rules. So this perltidy profile file would be the place to >>>> codify the >>>> BioPerl coding standards, such as indentation, use of cuddled >>>> elses, etc. >>>> >>>> So here is one, let's customize it for our needs. The way I >>>> usually run >>>> perltidy is with -b to modify a file in-place, and with the '-pro=' >>>> option >>>> to specify a profile file. >>>> >>>> Example: >>>> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >>>> >>>> Rob >>>> >>>> -- >>>> Robert Buels >>>> Bioinformatics Analyst, Sol Genomics Network >>>> Boyce Thompson Institute for Plant Research >>>> Tower Rd >>>> Ithaca, NY 14853 >>>> Tel: 503-889-8539 >>>> rmb32 at cornell.edu >>>> http://www.sgn.cornell.edu >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > Building #2, Office #4216 > Computational Bioscience Research Centre (CBRC) > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > 
http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 09:36:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 08:36:32 -0500 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> Message-ID: Dan, all, Bio::SimpleAlign doesn't align anything for you. It makes no assumptions about the data being added, beyond possibly checking for the seqs to be flush prior to analyses. Here's the reason why: The object doesn't 'know' the seqs map across from one to the other as below: > ... > ## REF tacattaaagacccg > ## SEQ1 taca.taaa...... > ## SEQ2 .....taaaga.ccg > > my $aln = Bio::SimpleAlign->new(); > > $aln->gap_char('.'); > > my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); > my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, - > seq=>'taca.taaa' ); > my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, - > seq=>'taaaga.ccg' ); > > $aln->add_seq( $r ); > $aln->add_seq( $s1 ); > $aln->add_seq( $s2 ); Above, you are making the assumption that SimpleAlign 'knows' where to match the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does NOT indicate that (the LocatableSeq docs, and their usage, should indicate that). Think about HSP alignments in a BLAST report; the start/end/strand coordinates are where the sequence in the alignment maps to the original query or hit sequence. They don't indicate where the hit maps to the query (the alignment itself does that in a column-wise fashion). I'm not sure, maybe it needs to be more explicit in the documentation, but SimpleAlign does not align the sequences for you (and it shouldn't be expected to). There are much better (faster, more accurate) ways to do that. 
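To make the distinction concrete, here is a minimal sketch in plain Perl (no BioPerl objects) of the padding step on its own. It assumes you already know the alignment column at which each gapped string begins, which is exactly the information that LocatableSeq's start() does not carry. The `col` key and the `pad_to_flush` helper are made up for illustration; they are not part of Bio::SimpleAlign or Bio::LocatableSeq.

```perl
use strict;
use warnings;

# Pad gapped strings on both sides so that all rows have the same width.
# Each row is a hash ref with a hypothetical 'col' key: the 1-based
# alignment column where its string starts (NOT a residue coordinate).
sub pad_to_flush {
    my ( $gap_char, @rows ) = @_;

    # Total width = rightmost column occupied by any row.
    my $width = 0;
    for my $row (@rows) {
        my $end_col = $row->{col} - 1 + length( $row->{seq} );
        $width = $end_col if $end_col > $width;
    }

    my @padded;
    for my $row (@rows) {
        my $left  = $gap_char x ( $row->{col} - 1 );
        my $right = $gap_char x ( $width - length($left) - length( $row->{seq} ) );
        push @padded, { id => $row->{id}, seq => $left . $row->{seq} . $right };
    }
    return @padded;
}

# The three strings from the example, with column offsets supplied by hand.
my @rows = (
    { id => 'r',  col => 1, seq => 'tacattaaagacccg' },
    { id => 's1', col => 1, seq => 'taca.taaa' },
    { id => 's2', col => 6, seq => 'taaaga.ccg' },
);

printf "%-4s %s\n", $_->{id}, $_->{seq} for pad_to_flush( '.', @rows );
```

The padded strings could then be handed to LocatableSeq and SimpleAlign as usual; the point is only that the column offsets have to come from somewhere other than the residue coordinates.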
> if($CLUDGE){ > foreach(($r, $s1, $s2)){ > $_->seq( '.' x ($_->start - 1) . $_->seq ) > } > } > > ## Prepare an 'output stream' for the alignment: > my $aliWriter = Bio::AlignIO-> > new( -fh => \*STDOUT, > -format => 'clustalw', > ); > > warn "\nOUTPUT:\n"; > $aliWriter->write_aln($aln); ... > I was calling the "fill in the gaps yourself" step a CLUDGE because I > had expected the alignment object to take care of this for me. Is > there any reason that it couldn't do this 'CLUDGE' automatically? It > seems strange that it insists on being passed locatable sequence > objects, but then largely ignore the given location. > > Would it not be possible to have this happen when the sequences are > written out from the alignment? I think it should still be possible to > index the column number via the (gapless) sequence number... or did I > get confused? There are two levels of confusion here (on my part), 1) > the concepts behind the objects and 2) the implementation details. Mentioned above (no assumptions on how locatableseqs map to one another). WYSIWYG. There is nothing precluding you from writing up code to do that, though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl alignment implementation (there are, believe it or not, pure perl implementations of Smith- Waterman and Needleman-Wunsch. > Thanks for any hints on how to understand or potentially how to fix > these problems. > > Cheers, > Dan. Not that SimpleAlign and LocatableSeqs don't have their share of problems. However, I don't think you can expect this behavior to change with the refactors. chris From hlapp at gmx.net Mon Aug 24 09:44:43 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 09:44:43 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net> Message-ID: On Aug 23, 2009, at 1:25 PM, Anand C. Patel wrote: > The other piece of potentially useful information is below -- output > from > SELECT * FROM `biosql`.`taxon_name` WHERE `taxon_id` = 138; > (taxon_id 138 maps to ncbi_taxon_id 10090) > > taxon_id name name_class > 138 LK3 transgenic mice includes > 138 Mus muscaris misnomer > 138 Mus musculus scientific name > 138 Mus sp. 129SV includes > 138 house mouse genbank common name > 138 mice C57BL/6xCBA/CaJ hybrid misspelling > 138 mouse common name > 138 nude mice includes > 138 transgenic mice includes > > The source from the genbank entry NM_017474 is: > SOURCE Mus musculus (house mouse) > > Which is why I think the issue is that the name_class is "genbank > common name" rather than common name. Note that apparently NCBI has decided that the common name is 'mouse', not 'house mouse'. Why what they report in the genbank record is different from what they decided to be the common name is beyond me. Note also that the common name in parentheses is optional. If it's missing the record is still in valid format. > What does strike me as odd though is that not even "mouse" shows up > -- common_name is empty. Indeed, that's odd. Can you file this as a bug report and assign to the bioperl-db queue? 
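For anyone wanting to look at the name classes side by side, here is a small sketch in plain Perl that groups the rows quoted above by name_class. The data are copied from the query output in this message; the `names_by_class` helper is hypothetical and is not how Bio::Species or bioperl-db look names up.

```perl
use strict;
use warnings;

# (name, name_class) pairs for taxon_id 138, as quoted above.
my @taxon_names = (
    [ 'LK3 transgenic mice',         'includes' ],
    [ 'Mus muscaris',                'misnomer' ],
    [ 'Mus musculus',                'scientific name' ],
    [ 'Mus sp. 129SV',               'includes' ],
    [ 'house mouse',                 'genbank common name' ],
    [ 'mice C57BL/6xCBA/CaJ hybrid', 'misspelling' ],
    [ 'mouse',                       'common name' ],
    [ 'nude mice',                   'includes' ],
    [ 'transgenic mice',             'includes' ],
);

# Group names by their class; a class may hold several names.
sub names_by_class {
    my (@rows) = @_;
    my %by_class;
    push @{ $by_class{ $_->[1] } }, $_->[0] for @rows;
    return %by_class;
}

my %names = names_by_class(@taxon_names);
for my $class ( 'scientific name', 'common name', 'genbank common name' ) {
    printf "%-20s => %s\n", $class, join( ', ', @{ $names{$class} || [] } );
}
```

Both 'common name' and 'genbank common name' are present for this taxon, with different values, so an empty common_name cannot be explained by the data alone.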
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon Aug 24 09:50:17 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 09:50:17 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> Message-ID: <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> On Aug 23, 2009, at 1:17 PM, Anand C. Patel wrote: > [...] > Code snippet: > my $species = $seq->species; > print "common name = ",$species->common_name, "\n"; > print "scientific name = ",$species->scientific_name, "\n"; > print "species = ",$species->species, "\n"; > print "genus = ",$species->genus, "\n"; > print "sub_species = ",$species->sub_species, "\n"; > print "binomial = ",$species->binomial, "\n"; > print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; > > Output: > common name = > scientific name = musculus > species = musculus > genus = Mus > sub_species = > binomial = Mus musculus > ncbi_taxid = 10090 This points to a problem in Bio::Species::scientific_name(), given that binomial() is correct. Could you file this as a bug report? > The common name is missing, despite having loaded it from NCBI > taxonomy using the provided script. > It is ONLY present as this "genbank common name". > [...] > I could go through and replace all of the instances of "genbank > common name" with "common name" and see if this fixes it. I think we need to first discuss how we want to treat the 'common name' versus 'genbank common name' classes in BioPerl. 
So question for everyone: do we need to have both available (in which case we need to add an accessor in Bio::Species), or only 'common name', or should 'genbank common name' override 'common name' if both are present and have different values?

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From biopython at maubp.freeserve.co.uk Mon Aug 24 10:18:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Aug 2009 15:18:20 +0100
Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS
In-Reply-To:
References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com> <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com> <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu> <320fb6e00907270451i3d40b4ffq607360cfcb6f6282@mail.gmail.com>
Message-ID: <320fb6e00908240718q194afe78j4a05b31aeb33e313@mail.gmail.com>

On Mon, Jul 27, 2009 at 2:06 PM, Chris Fields wrote:
>
> I added this (and the others) to our ticket tracking this. Looks like
> solexa conversion either way is borked, which is very likely an issue
> with conversion.

Hi Chris,

I've been digging into the current SVN code for BioPerl's FASTQ support - I realised you are doing the Solexa to PHRED mapping twice when parsing "fastq-solexa" files. Using "qual" output (which shows the PHRED scores in plain text) makes it very clear something is wrong:

$ cat solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+slxa_0001_1_0001_01
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;

That is Solexa scores from 40 (h) down to -5 (;), which should map onto PHRED scores from 40 down to 1 (according to our prior discussions).
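For reference, the expected mapping can be sketched in a few lines of plain Perl. This is only an illustration, not the code in fastq.pm: it assumes the usual conversion Q_phred = 10 * log10(10^(Q_solexa/10) + 1), an ASCII offset of 64 for "fastq-solexa", and rounding to the nearest integer. The function names are made up for this sketch.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Convert one Solexa quality score to a floating point PHRED score.
# Assumed formula: Q_phred = 10 * log10( 10**(Q_solexa/10) + 1 )
sub solexa_to_phred {
    my ($q_solexa) = @_;
    return 10 * log( 10**( $q_solexa / 10 ) + 1 ) / log(10);
}

# Decode a "fastq-solexa" quality string (ASCII offset 64, assumed)
# into PHRED scores rounded to the nearest integer. The conversion is
# applied exactly once per character.
sub solexa_string_to_phred {
    my ($qual_string) = @_;
    return map { floor( solexa_to_phred( ord($_) - 64 ) + 0.5 ) }
        split //, $qual_string;
}

# Quality line from solexa_faked.fastq above.
my @phred = solexa_string_to_phred(
    'hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;');
print "@phred\n";
```

Applied once, this gives integers running from 40 down to 1. Keeping the unrounded value from solexa_to_phred is what would be needed to write the original Solexa scores back out without loss.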
$ ./bioperl_solexa2qual.pl < solexa_faked.fastq
>slxa_0001_1_0001_01
40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 10 10 9 8 7 6 6 5 5 5 5 4 4 4 4

For reference,

$ python biopython_solexa2qual.py < solexa_faked.fastq
>slxa_0001_1_0001_01
40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 10 9 8 7 6 5 5 4 4 3 3 2 2 1 1

I can "fix" this in fastq.pm by commenting out one of the log
mappings, for example see the patch I've just uploaded to Bug 2857:
http://bugzilla.open-bio.org/show_bug.cgi?id=2857

That brings me to another problem, consider the following (with the
double conversion fixed):

$ ./bioperl_solexa2solexa.pl < solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+slxa_0001_1_0001_01
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJJHGFEDDBB@@>><<

If you compare that to the original, you'll notice a loss of detail
in the poor quality reads. e.g. Solexa scores 9 (I) and 10 (J) have
both been mapped onto 10 (J).

I believe this happens because BioPerl is converting the Solexa
scores to PHRED scores on loading (which is fine - EMBOSS does this
too), but you are also storing them as integers! In order to preserve
these details, I think you'll have to hold the converted PHRED scores
as floating point numbers (which I think is what EMBOSS does). This
has the downside of taking more memory, and may also complicate file
output (you may need to round things).

Regards,

Peter
(@Biopython)

From acpatel at gmail.com Sat Aug 22 18:44:20 2009
From: acpatel at gmail.com (Anand C. Patel)
Date: Sat, 22 Aug 2009 17:44:20 -0500
Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To: <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> Message-ID: <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> On Aug 22, 2009, at 4:36 PM, Hilmar Lapp wrote: > That's a pretty strange bug. Anand, which version of BioPerl and > Bioperl-db are you running? BioPerl is: https://launchpad.net/ubuntu/karmic/+source/bioperl/1.6.0-2ubuntu1 (1.6.0 loaded via apt-get into ubuntu karmic alpha 4) BioPerl-db is version 1.006 (1.6.0) loaded via CPAN. BioSQL is 1.0.1 I think I know what's broken. Using load_seqdatabases.pl, I'd put a set of sequences from genbank into a biosql db in mysql. I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl script from biosql. When I searched for house (as in house mouse), I found that the name of the type of taxon class was "genbank common name". When I searched for musculus, it does appear as a type of "scientific name". > Note that the genus *is* actually there in the lineage (and hence > does get retrieved from the database). Apparently the Species object > fails to pull it out correctly, though? > > Anand - I suspect there have been some warnings printed to the > terminal - can you post these, and otherwise confirm that there > haven't been any? > > -hilmar I'm not just getting warnings. I'm getting errors. Tons of them. It's a wonder it's working at all. I started with the getentry.cgi script in the cgi-bin folder, and stripped most of it away. 
Code:

#!/usr/bin/perl

use DBI;
use CGI::Carp qw( fatalsToBrowser );
use CGI qw/:standard/;
use Bio::DB::BioDB;
use Bio::Seq::RichSeq;
use Bio::SeqIO;
use IO::String;

my $q = new CGI;    # create new CGI object
print $q->header;   # create the HTTP header

my $value = "NM_017474";
my $host = "localhost";
my $dbname = "biosql";
my $driver = "mysql";
my $dbuser = "webuser";
my $dbpass = "wrjFfjjW9y243xvF";
my $biodbname = "genbank";
my $seq;

eval {
    my $db = Bio::DB::BioDB->new(-database => "biosql",
                                 -host => $host,
                                 -dbname => $dbname,
                                 -driver => $driver,
                                 -user => $dbuser,
                                 -pass => $dbpass,
                                 -verbose => 10,
                                 );
    my $seqadaptor = $db->get_object_adaptor('Bio::SeqI');
    $seq = Bio::Seq::RichSeq->new( -accession_number => $value,
                                   -namespace => $biodbname );
    $seq = $seqadaptor->find_by_unique_key($seq);
};

my $seqfh = IO::String->new($gbstring);
my $ioseq = Bio::SeqIO->new(-fh => $seqfh, -format => 'genbank');
$ioseq->write_seq($seq);

if( $@ || !defined $seq) {
    print "Got fetch exception of...\n
$@\n
";
    exit(0);
}

print "BioSQL display of ". $seq->display_id ."\n";
print "\n";
print "
\n
".$gbstring."\n
\n
\n"; Errors (some but not all): test1.cgi: attempting to load adaptor class for Bio::SeqI test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor test1.cgi: attempting to load adaptor class for BioNamespace test1.cgi: \tattempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ? test1.cgi: BioNamespaceAdaptor: binding UK column 1 to "genbank" (namespace) test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor test1.cgi: preparing UK select statement: SELECT bioentry.bioentry_id, bioentry.name, bioentry.identifier, bioentry.accession, bioentry.description, bioentry.version, bioentry.division, bioentry.biodatabase_id, bioentry.taxon_id FROM bioentry WHERE biodatabase_id = ? AND accession = ? 
test1.cgi: SeqAdaptor: binding UK column 1 to "1" (bionamespace) test1.cgi: SeqAdaptor: binding UK column 2 to "NM_017474" (accession_number) test1.cgi: attempting to load adaptor class for Bio::PrimarySeq test1.cgi: \tattempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: preparing PK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE biodatabase_id = ? test1.cgi: BioNamespaceAdaptor: binding PK column to "1" test1.cgi: attempting to load adaptor class for Bio::Species test1.cgi: \tattempting to load module Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: preparing PK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND taxon_name.name_class = 'scientific name' AND taxon.taxon_id = ? test1.cgi: SpeciesAdaptor: binding PK column to "138" test1.cgi: prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value >= node.left_value AND taxon.left_value <= node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value test1.cgi: preparing SELECT COMMON_NAME: SELECT taxon_name.name FROM taxon_name WHERE taxon_name.taxon_id = ? 
AND taxon_name.name_class = 'common_name' test1.cgi: attempting to load adaptor class for Bio::Tree::Tree test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeAdaptor test1.cgi: attempting to load adaptor class for Bio::Root::Root test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootAdaptor test1.cgi: attempting to load adaptor class for Bio::Root::RootI test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootAdaptor test1.cgi: attempting to load adaptor class for Bio::Tree::TreeI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeAdaptor test1.cgi: attempting to load adaptor class for Bio::Tree::TreeFunctionsI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor test1.cgi: no adaptor found for class Bio::Tree::Tree test1.cgi: attempting to load adaptor class for Bio::DB::Taxonomy::list test1.cgi: \tattempting to load module Bio::DB::BioSQL::listAdaptor test1.cgi: attempting to load adaptor class for Bio::DB::Taxonomy test1.cgi: \tattempting to load module Bio::DB::BioSQL::TaxonomyAdaptor test1.cgi: no adaptor found for class Bio::DB::Taxonomy::list test1.cgi: attempting to load adaptor class for Biosequence test1.cgi: \tattempting to load module Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BiosequenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: preparing UK select statement: SELECT biosequence.bioentry_id, biosequence.version, biosequence.length, biosequence.alphabet, NULL, NULL, biosequence.bioentry_id FROM biosequence WHERE bioentry_id = ? 
test1.cgi: BiosequenceAdaptor: binding UK column 1 to "1" (primary_seq) test1.cgi: attempting to load adaptor class for Bio::AnnotationCollectionI test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor test1.cgi: attempting to load adaptor class for Bio::Annotation::TypeManager test1.cgi: \tattempting to load module Bio::DB::BioSQL::TypeManagerAdaptor test1.cgi: no adaptor found for class Bio::Annotation::TypeManager test1.cgi: attempting to load adaptor class for Bio::Annotation::Reference test1.cgi: \tattempting to load module Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.reference_id, t2.authors, t2.title, t2.location, t2.crc, bioentry_reference.start_pos, bioentry_reference.end_pos, bioentry_reference.rank, t2.dbxref_id FROM bioentry t1, reference t2, bioentry_reference WHERE t1.bioentry_id = bioentry_reference.bioentry_id AND t2.reference_id = bioentry_reference.reference_id AND t1.bioentry_id = ? 
test1.cgi: ReferenceAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: attempting to load adaptor class for Bio::Annotation::DBLink test1.cgi: \tattempting to load module Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: preparing PK select statement: SELECT dbxref.dbxref_id, dbxref.dbname, dbxref.accession, dbxref.version, NULL FROM dbxref WHERE dbxref_id = ? test1.cgi: DBLinkAdaptor: binding PK column to "1" test1.cgi: DBLinkAdaptor: binding PK column to "2" test1.cgi: DBLinkAdaptor: binding PK column to "3" test1.cgi: DBLinkAdaptor: binding PK column to "4" test1.cgi: DBLinkAdaptor: binding PK column to "5" test1.cgi: DBLinkAdaptor: binding PK column to "6" test1.cgi: DBLinkAdaptor: binding PK column to "7" test1.cgi: DBLinkAdaptor: binding PK column to "8" test1.cgi: DBLinkAdaptor: binding PK column to "9" test1.cgi: DBLinkAdaptor: binding PK column to "10" test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, bioentry_dbxref.rank FROM bioentry t1, dbxref t2, bioentry_dbxref WHERE t1.bioentry_id = bioentry_dbxref.bioentry_id AND t2.dbxref_id = bioentry_dbxref.dbxref_id AND t1.bioentry_id = ? 
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: attempting to load adaptor class for Bio::Annotation::SimpleValue test1.cgi: \tattempting to load module Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: attempting to load adaptor class for Bio::Ontology::Ontology test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::OntologyAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::OntologyAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::OntologyAdaptor test1.cgi: preparing UK select statement: SELECT ontology.ontology_id, ontology.name, ontology.definition FROM ontology WHERE name = ? test1.cgi: OntologyAdaptor: binding UK column 1 to "Annotation Tags" (name) test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::TermAdaptorDriver as driver peer for Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.name, bioentry_qualifier_value.value, bioentry_qualifier_value.rank, t2.ontology_id FROM bioentry t1, term t2, bioentry_qualifier_value WHERE t1.bioentry_id = bioentry_qualifier_value.bioentry_id AND t2.term_id = bioentry_qualifier_value.term_id AND (t1.bioentry_id = ? AND t2.ontology_id = ?) 
test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: attempting to load adaptor class for Bio::Annotation::OntologyTerm test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyTermAdaptor test1.cgi: attempting to load adaptor class for Bio::AnnotationI test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationAdaptor test1.cgi: attempting to load adaptor class for Bio::Ontology::TermI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TermIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TermAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::TermAdaptorDriver as driver peer for Bio::DB::BioSQL::TermAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.identifier, t2.name, t2.definition, t2.is_obsolete, bioentry_qualifier_value.rank, t2.ontology_id FROM bioentry t1, term t2, bioentry_qualifier_value WHERE t1.bioentry_id = bioentry_qualifier_value.bioentry_id AND t2.term_id = bioentry_qualifier_value.term_id AND (t1.bioentry_id = ? AND t2.ontology_id != ?) 
test1.cgi: TermAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: attempting to load adaptor class for Bio::Annotation::Comment test1.cgi: \tattempting to load module Bio::DB::BioSQL::CommentAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::CommentAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::CommentAdaptor test1.cgi: preparing query: SELECT t1.comment_id, t1.comment_text, t1.rank, t1.bioentry_id FROM comment t1 WHERE t1.bioentry_id = ? test1.cgi: Query FIND Bio::Annotation::Comment BY Bio::Seq::RichSeq: binding column 1 to "1" test1.cgi: attempting to load adaptor class for Bio::SeqFeatureI test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: preparing query: SELECT t1.seqfeature_id, t1.display_name, t1.rank, t1.bioentry_id, t1.type_term_id, t1.source_term_id FROM seqfeature t1 WHERE t1.bioentry_id = ? ORDER BY t1.rank test1.cgi: Query FIND FEATURE BY SEQ: binding column 1 to "1" test1.cgi: preparing PK select statement: SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, NULL, term.ontology_id FROM term WHERE term_id = ? 
test1.cgi: TermAdaptor: binding PK column to "245" test1.cgi: attempting to load adaptor class for Bio::Ontology::OntologyI test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyAdaptor test1.cgi: preparing PK select statement: SELECT ontology.ontology_id, ontology.name, ontology.definition FROM ontology WHERE ontology_id = ? test1.cgi: OntologyAdaptor: binding PK column to "32" test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, term_dbxref.rank FROM term t1, dbxref t2, term_dbxref WHERE t1.term_id = term_dbxref.term_id AND t2.dbxref_id = term_dbxref.dbxref_id AND t1.term_id = ? test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "245" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: preparing: SELECT synonym FROM term_synonym WHERE term_id = ? test1.cgi: SELECT SYNONYMS: executing with values (245) (FK to Bio::Ontology::Term) test1.cgi: TermAdaptor: binding PK column to "246" test1.cgi: OntologyAdaptor: binding PK column to "33" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "246" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (246) (FK to Bio::Ontology::Term) test1.cgi: attempting to load adaptor class for Bio::LocationI test1.cgi: \tattempting to load module Bio::DB::BioSQL::LocationIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::LocationAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::LocationAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::LocationAdaptor test1.cgi: preparing query: SELECT t1.location_id, t1.start_pos, t1.end_pos, t1.strand, t1.rank, t1.seqfeature_id, t1.dbxref_id FROM location t1 WHERE 
t1.seqfeature_id = ? test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "1" test1.cgi: attempting to load adaptor class for Bio::DB::Persistent::PersistentObjectFactory test1.cgi: \tattempting to load module Bio::DB::BioSQL::PersistentObjectFactoryAdaptor test1.cgi: attempting to load adaptor class for Bio::Factory::ObjectFactoryI test1.cgi: \tattempting to load module Bio::DB::BioSQL::ObjectFactoryIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::ObjectFactoryAdaptor test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, seqfeature_dbxref.rank FROM seqfeature t1, dbxref t2, seqfeature_dbxref WHERE t1.seqfeature_id = seqfeature_dbxref.seqfeature_id AND t2.dbxref_id = seqfeature_dbxref.dbxref_id AND t1.seqfeature_id = ? test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic) test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.name, seqfeature_qualifier_value.value, seqfeature_qualifier_value.rank, t2.ontology_id FROM seqfeature t1, term t2, seqfeature_qualifier_value WHERE t1.seqfeature_id = seqfeature_qualifier_value.seqfeature_id AND t2.term_id = seqfeature_qualifier_value.term_id AND (t1.seqfeature_id = ? AND t2.ontology_id = ?) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.identifier, t2.name, t2.definition, t2.is_obsolete, seqfeature_qualifier_value.rank, t2.ontology_id FROM seqfeature t1, term t2, seqfeature_qualifier_value WHERE t1.seqfeature_id = seqfeature_qualifier_value.seqfeature_id AND t2.term_id = seqfeature_qualifier_value.term_id AND (t1.seqfeature_id = ? AND t2.ontology_id != ?) 
test1.cgi: TermAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: preparing query: SELECT t1.comment_id, t1.comment_text, t1.rank, t1.bioentry_id FROM comment t1 WHERE 1 = 1 test1.cgi: Query FIND Bio::Annotation::Comment BY Bio::SeqFeature::Generic: binding column 1 to "1" test1.cgi: TermAdaptor: binding PK column to "260" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "260" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (260) (FK to Bio::Ontology::Term) test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "2" test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: TermAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: TermAdaptor: binding PK column to "250" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "250" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (250) (FK to Bio::Ontology::Term) test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "3" test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "3" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "3" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: TermAdaptor: binding ASSOC column 1 to "3" 
(FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: TermAdaptor: binding PK column to "264" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "264" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (264) (FK to Bio::Ontology::Term) test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "4" test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: TermAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: preparing SELECT statement: SELECT seq FROM biosequence WHERE bioentry_id = ? > > On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: > >> Anand, >> >> You should always post emails to the bioperl-l mailing list, never >> to individual developers (you'll get an answer much faster). Keep >> responses on the list as well. >> >> Though I use bioperl-db some, I'm probably not the best person to >> ask. Does anyone know what's going on with this? Does this have >> to do with the Species/Taxon refactoring? >> >> chris >> >> Begin forwarded message: >> >>> From: "Anand C. Patel" >>> Date: August 22, 2009 2:57:42 PM CDT >>> To: cjfields at illinois.edu >>> Subject: problem with bioperl (where's the Mus?) >>> >>> Dr. Fields, >>> >>> I'm struggling with what seems to be a strange quirk in Bioperl >>> +/- Bioperl-db/BioSQL. >>> >>> I've successfully loaded in genbank sequences into a biosql >>> database. 
>>> >>> When I try to write a genbank sequence back out, a curious thing >>> happens -- the Genus is missing from the SOURCE and ORGANISM areas. >>> >>> Despite reporting: >>> primary tag: source >>> tag: chromosome >>> value: 3 >>> >>> tag: db_xref >>> value: taxon:10090 >>> >>> tag: map >>> value: 3 74.5 cM >>> >>> tag: mol_type >>> value: mRNA >>> >>> tag: organism >>> value: Mus musculus >>> The sequence when printed out via SeqIO looks like this: >>> LOCUS NM_017474 2935 bp dna linear >>> ROD 13-AUG-2009 >>> DEFINITION Mus musculus chloride channel calcium activated 3 >>> (Clca3), mRNA. >>> ACCESSION NM_017474 XM_978159 >>> VERSION NM_017474.2 GI:255918210 >>> KEYWORDS . >>> SOURCE musculus >>> ORGANISM musculus >>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>> Bilateria; >>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>> Tetrapoda; >>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>> Glires; >>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>> Confession -- I have a final project due Monday wherein I boldly >>> elected to interface Bioperl, MySQL, Perl, and CGI. >>> (I'm an MD getting my MS in Bioinformatics.) >>> After many misadventures, I'm getting to the point where I could >>> actually complete the objectives, but this is bug is rather >>> problematic. >>> Thanks, >>> Anand >>> Anand C. Patel, MD >>> Assistant Professor of Pediatrics >>> Division of Allergy/Pulmonary Medicine >>> Department of Pediatrics >>> Washington University School of Medicine >>> 660 South Euclid Ave, Campus Box 8052 >>> St. 
Louis, MO 63110 >>> acpatel at wustl.edu >>> acpatel at gmail.com >>> acpatel at jhu.edu >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From acpatel at gmail.com Sat Aug 22 20:04:35 2009 From: acpatel at gmail.com (Anand C. Patel) Date: Sat, 22 Aug 2009 19:04:35 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote: > > On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > >> [...] >> I think I know what's broken. Using load_seqdatabases.pl, I'd put >> a set of sequences from genbank into a biosql db in mysql. >> >> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl >> script from biosql. > > Did you load the NCBI taxonomy first, or afterwards? First -- before the sequences. In fact, I'm in the midst of reloading the taxonomy into a clean new database. I used namespace "genbank" instead of namespace "bioperl". Could that be the problem? >> >> When I searched for house (as in house mouse), I found that the >> name of the type of taxon class was "genbank common name". >> >> When I searched for musculus, it does appear as a type of >> "scientific name". > > It is the 'scientific name' class names that Bioperl-db will onto > the lineage array. > >> [...] >> I'm not just getting warnings. I'm getting errors. Tons of them. >> It's a wonder it's working at all. 
> > I'm not sure what you're referring to, but what you pasted into your > email were neither errors nor warnings but a debugging log (and what > it prints looks like it's working fine). You triggered that by > setting -verbose to a value greater than 0. If you don't want > debugging output, then you can just leave off that argument (no > debugging output is the default). I did not know that! They were flagged "error", so I thought those might be the problem. >> >> I started with the getentry.cgi script in the cgi-bin folder, and >> stripped most of it away. > > I see - which reminds me that I need to look at that script; I'm > afraid it hasn't been updated for a long time (that doesn't mean > though that it can't work - the core API has been stable for years). > It works -- I just think I confused the system by not sticking with the default namespace? Thanks, Anand >> >> Code: >> #!/usr/bin/perl >> >> [...] >> if( $@ || !defined $seq) { >> print "Got fetch exception of...\n
$@\n
"; >> exit(0); >> } > > Wouldn't you want to put that right after the eval() clause? > > -hilmar > >> >> >>> >>> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >>> >>>> Anand, >>>> >>>> You should always post emails to the bioperl-l mailing list, >>>> never to individual developers (you'll get an answer much >>>> faster). Keep responses on the list as well. >>>> >>>> Though I use bioperl-db some, I'm probably not the best person to >>>> ask. Does anyone know what's going on with this? Does this have >>>> to do with the Species/Taxon refactoring? >>>> >>>> chris >>>> >>>> Begin forwarded message: >>>> >>>>> From: "Anand C. Patel" >>>>> Date: August 22, 2009 2:57:42 PM CDT >>>>> To: cjfields at illinois.edu >>>>> Subject: problem with bioperl (where's the Mus?) >>>>> >>>>> Dr. Fields, >>>>> >>>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>>> +/- Bioperl-db/BioSQL. >>>>> >>>>> I've successfully loaded in genbank sequences into a biosql >>>>> database. >>>>> >>>>> When I try to write a genbank sequence back out, a curious thing >>>>> happens -- the Genus is missing from the SOURCE and ORGANISM >>>>> areas. >>>>> >>>>> Despite reporting: >>>>> primary tag: source >>>>> tag: chromosome >>>>> value: 3 >>>>> >>>>> tag: db_xref >>>>> value: taxon:10090 >>>>> >>>>> tag: map >>>>> value: 3 74.5 cM >>>>> >>>>> tag: mol_type >>>>> value: mRNA >>>>> >>>>> tag: organism >>>>> value: Mus musculus >>>>> The sequence when printed out via SeqIO looks like this: >>>>> LOCUS NM_017474 2935 bp dna linear >>>>> ROD 13-AUG-2009 >>>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>>> (Clca3), mRNA. >>>>> ACCESSION NM_017474 XM_978159 >>>>> VERSION NM_017474.2 GI:255918210 >>>>> KEYWORDS . 
>>>>> SOURCE musculus >>>>> ORGANISM musculus >>>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>>> Bilateria; >>>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>>> Tetrapoda; >>>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>>> Glires; >>>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>>> Confession -- I have a final project due Monday wherein I boldly >>>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>>> (I'm an MD getting my MS in Bioinformatics.) >>>>> After many misadventures, I'm getting to the point where I could >>>>> actually complete the objectives, but this is bug is rather >>>>> problematic. >>>>> Thanks, >>>>> Anand >>>>> Anand C. Patel, MD >>>>> Assistant Professor of Pediatrics >>>>> Division of Allergy/Pulmonary Medicine >>>>> Department of Pediatrics >>>>> Washington University School of Medicine >>>>> 660 South Euclid Ave, Campus Box 8052 >>>>> St. Louis, MO 63110 >>>>> acpatel at wustl.edu >>>>> acpatel at gmail.com >>>>> acpatel at jhu.edu >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From acpatel at gmail.com Sat Aug 22 20:13:37 2009 From: acpatel at gmail.com (Anand C. Patel) Date: Sat, 22 Aug 2009 19:13:37 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> Do I need to load ontology before loading sequences? (I promise I've been reading the documentation for days, and could not find a yea or nay on this) Thanks, Anand On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote: > > On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > >> [...] >> I think I know what's broken. Using load_seqdatabases.pl, I'd put >> a set of sequences from genbank into a biosql db in mysql. >> >> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl >> script from biosql. > > Did you load the NCBI taxonomy first, or afterwards? > >> >> When I searched for house (as in house mouse), I found that the >> name of the type of taxon class was "genbank common name". >> >> When I searched for musculus, it does appear as a type of >> "scientific name". > > It is the 'scientific name' class names that Bioperl-db will onto > the lineage array. > >> [...] >> I'm not just getting warnings. I'm getting errors. Tons of them. >> It's a wonder it's working at all. > > I'm not sure what you're referring to, but what you pasted into your > email were neither errors nor warnings but a debugging log (and what > it prints looks like it's working fine). You triggered that by > setting -verbose to a value greater than 0. If you don't want > debugging output, then you can just leave off that argument (no > debugging output is the default). > >> >> I started with the getentry.cgi script in the cgi-bin folder, and >> stripped most of it away. > > I see - which reminds me that I need to look at that script; I'm > afraid it hasn't been updated for a long time (that doesn't mean > though that it can't work - the core API has been stable for years). 
From acpatel at usa.net Sat Aug 22 21:13:14 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sat, 22 Aug 2009 20:13:14 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Turns out that using the default namespace bioperl doesn't change anything. Common name -- still "genbank common name" in name_class in the taxon_name table for "house mouse", which I think the module is looking for as "common name". It's not behaving differently despite reloading the sequences. I've created a horrible munge that fixes it for cosmetic purposes: my $species = $seq->species; my $justspecies = $species->scientific_name(); my $binspecies = $species->binomial(); my $gbstring2 = $gbstring; $gbstring2 =~ s/$binspecies/$justspecies/g; $gbstring2 =~ s/$justspecies/$binspecies/g; But this does not strike me as a long term solution. Thanks, Anand On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote: > > On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > >> [...] >> I think I know what's broken. Using load_seqdatabases.pl, I'd put >> a set of sequences from genbank into a biosql db in mysql. >> >> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl >> script from biosql. > > Did you load the NCBI taxonomy first, or afterwards? > >> >> When I searched for house (as in house mouse), I found that the >> name of the type of taxon class was "genbank common name". >> >> When I searched for musculus, it does appear as a type of >> "scientific name". > > It is the 'scientific name' class names that Bioperl-db will onto > the lineage array. > >> [...] >> I'm not just getting warnings. I'm getting errors. Tons of them. >> It's a wonder it's working at all. > > I'm not sure what you're referring to, but what you pasted into your > email were neither errors nor warnings but a debugging log (and what > it prints looks like it's working fine). 
From jkb at sanger.ac.uk Mon Aug 24 05:02:34 2009 From: jkb at sanger.ac.uk (James Bonfield) Date: Mon, 24 Aug 2009 10:02:34 +0100 Subject: [Bioperl-l] SCF installation Message-ID: <20090824090234.GB821@sanger.ac.uk> Lincoln Stein wrote: > It is all a bit confusing.
> On the download page for Staden, there is a
> release 1.12, but the home page hasn't been updated and still reads
> 1.11. If you download and install Staden 1.12, you'll get a library
> named libstaden-read rather than libread; Bio::SCF hasn't been updated
> for the name change, and so you will have to open up the Makefile.PL
> and change "-lread" to "-lstaden-read" in order for it to compile.

This post was pointed out to me by one of the Debian maintainers. I'm
mailing the list directly but am not a subscriber, so please keep me
listed in any replies.

The Staden Package home page recently underwent a revamp to use the RSS
feeds, automatically updating it. Unfortunately, within a couple of weeks
of doing that, SourceForge managed to break the file release RSS and so
the site has stopped updating. The News section is still working though,
so I ought to add a news post about io_lib-1.12.1 and it'll at least
appear somewhere on the home page.

Regarding the library name change, this was requested by Debian and
also already implemented by Fedora. I agree with it too, as libread.so is
a truly appalling name, so the new name is here to stay.

There shouldn't be a great number of differences compared to the 1.11.x
release set though, with the only incompatibility I can immediately
think of being the change from int to size_t in the Array structs.

James

PS. There have been very few changes to SCF over the years, so it's
likely all working just fine. Most recent io_lib changes have been SRF
support, and a few associated tweaks to ZTR necessitated by SRF.

--
James Bonfield (jkb at sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
                                  | Plurima gyrabant gymbolitare vabo;
A Staden Package developer:       | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From acpatel at usa.net Sun Aug 23 13:17:08 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sun, 23 Aug 2009 12:17:08 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Message-ID: <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> On Aug 23, 2009, at 9:38 AM, Hilmar Lapp wrote: >> Common name -- still "genbank common name" in name_class in the >> taxon_name table for "house mouse", which I think the module is >> looking for as "common name". > > If you are loading the NCBI taxonomy first, this is coming from > NCBI, not one of the scripts or BioPerl, and hence we have no > control over it. Are you saying that there is no designated name of > class 'common name' for Mus musculus in the NCBI taxonomy dump? > > Also, the common name being present or not should have no bearing on > the lineage array, where the actual problem is, so I don't > understand right now how this would be connected to the problem you > are seeing. > >> >> It's not behaving differently despite reloading the sequences. 
>> >> I've created a horrible munge that fixes it for cosmetic purposes: >> my $species = $seq->species; >> my $justspecies = $species->scientific_name(); >> my $binspecies = $species->binomial(); >> >> my $gbstring2 = $gbstring; >> >> $gbstring2 =~ s/$binspecies/$justspecies/g; >> $gbstring2 =~ s/$justspecies/$binspecies/g; > > I don't understand what you are trying to achieve here - it seems > like you are making a substitution and then reverting it? Also, > $species->scientific_name() and $species->binomial() should be > identical for Mus musculus - are you finding different values being > returned? > > So in essence, I wouldn't expect your above code snippet to have any > effect, for both of these reasons. How do you find $gbstring2 to be > different from $gbstring at the end of this block of code? > > -hilmar I should have been clearer. Code snippet: my $species = $seq->species; print "common name = ",$species->common_name, "\n"; print "scientific name = ",$species->scientific_name, "\n"; print "species = ",$species->species, "\n"; print "genus = ",$species->genus, "\n"; print "sub_species = ",$species->sub_species, "\n"; print "binomial = ",$species->binomial, "\n"; print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; Output: common name = scientific name = musculus species = musculus genus = Mus sub_species = binomial = Mus musculus ncbi_taxid = 10090 The common name is missing, despite having loaded it from NCBI taxonomy using the provided script. It is ONLY present as this "genbank common name". So, what I get in $gbstring is: LOCUS NM_017474 2935 bp dna linear ROD 13- AUG-2009 DEFINITION Mus musculus chloride channel calcium activated 3 (Clca3), mRNA. ACCESSION NM_017474 XM_978159 VERSION NM_017474.2 GI:255918210 KEYWORDS . 
SOURCE musculus ORGANISM musculus Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. What I get in $gbstring2 is: LOCUS NM_017474 2935 bp dna linear ROD 13- AUG-2009 DEFINITION Mus musculus chloride channel calcium activated 3 (Clca3), mRNA. ACCESSION NM_017474 XM_978159 VERSION NM_017474.2 GI:255918210 KEYWORDS . SOURCE Mus musculus ORGANISM Mus musculus Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. Not perfect -- common name is still missing, but better. I could go through and replace all of the instances of "genbank common name" with "common name" and see if this fixes it. Any other thoughts? Thanks, Anand > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > From acpatel at usa.net Sun Aug 23 13:25:16 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sun, 23 Aug 2009 12:25:16 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To:
References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net>
Message-ID: <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net>

The other piece of potentially useful information is below -- output from

SELECT * FROM `biosql`.`taxon_name` WHERE `taxon_id` = 138;

(taxon_id 138 maps to ncbi_taxon_id 10090)

taxon_id  name                         name_class
138       LK3 transgenic mice          includes
138       Mus muscaris                 misnomer
138       Mus musculus                 scientific name
138       Mus sp. 129SV                includes
138       house mouse                  genbank common name
138       mice C57BL/6xCBA/CaJ hybrid  misspelling
138       mouse                        common name
138       nude mice                    includes
138       transgenic mice              includes

The source from the genbank entry NM_017474 is:

SOURCE Mus musculus (house mouse)

Which is why I think the issue is that the name_class is "genbank common
name" rather than common name.

What does strike me as odd though is that not even "mouse" shows up --
common_name is empty.

Thanks again,
Anand

From maj at fortinbras.us Mon Aug 24 10:37:45 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 24 Aug 2009 10:37:45 -0400
Subject: [Bioperl-l] The Documentation Project
Message-ID: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife>

Hi All,
I'm starting this journey of 1000 mi (1620 km) with the following step:
http://www.bioperl.org/wiki/The_Documentation_Project
Please visit and comment.
Thanks,
Mark

From hlapp at gmx.net Mon Aug 24 10:47:34 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 24 Aug 2009 10:47:34 -0400
Subject: [Bioperl-l] extracting ORGANISM line from genbank file
In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
Message-ID: <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net>

Hi Anna,

sequence formats all have some varying amount of information that must
be present or otherwise the syntax is invalid. If what you need is a
two-column table of display_id and species name, then I would simply
write that, and not squeeze it into a standard sequence format.

(Unless you actually do want the sequence too, in which case you need
to add it as a wanted slot; even in that case though, writing a
three-column table might serve you better.)

-hilmar

On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote:

>
> Dear all,
>
> I am trying to extract species taxonomy from ORGANISM line. In fact
> I only need a first line under ORGANISM tag (i.e. genus + species).
> I thought that it would be possible to do with the SeqBuilder object
> by stating
>
> $builder->add_wanted_slot('display_id','species');
>
> the problem is, however, that I've got an empty file as a result.
> What might be wrong with the script (see below)?
> Thanks a lot in advance for any ideas,
>
> -------------------------------------------
>
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Seq::SeqBuilder;
>
> my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> my $infile = shift or die $usage;
> my $infileformat = 'Genbank' ;
> my $outfile = shift or die $usage;
> my $outfileformat = 'raw';
> my $i = 0;
>
> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                              '-format' => $infileformat);
>
> my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
>                               '-format' => $outfileformat);
>
> my $builder = $seq_in->sequence_builder();
>
> $builder->want_none();
> $builder->add_wanted_slot('display_id','species');
>
> while(my $seq = $seq_in->next_seq()) {
>     $seq_out->write_seq($seq);
> }
>
> exit;
>
> ----------------------------------------------------
>
> Anna
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at illinois.edu Mon Aug 24 12:50:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 24 Aug 2009 11:50:05 -0500
Subject: [Bioperl-l] The Documentation Project
In-Reply-To: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife>
References: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife>
Message-ID:

Mark,

We should probably keep some of this discussion on the list, primarily
as I've been running into conflicts with responses on the wiki page.
It's more amenable to discussion. For anyone out there interested, you
should speak up now, this is the best opportunity to do so (we're
considering lack of input as assent).

I want to make a few key points on behalf of the devs. It's impossible
to consistently maintain two active copies of any documentation (wiki
vs docs in the distribution). I have tried keeping up with this,
helping with the 1.5.2 release, and full-on with the 1.6.0 release, and
it's an extreme headache. From the maintenance point-of-view, this is
what I would do:

1) Where possible always link to the official POD (either pdoc or CPAN)
from the distribution. Make the API documentation link very prominent
(I moved it to the docs section in the sidebar). Protect wiki module
pages (in line with the 'one official copy' rule), allow writable
discussion pages for additional, wiki-specific documentation (which can
be added to the official docs as needed).

2) ...or, have a search bar specifically for the module documentation
that links directly to the proper API/PDOC/CPAN page. Not sure how
feasible that is, particularly since we plan on splitting things up a
bit.

3) POD-ify any relevant documentation we intend on including in the
wiki that also comes with the distribution (similar to Moose::Manual).
I do not want to repeatedly edit a plain text INSTALL/BUGS/DEPENDENCIES
file to correspond with the wikified version for every release (nor
vice versa).

Long term: (this is my own personal style, YMMV) move all POD to the
end of the file. Add 'Status' tags to any method docs indicating
implementation status (virtual, stable, unstable, public, private,
etc). Move method POD to its own section within the main documentation.
Implement a coding style (as mentioned recently on list using perltidy,
but also using proper method names).

HOWTOs are also subject to API changes, but we haven't run into many
issues with those yet, and they're wiki-specific.

chris

On Aug 24, 2009, at 9:37 AM, Mark A. Jensen wrote:

> Hi All,
> I'm starting this journey of 1000 mi (1620 km) with the following
> step:
> http://www.bioperl.org/wiki/The_Documentation_Project
> Please visit and comment.
> Thanks, > Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 13:37:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 12:37:39 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92CADD.10901@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> Message-ID: On Aug 24, 2009, at 12:16 PM, Sendu Bala wrote: > Hilmar Lapp wrote: >>> >>> ... >> This points to a problem in Bio::Species::scientific_name(), given >> that binomial() is correct. Could you file this as a bug report? > > What code creates the Bio::Species object here? I suspect this code > isn't aware of changes in Bio::Species since BioPerl 1.5.2. I think it's bioperl-db-related. You've previously pointed out the incongruity bioperl-db has with Bio::Species in a bug report (I indicated that in a separate post to this thread). >>> The common name is missing, despite having loaded it from NCBI >>> taxonomy using the provided script. >>> It is ONLY present as this "genbank common name". >>> [...] >>> I could go through and replace all of the instances of "genbank >>> common name" with "common name" and see if this fixes it. >> I think we need to first discuss how we want to treat the 'common >> name' versus 'genbank common name' classes in BioPerl. 
>> So question for everyone: do we need to have both available (in >> which case we need to add an accessor in Bio::Species), or only >> 'common name', or should 'genbank common name' override 'common >> name' if both are present and have different values. > > Bio::Species (via Bio::Taxon) has the common_names() method, for > which common_name() is an alias that in scalar context returns the > first of possibly many common names, one of which may be the genbank > common name. > > See: > http://www.bioperl.org/wiki/Core_1.5.2_new_features#Implementation_changes Yes, but that method stored names in an array and removes the context, presumed or not. If there are two or more, which names correspond to common_name, which to genbank_common_name (and which should we prefer)? chris From bix at sendu.me.uk Mon Aug 24 13:16:13 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 18:16:13 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> Message-ID: <4A92CADD.10901@sendu.me.uk> Hilmar Lapp wrote: > > On Aug 23, 2009, at 1:17 PM, Anand C. Patel wrote: > >> [...] 
>> Code snippet: >> my $species = $seq->species; >> print "common name = ",$species->common_name, "\n"; >> print "scientific name = ",$species->scientific_name, "\n"; >> print "species = ",$species->species, "\n"; >> print "genus = ",$species->genus, "\n"; >> print "sub_species = ",$species->sub_species, "\n"; >> print "binomial = ",$species->binomial, "\n"; >> print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; >> >> Output: >> common name = >> scientific name = musculus >> species = musculus >> genus = Mus >> sub_species = >> binomial = Mus musculus >> ncbi_taxid = 10090 > > This points to a problem in Bio::Species::scientific_name(), given that > binomial() is correct. Could you file this as a bug report? What code creates the Bio::Species object here? I suspect this code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >> The common name is missing, despite having loaded it from NCBI >> taxonomy using the provided script. >> It is ONLY present as this "genbank common name". >> [...] >> I could go through and replace all of the instances of "genbank common >> name" with "common name" and see if this fixes it. > I think we need to first discuss how we want to treat the 'common name' > versus 'genbank common name' classes in BioPerl. > > So question for everyone: do we need to have both available (in which > case we need to add an accessor in Bio::Species), or only 'common name', > or should 'genbank common name' override 'common name' if both are > present and have different values. Bio::Species (via Bio::Taxon) has the common_names() method, for which common_name() is an alias that in scalar context returns the first of possibly many common names, one of which may be the genbank common name. 
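
To make the scalar/list distinction above concrete, here is a minimal sketch in plain Perl. It assumes nothing about the real Bio::Taxon internals: My::Taxon is a hypothetical stand-in class, and only the method names common_names() and common_name() are taken from the description above. It shows one way an accessor can return every name in list context and only the first name in scalar context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class for illustration only;
# this is NOT the real Bio::Taxon implementation.
package My::Taxon;

sub new {
    my ($class, @names) = @_;
    return bless { common_names => [@names] }, $class;
}

# List context: return every stored common name.
# Scalar context: return only the first one.
sub common_names {
    my ($self) = @_;
    my @names = @{ $self->{common_names} };
    return wantarray ? @names : $names[0];
}

# common_name() is simply an alias for common_names().
*common_name = \&common_names;

package main;

my $taxon = My::Taxon->new('house mouse', 'mouse');

my $first = $taxon->common_name;     # scalar context: first name only
my @all   = $taxon->common_names;    # list context: all names

print "first: $first\n";
print "all:   ", join(', ', @all), "\n";
```

One thing to keep in mind with an accessor like this: a call placed directly in a print argument list, as in print "common name = ", $species->common_name, "\n", is evaluated in list context, so it yields every stored name rather than just the first. Wrapping the call in scalar(...) forces the single-name behaviour.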
See: http://www.bioperl.org/wiki/Core_1.5.2_new_features#Implementation_changes From hlapp at gmx.net Mon Aug 24 13:54:13 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 13:54:13 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92CADD.10901@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> Message-ID: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >> This points to a problem in Bio::Species::scientific_name(), given >> that binomial() is correct. Could you file this as a bug report? > > What code creates the Bio::Species object here? I suspect this code > isn't aware of changes in Bio::Species since BioPerl 1.5.2. I see. Any pointer to what would tell me what I need to change or is everything in the Bio::Species POD? BTW what the Bioperl-db code does is instantiate the blank object and then populate it through its accessors (mostly the classification() array). If what it has been doing in the past is now considered incorrect, at least it doesn't raise any warning that would alert one to that ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From robert.bradbury at gmail.com Mon Aug 24 14:38:08 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 24 Aug 2009 14:38:08 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> Message-ID: As a really "off-the-wall" suggestion, you might see if somehow the "name" being pulled is the SwissProt name rather than the species name. I run into this when I'm fetching FASTA sequences from SwissProt in that the sequence identifier names are non-standard for some of the early "standard" species, e.g. "HUMAN", # Homo sapiens "MOUSE", # Mus musculus "RAT", # Rattus norvegicus "BOVIN", # Bos taurus "HORSE", # Equus caballus "PIG", # Sus scrofa "RABIT", # Oryctolagus cuniculus "SHEEP", # Ovis aries "YEAST", # Saccharomyces cerevisiae (Baker's yeast) etc. Eventually they largely adopted the 3+2 letter species derived name, but the early "standard" names are anomalies. You might run a test on a newly sequenced species (Gorilla, Opossum, Armadillo, Dog, etc.) to see if you get a "standard" species name. Robert Bradbury From dan.bolser at gmail.com Mon Aug 24 15:13:26 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 20:13:26 +0100 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> Message-ID: <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> Thanks for these clarifications Chris. 
Basically I'm looking for an object that will easily let me edit a
multiple sequence alignment, including: adding sequences (with given
alignments), opening gaps, extracting columns (with linked sequences),
transferring features, etc. etc.

For example, I may want to analyse a set of short reads aligned against
the human genome. Somehow it felt natural to represent the position of
the aligned read as a Bio::LocatableSeq (with the alignment details
being captured by a sequence string (including gaps) representing the
read and the reference sequence - basically because that is what the
aligner gives me).

Now, you're saying Bio::LocatableSeq is not suitable for that purpose,
which is fine. But the question is, how should I be doing this? Adding
megabases of gaps to thousands of short reads feels wrong... is there a
'correct' way to do this currently in BioPerl?

I think the source of my confusion was that SimpleAlign takes
Bio::LocatableSeq as input, and I thought that was 'the way' to
represent sequences in the MSA.

I'll keep hacking at what I need to get done and I'll post the code. I'm
just wondering how much 'alignment editing' could be usefully done by a
suitable object within BP?

Thanks again for your help,
Dan.

2009/8/24 Chris Fields :
> Dan, all,
>
> Bio::SimpleAlign doesn't align anything for you. It makes no assumptions
> about the data being added, beyond possibly checking for the seqs to be
> flush prior to analyses.
>
> Here's the reason why:
>
> The object doesn't 'know' the seqs map across from one to the other as
> below:
>
>> ...
>> ## REF   tacattaaagacccg
>> ## SEQ1  taca.taaa......
>> ## SEQ2  .....taaaga.ccg
>>
>> my $aln = Bio::SimpleAlign->new();
>>
>> $aln->gap_char('.');
>>
>> my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' );
>> my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' );
>> my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' );
>>
>> $aln->add_seq( $r );
>> $aln->add_seq( $s1 );
>> $aln->add_seq( $s2 );
>
> Above, you are making the assumption that SimpleAlign 'knows' where to match
> the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does
> NOT indicate that (the LocatableSeq docs, and their usage, should indicate
> that).
>
> Think about HSP alignments in a BLAST report; the start/end/strand
> coordinates are where the sequence in the alignment maps to the original
> query or hit sequence. They don't indicate where the hit maps to the query
> (the alignment itself does that in a column-wise fashion).
>
> I'm not sure, maybe it needs to be more explicit in the documentation, but
> SimpleAlign does not align the sequences for you (and it shouldn't be
> expected to). There are much better (faster, more accurate) ways to do
> that.
>
>> if($CLUDGE){
>>   foreach(($r, $s1, $s2)){
>>     $_->seq( '.' x ($_->start - 1) . $_->seq )
>>   }
>> }
>>
>> ## Prepare an 'output stream' for the alignment:
>> my $aliWriter = Bio::AlignIO->
>>   new( -fh => \*STDOUT,
>>        -format => 'clustalw',
>>      );
>>
>> warn "\nOUTPUT:\n";
>> $aliWriter->write_aln($aln);
>
> ...
>
>> I was calling the "fill in the gaps yourself" step a CLUDGE because I
>> had expected the alignment object to take care of this for me. Is
>> there any reason that it couldn't do this 'CLUDGE' automatically? It
>> seems strange that it insists on being passed locatable sequence
>> objects, but then largely ignore the given location.
>>
>> Would it not be possible to have this happen when the sequences are
>> written out from the alignment?
I think it should still be possible to >> index the column number via the (gapless) sequence number... or did I >> get confused? There are two levels of confusion here (on my part), 1) >> the concepts behind the objects and 2) the implementation details. > > Mentioned above (no assumptions on how locatableseqs map to one another). > WYSIWYG. There is nothing precluding you from writing up code to do that, > though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for > post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl > alignment implementation (there are, believe it or not, pure perl > implementations of Smith-Waterman and Needleman-Wunsch. > >> Thanks for any hints on how to understand or potentially how to fix >> these problems. >> >> Cheers, >> Dan. > > > Not that SimpleAlign and LocatableSeqs don't have their share of problems. > However, I don't think you can expect this behavior to change with the > refactors. > > chris > From bix at sendu.me.uk Mon Aug 24 15:12:05 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 20:12:05 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> Message-ID: <4A92E605.5090706@sendu.me.uk> Hilmar Lapp wrote: > > On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: > >>> This points to a problem in Bio::Species::scientific_name(), given >>> that binomial() is correct. Could you file this as a bug report? >> >> What code creates the Bio::Species object here? 
I suspect this code >> isn't aware of changes in Bio::Species since BioPerl 1.5.2. > > I see. Any pointer to what would tell me what I need to change or is > everything in the Bio::Species POD? ... I won't guarantee the perfection of the POD ;) > BTW what the Bioperl-db code does is instantiate the blank object and > then populate it through its accessors (mostly the classification() > array). If what it has been doing in the past is now considered > incorrect, at least it doesn't raise any warning that would alert one to > that ... Yuh... If you point out the code that creates the Bio::Species I can look into it for you and suggest what needs changing and why it doesn't work (or if it's a bug in Bio::Species). I can't remember things clearly right now, though classification() I guess was supposed to be backwards compatible. From cjfields at illinois.edu Mon Aug 24 15:52:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 14:52:56 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92E605.5090706@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> Message-ID: <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: > Hilmar Lapp wrote: >> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>> This points to a problem in Bio::Species::scientific_name(), >>>> given that binomial() is correct. Could you file this as a bug >>>> report? >>> >>> What code creates the Bio::Species object here? 
I suspect this >>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >> I see. Any pointer to what would tell me what I need to change or >> is everything in the Bio::Species POD? > > ... I won't guarantee the perfection of the POD ;) > > >> BTW what the Bioperl-db code does is instantiate the blank object >> and then populate it through its accessors (mostly the >> classification() array). If what it has been doing in the past is >> now considered incorrect, at least it doesn't raise any warning >> that would alert one to that ... > > Yuh... If you point out the code that creates the Bio::Species I can > look into it for you and suggest what needs changing and why it > doesn't work (or if it's a bug in Bio::Species). I can't remember > things clearly right now, though classification() I guess was > supposed to be backwards compatible. Sendu, I think it's related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 Bio::DB::BioSQL::SpeciesAdaptor and Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in question i think. chris From bix at sendu.me.uk Mon Aug 24 16:01:29 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 21:01:29 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> Message-ID: <4A92F199.2030900@sendu.me.uk> Chris Fields wrote: > > On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: > >> Hilmar Lapp wrote: >>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>> This points to a problem in Bio::Species::scientific_name(), given >>>>> that binomial() is correct. Could you file this as a bug report? >>>> >>>> What code creates the Bio::Species object here? I suspect this code >>>> isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>> I see. Any pointer to what would tell me what I need to change or is >>> everything in the Bio::Species POD? >> >> ... I won't guarantee the perfection of the POD ;) >> >> >>> BTW what the Bioperl-db code does is instantiate the blank object and >>> then populate it through its accessors (mostly the classification() >>> array). If what it has been doing in the past is now considered >>> incorrect, at least it doesn't raise any warning that would alert one >>> to that ... >> >> Yuh... If you point out the code that creates the Bio::Species I can >> look into it for you and suggest what needs changing and why it >> doesn't work (or if it's a bug in Bio::Species). I can't remember >> things clearly right now, though classification() I guess was supposed >> to be backwards compatible. 
> > Sendu, I think it's related to this: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 > > Bio::DB::BioSQL::SpeciesAdaptor and > Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in > question i think. Ah, yes, well there you go then. So it is a classification() issue. Judging by what I said in that bug, looks like the db code needs to be changed to put the full scientific name in the first element it passes to classification. From cjfields at illinois.edu Mon Aug 24 16:27:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 15:27:23 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92F199.2030900@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> Message-ID: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: >>> Hilmar Lapp wrote: >>>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>>> This points to a problem in Bio::Species::scientific_name(), >>>>>> given that binomial() is correct. Could you file this as a bug >>>>>> report? >>>>> >>>>> What code creates the Bio::Species object here? I suspect this >>>>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>>> I see. Any pointer to what would tell me what I need to change or >>>> is everything in the Bio::Species POD? >>> >>> ... 
I won't guarantee the perfection of the POD ;) >>> >>> >>>> BTW what the Bioperl-db code does is instantiate the blank object >>>> and then populate it through its accessors (mostly the >>>> classification() array). If what it has been doing in the past is >>>> now considered incorrect, at least it doesn't raise any warning >>>> that would alert one to that ... >>> >>> Yuh... If you point out the code that creates the Bio::Species I >>> can look into it for you and suggest what needs changing and why >>> it doesn't work (or if it's a bug in Bio::Species). I can't >>> remember things clearly right now, though classification() I guess >>> was supposed to be backwards compatible. >> Sendu, I think it's related to this: >> http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 >> Bio::DB::BioSQL::SpeciesAdaptor and >> Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in >> question i think. > > Ah, yes, well there you go then. So it is a classification() issue. > Judging by what I said in that bug, looks like the db code needs to > be changed to put the full scientific name in the first element it > passes to classification. Yup. I believe the only blocking issue with implementing it was potential backwards-compat problems with databases loaded using old behavior and then being updated post-1.5.2 (new behavior). I would think this only affects sequence data loaded w/o taxonomy preloaded, but I'm not sure. I suggest, if you can fix it, go ahead make the necessary change. We can then post a big warning to BioSQL and here about the problem, something along the lines of 'bioperl-db in svn may be backwards incompatible with species information loaded in previous versions; it may eat your first born' or similar. It's an absolutely necessary fix, and may effectively kill a bunch of other db/species-related bugs. 
chris From Kevin.M.Brown at asu.edu Mon Aug 24 17:48:35 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 24 Aug 2009 14:48:35 -0700 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com><990CEF10B1AD4BD5BE9977FD62DB3437@NewLife><2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B4062D2655@EX02.asurite.ad.asu.edu> You can use Bio::SimpleAlign for those tasks, but you, the programmer, have to remember that you didn't front pad the sequence and so can't utilize certain functions blindly. I've used SimpleAlign with LocatableSeq objects and wrote a few custom methods that did things like creating slices from the simplealign for each locatableseq. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Bolser Sent: Monday, August 24, 2009 12:13 PM To: Chris Fields Cc: bioperl-l at lists.open-bio.org; Mark A. Jensen; Paolo Pavan Subject: Re: [Bioperl-l] Bio::SimpleAlign constructor? Thanks for these clarifications Chris. Basically I'm looking for an object that will easily let me edit a multiple sequence alignment, including: adding sequences (with given alignments), opening gaps, extracting columns (with linked sequences), transferring features, etc. etc. For example, I may want to analyse a set of short reads aligned against the human genome. Somehow it felt natural to represent the position of the aligned read as a Bio::LocatableSeq (with the alignment details being captured by a sequence string (including gaps) representing the read and the reference sequence - basically because that is what the aligner gives me). Now, you're saying Bio::LocatableSeq is not suitable for that purpose, which is fine. 
But the question is, how should I be doing this? Adding megabases of gaps to thousands of short reads feels wrong... is there a 'correct' way to do this currently in BioPerl? I think the source of my confusion was that SimpleAlign takes Bio::LocatableSeq as input, and I thought that was 'the way' to represent sequences in the MSA. I'll keep hacking at what I need to get done and I'll post the code. I'm just wondering how much 'alignment editing' could be usefully done by a suitable object within BP? Thanks again for your help, Dan. 2009/8/24 Chris Fields : > Dan, all, > > Bio::SimpleAlign doesn't align anything for you. It makes no assumptions > about the data being added, beyond possibly checking for the seqs to be > flush prior to analyses. > > Here's the reason why: > > The object doesn't 'know' the seqs map across from one to the other as > below: > >> ... >> ## REF tacattaaagacccg >> ## SEQ1 taca.taaa...... >> ## SEQ2 .....taaaga.ccg >> >> my $aln = Bio::SimpleAlign->new(); >> >> $aln->gap_char('.'); >> >> my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); >> my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' >> ); >> my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' >> ); >> >> $aln->add_seq( $r ); >> $aln->add_seq( $s1 ); >> $aln->add_seq( $s2 ); > > Above, you are making the assumption that SimpleAlign 'knows' where to match > the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does > NOT indicate that (the LocatableSeq docs, and their usage, should indicate > that). > > Think about HSP alignments in a BLAST report; the start/end/strand > coordinates are where the sequence in the alignment maps to the original > query or hit sequence. They don't indicate where the hit maps to the query > (the alignment itself does that in a column-wise fashion). 
> > I'm not sure, maybe it needs to be more explicit in the documentation, but > SimpleAlign does not align the sequences for you (and it shouldn't be > expected to). There are much better (faster, more accurate) ways to do > that. > >> if($CLUDGE){ >> foreach(($r, $s1, $s2)){ >> $_->seq( '.' x ($_->start - 1) . $_->seq ) >> } >> } >> >> ## Prepare an 'output stream' for the alignment: >> my $aliWriter = Bio::AlignIO-> >> new( -fh => \*STDOUT, >> -format => 'clustalw', >> ); >> >> warn "\nOUTPUT:\n"; >> $aliWriter->write_aln($aln); > > ... > >> I was calling the "fill in the gaps yourself" step a CLUDGE because I >> had expected the alignment object to take care of this for me. Is >> there any reason that it couldn't do this 'CLUDGE' automatically? It >> seems strange that it insists on being passed locatable sequence >> objects, but then largely ignore the given location. >> >> Would it not be possible to have this happen when the sequences are >> written out from the alignment? I think it should still be possible to >> index the column number via the (gapless) sequence number... or did I >> get confused? There are two levels of confusion here (on my part), 1) >> the concepts behind the objects and 2) the implementation details. > > Mentioned above (no assumptions on how locatableseqs map to one another). > WYSIWYG. There is nothing precluding you from writing up code to do that, > though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for > post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl > alignment implementation (there are, believe it or not, pure perl > implementations of Smith-Waterman and Needleman-Wunsch. > >> Thanks for any hints on how to understand or potentially how to fix >> these problems. >> >> Cheers, >> Dan. > > > Not that SimpleAlign and LocatableSeqs don't have their share of problems. > However, I don't think you can expect this behavior to change with the > refactors. 
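The manual padding step discussed above (the 'CLUDGE') can be sketched in plain Perl, without any BioPerl objects. This is only a minimal sketch under stated assumptions: pad_to_reference() is a hypothetical helper, not part of BioPerl, and it assumes that each read's 'start' is the alignment column at which the read begins against an ungapped reference. As noted in the thread, that is not what LocatableSeq::start() means, so the column offset has to come from the aligner output.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): left-pad each gapped read
# with gap characters so that every row starts at reference column 1, then
# right-pad so all rows are the same length ("flush").
# Assumption: $read->{start} is the 1-based alignment column of the read.
sub pad_to_reference {
    my ($gap, $ref_len, @reads) = @_;
    my @rows;
    for my $read (@reads) {
        my $row  = ($gap x ($read->{start} - 1)) . $read->{seq};
        my $tail = $ref_len - length($row);
        $row .= $gap x $tail if $tail > 0;
        push @rows, { id => $read->{id}, seq => $row };
    }
    return @rows;
}

# Same toy data as in the quoted example.
my $ref  = 'tacattaaagacccg';
my @rows = pad_to_reference(
    '.', length($ref),
    { id => 's1', start => 1, seq => 'taca.taaa'  },
    { id => 's2', start => 6, seq => 'taaaga.ccg' },
);

printf "%-4s %s\n", 'r', $ref;
printf "%-4s %s\n", $_->{id}, $_->{seq} for @rows;
```

The padded strings could then be handed to Bio::LocatableSeq and Bio::SimpleAlign as in the quoted code. For short reads against a whole genome this padding is exactly the "megabases of gaps" cost raised earlier, so it only suits small windows of the reference.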
> > chris > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Mon Aug 24 20:12:18 2009 From: hartzell at alerce.com (George Hartzell) Date: Mon, 24 Aug 2009 17:12:18 -0700 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl Message-ID: <19091.11362.190209.844074@already.dhcp.gene.com> There's a warning at Ensembl about the perl api code depending on an old version of bioperl (1.2.3) http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html Does anyone have current information about that dependency? My quick-n-dirty tests suggest that one can't build an app that uses both new Bioperl and the ensembl api without ensembl picking up the newer bioperl libraries (or your app getting the older ones). It's not clear what parts of the ensembl world depend on the older BioPerl. Anyone have any recipes to make it work? Any info on a possible modernization of the ensembl code? Thanks, g. From cjfields at illinois.edu Mon Aug 24 22:29:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 21:29:38 -0500 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <19091.11362.190209.844074@already.dhcp.gene.com> References: <19091.11362.190209.844074@already.dhcp.gene.com> Message-ID: <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> On Aug 24, 2009, at 7:12 PM, George Hartzell wrote: > > There's a warning at Ensembl about the perl api code depending on an > old version of bioperl (1.2.3) > > http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html > > Does anyone have current information about that dependency? > > My quick-n-dirty tests suggest that one can't build an app that uses > both new Bioperl and the ensembl api without ensembl picking up the > newer bioperl libraries (or your app getting the older ones). It's > not clear what parts of the ensembl world depend on the older BioPerl. 
I've asked this question several times of the ensembl folk w/o an adequate response. My general feeling is even they may not really know for sure (though I recall ewan saying something about feature/ annotation changes around then, and maybe something about the blastreporter). Saying that, the ensembl perl API worked for me using bioperl-live (and bioperl 1.6) as of a couple months ago. You might eventually run into some issues; if so report them back here and to the ensembl list. > Anyone have any recipes to make it work? > > Any info on a possible modernization of the ensembl code? That is completely up to the ensembl folks. bioperl 1.2.3 is full enough of bugs, and I don't plan on backporting any changes to that branch (seems kind of silly, as that branch is now about six yrs old). > Thanks, > > g. np! -chris From hlapp at gmx.net Mon Aug 24 23:17:29 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 23:17:29 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> Message-ID: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > > On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > >> [...] >> Ah, yes, well there you go then. So it is a classification() issue. 
>> Judging by what I said in that bug, looks like the db code needs to >> be changed to put the full scientific name in the first element it >> passes to classification. > > > Yup. I believe the only blocking issue with implementing it was > potential backwards-compat problems with databases loaded using old > behavior and then being updated post-1.5.2 (new behavior). The code change is for retrieving data, right? So I'm not sure how it would break backwards compatibility, unless one has taxon entries created before the change (i.e., about 3 years ago?) and through loading sequences rather than through loading the NCBI taxonomy. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue Aug 25 00:10:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 23:10:15 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> Message-ID: On Aug 24, 2009, at 10:17 PM, Hilmar Lapp wrote: > > On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > >> >> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: >> >>> [...] >>> Ah, yes, well there you go then. So it is a classification() >>> issue. 
Judging by what I said in that bug, looks like the db code >>> needs to be changed to put the full scientific name in the first >>> element it passes to classification. >> >> >> Yup. I believe the only blocking issue with implementing it was >> potential backwards-compat problems with databases loaded using old >> behavior and then being updated post-1.5.2 (new behavior). > > The code change is for retrieving data, right? So I'm not sure how > it would break backwards compatibility, unless one has taxon entries > created before the change (i.e., about 3 years ago?) and through > loading sequences rather than through loading the NCBI taxonomy. > > -hilmar Right, that's what I thought as well, but I just wasn't clear on that. So, basically we're saying, as long as the code change is on the retrieving side, everything's okay? Then I'm pretty sure I know how to fix it, at least partly. I can probably squeeze that in unless Sendu's working on it. Sendu? chris From cjfields at illinois.edu Tue Aug 25 00:28:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 23:28:26 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> Message-ID: <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> On Aug 24, 2009, at 10:17 PM, Hilmar Lapp wrote: > > On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > >> >> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: >> >>> [...] >>> Ah, yes, well there you go then. So it is a classification() >>> issue. Judging by what I said in that bug, looks like the db code >>> needs to be changed to put the full scientific name in the first >>> element it passes to classification. >> >> >> Yup. I believe the only blocking issue with implementing it was >> potential backwards-compat problems with databases loaded using old >> behavior and then being updated post-1.5.2 (new behavior). > > The code change is for retrieving data, right? So I'm not sure how > it would break backwards compatibility, unless one has taxon entries > created before the change (i.e., about 3 years ago?) and through > loading sequences rather than through loading the NCBI taxonomy. > > -hilmar Okay, if possible I would like you or Sendu to review that last commit I made to bioperl-db. It includes Sendu's patch; I commented out sections that were modifying the genus/species when loaded in, but there are a few TODO's I noted as well (everything is in populate_from_row()). 
02species.t is now failing but I think it's based on the same old
behavior; I'll look into it.

chris

From geoeco at rambler.ru Tue Aug 25 03:01:24 2009
From: geoeco at rambler.ru (Anna Kostikova)
Date: Tue, 25 Aug 2009 11:01:24 +0400
Subject: [Bioperl-l] extracting ORGANISM line from genbank file
References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
    <94c73820908240553m72540519pd86bf78e29041462@mail.gmail.com>
Message-ID: <1074529971.1251183684.50392744.40754@mcgi70.rambler.ru>

Hi Rohit,

Thanks a lot for your comments, it actually worked well, but in fact i
only want to extract species names as I want to have it in a separate
file together with a fasta file with sequences.
So, thanks a lot again!

Anna

* Rohit Ghai [Mon, 24 Aug 2009 14:53:03 +0200]:
> hi
>
> I think you forgot to add the "seq" in the builder.. thats why the
> file is empty.
> Also, the species name, though being parsed, is nowhere in the output.
> Here's a version using fasta output that you can probably customize
> further. This also takes the full name of the organism and adds to the
> description line in the output.
>
> use strict;
> use Bio::SeqIO;
> use Bio::Seq::SeqBuilder;
>
> my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> my $infile = shift or die $usage;
> my $infileformat = 'Genbank' ;
> my $outfile = shift or die $usage;
> my $outfileformat = 'fasta';
> my $i = 0;
>
> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                              '-format' => $infileformat);
>
> my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
>                               '-format' => $outfileformat);
>
> my $builder = $seq_in->sequence_builder();
>
> $builder->want_none();
>
> $builder->add_wanted_slot('display_id','species','seq','description');
>
> while(my $seq = $seq_in->next_seq()) {
>
>     my $desc = $seq->description();
>     my $species_string = $seq->species()->binomial('FULL');
>     $desc = $desc . " [$species_string]";
>     $seq->description($desc);
>     $seq_out->write_seq($seq);
> }
>
> exit;
>
>
> On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova wrote:
>
> >
> > Dear all,
> >
> > I am trying to extract species taxonomy from ORGANISM line. In fact I
> > only need a first line under ORGANISM tag (e.i. genus + species). I
> > though that it would be possible to do with the SeqBuilder object by
> > stating
> >
> > $builder->add_wanted_slot('display_id','species');
> >
> > the problem is, however, that I've got an empty file as a result.
> > What might be wrong with the script (see below)?
> > Thanks a lot in advance for any ideas,
> >
> > -------------------------------------------
> >
> > #!/usr/bin/perl
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Seq::SeqBuilder;
> >
> > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> > my $infile = shift or die $usage;
> > my $infileformat = 'Genbank' ;
> > my $outfile = shift or die $usage;
> > my $outfileformat = 'raw';
> > my $i = 0;
> >
> > my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
> >                              '-format' => $infileformat);
> >
> > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
> >                               '-format' => $outfileformat);
> >
> > my $builder = $seq_in->sequence_builder();
> >
> > $builder->want_none();
> > $builder->add_wanted_slot('display_id','species');
> >
> > while(my $seq = $seq_in->next_seq()) {
> >     $seq_out->write_seq($seq);
> > }
> >
> > exit;
> >
> > ----------------------------------------------------
> >
> > Anna
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

From geoeco at rambler.ru Tue Aug 25 03:03:56 2009
From: geoeco at rambler.ru (Anna Kostikova)
Date: Tue, 25 Aug 2009 11:03:56 +0400
Subject: [Bioperl-l] extracting ORGANISM line from genbank file
References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
<6B4871D9-5DB0-4762-A613-3561B40CE099@illinois.edu> Message-ID: <734135890.1251183836.48962856.71827@mcgi59.rambler.ru> hello Chris, Well, my final aim is to get 2 files: first one is a fasta file with all the sequences, and the seconds one is simply a list of species names extracted from the same Genbank file. So that's why I though it would be a good thing to put all together into one script with bioperl objects. Is there a better way to do it? Thanks, Anna * Chris Fields [Mon, 24 Aug 2009 07:55:56 -0500]: > Anna, > > It's stored in the Bio::Species object. I have to say, though, I > think you're using a stick of dynamite for a scalpel here; if you only > need ORGANISM parse it out directly (it's much faster). Or am I > missing something? > > chris > > On Aug 24, 2009, at 4:20 AM, Anna Kostikova wrote: > > > Dear all, > > > > I am trying to extract species taxonomy from ORGANISM line. In fact > > I only need a first line under ORGANISM tag (e.i. genus + species). > > I though that it would be possible to do with the SeqBuilder object > > by stating > > > > $builder->add_wanted_slot('display_id','species'); > > > > the problem is, however, that I've got an empty file as a result. > > What might be wrong with the script (see below)? 
> > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From geoeco at rambler.ru Tue Aug 25 03:09:43 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:09:43 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> Message-ID: <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> hello Hilmar, Thanks for your comments. Actually, my final aim is to get 2 files: first one is a fasta file with all the sequences, and the seconds one is simply a list of species names extracted from the same Genbank file. So that's why I though it would be a good thing to put all together into one script with bioperl objects. Is there a better way to do it? 

The reason why I don't want simple parsing for species names is that I
also want to be able to check which gene has been sequenced:

while (my $inseq = $seq_in->next_seq) {
    if ($inseq->desc =~ m/5\.8S ribosomal RNA/) {
        $seq_out->write_seq($inseq);
    }
}

and only if it is 5.8S rRNA do I want to extract the species name and
the sequence. I thought that with direct parsing it would be much
longer code. Am I wrong?

I am a newbie in both bioperl and bioinformatics, so all comments would
be appreciated :)

Anna

* Hilmar Lapp [Mon, 24 Aug 2009 10:47:34 -0400]:
> Hi Anna,
>
> sequence formats all have some varying amount of information that must
> be present or otherwise the syntax is invalid. If what you need is a
> two-column table of display_id and species name, then I would simply
> write that, and not squeeze it into a standard sequence format.
> (Unless you actually do want the sequence too, in which case you need
> to add it as a wanted slot; even in that case though, writing a
> three-column table might serve you better.)
>
> -hilmar
>
> On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote:
>
> >
> > Dear all,
> >
> > I am trying to extract species taxonomy from ORGANISM line. In fact
> > I only need a first line under ORGANISM tag (e.i. genus + species).
> > I though that it would be possible to do with the SeqBuilder object
> > by stating
> >
> > $builder->add_wanted_slot('display_id','species');
> >
> > the problem is, however, that I've got an empty file as a result.
> > What might be wrong with the script (see below)?
> > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Tue Aug 25 07:34:18 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 Aug 2009 07:34:18 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> Message-ID: <4A8C2A89-C212-4969-8B01-3DA7D7DE7862@gmx.net> On Aug 25, 2009, at 12:28 AM, Chris Fields wrote: > Okay, if possible I would like you or Sendu to review that last > commit I made to bioperl-db. Will do. > [...] > 02species.t is now failing but I think it's based on the same old > behavior; I'll look into it. I would expect that if the classification array is now different, so the test will need changing to expect the "new" behavior. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Aug 25 07:52:11 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 Aug 2009 07:52:11 -0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> Message-ID: <3B23691B-B165-4CC3-889E-04DE45AB1627@gmx.net> Hi Anna: On Aug 25, 2009, at 3:09 AM, Anna Kostikova wrote: > Actually, my final aim is to get 2 files: first one is a fasta file > with all the sequences, and the seconds one is simply a list of > species names Then I'd change your script to write two files: one with the sequences in FASTA format (you can use Bio::SeqIO for that), and the second one in the format you need it (one species name per line?). (Right now you are writing one file in Genbank format, which is quite unlike the above, right?) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From whs at ebi.ac.uk Tue Aug 25 07:04:23 2009 From: whs at ebi.ac.uk (William Spooner) Date: Tue, 25 Aug 2009 12:04:23 +0100 Subject: [Bioperl-l] Modern BioPerl vs. 
Ensembl In-Reply-To: <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> Message-ID: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> On 25 Aug 2009, at 03:29, Chris Fields wrote: > On Aug 24, 2009, at 7:12 PM, George Hartzell wrote: > >> >> There's a warning at Ensembl about the perl api code depending on an >> old version of bioperl (1.2.3) >> >> http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html >> >> Does anyone have current information about that dependency? >> >> My quick-n-dirty tests suggest that one can't build an app that uses >> both new Bioperl and the ensembl api without ensembl picking up the >> newer bioperl libraries (or your app getting the older ones). It's >> not clear what parts of the ensembl world depend on the older >> BioPerl. > > I've asked this question several times of the ensembl folk w/o an > adequate response. My general feeling is even they may not really > know for sure (though I recall ewan saying something about feature/ > annotation changes around then, and maybe something about the > blastreporter). > > Saying that, the ensembl perl API worked for me using bioperl-live > (and bioperl 1.6) as of a couple months ago. You might eventually > run into some issues; if so report them back here and to the ensembl > list. I'm not sure of the full list of dependencies, but my feeling is that most are related to the Ensembl application/web code; the blast interface in particular. I can support Chris's findings that the API works (AFAIK) with bioperl-live, but this is obviously untested. > >> Anyone have any recipes to make it work? >> >> Any info on a possible modernization of the ensembl code? > > That is completely up to the ensembl folks. bioperl 1.2.3 is full > enough of bugs, and I don't plan on backporting any changes to that > branch (seems kind of silly, as that branch is now about six yrs old). 
It would be nice if someone at Ensembl could compile a list of BioPerl dependencies. At least that would give a feel for the scope of the problem... Will From ak at ebi.ac.uk Tue Aug 25 09:43:19 2009 From: ak at ebi.ac.uk (Andreas =?iso-8859-1?B?S+Ro5HJp?=) Date: Tue, 25 Aug 2009 14:43:19 +0100 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> Message-ID: <20090825134319.GE12422@qux.windows.ebi.ac.uk> [cut] > > It would be nice if someone at Ensembl could compile a list of > BioPerl dependencies. At least that would give a feel for the scope > of the problem... > > Will Hi Will, and list, These are the BioPerl modules that the Ensembl Core API "use" or otherwise directly call (scanned our current HEAD code): Bio::Annotation::DBLink in Bio::EnsEMBL::DBEntry Bio::Tools::CodonTable in Bio::EnsEMBL::Utils::TranscriptAlleles in Bio::EnsEMBL::PredictionTranscript in Bio::EnsEMBL::Transcript.pm Bio::LocatableSeq in Bio::EnsEMBL::DnaDnaAlignFeature Bio::PrimarySeqI in Bio::EnsEMBL::Slice Bio::Root::IO in Bio::EnsEMBL::Utils::Converter Bio::Root::Root in Bio::EnsEMBL::Utils::EasyArgv Bio::Seq in Bio::EnsEMBL::Utils::PolyA in Bio::EnsEMBL::Intron in Bio::EnsEMBL::Exon in Bio::EnsEMBL::Transcript in Bio::EnsEMBL::Translation in Bio::EnsEMBL::Utils::TranscriptAlleles Bio::SeqFeature::FeaturePair in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair Bio::SeqFeature::Generic in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair Bio::SeqFeatureI in Bio::EnsEMBL::SeqFeatureI Bio::SimpleAlign in Bio::EnsEMBL::DnaDnaAlignFeature Bio::Species in Bio::EnsEMBL::DBSQL::MetaContainer I have not looked at the other Ensembl APIs (Variation, FuncGen, Compara, Web, Pipeline, etc.), and I might possibly have missed references to some BioPerl modules. 
I have also not indicated the relative importance of any of these
modules (clearly Bio::Seq is central, but I don't know how widely the
code that accesses Bio::SeqFeature::Generic is used) or investigated if
any of the references to BioPerl modules occur in deprecated code.

As far as I know, there are currently no plans to get rid of these
dependencies. Or there might be, only they are not very far up the
priority list right now. I would be happy to look at conservative
patches, but can not promise snappy response times.


Regards,
Andreas

--
Andreas Kähäri, Ensembl Software Developer      -{ }-
European Bioinformatics Institute (EMBL-EBI)    -{ }-
Wellcome Trust Genome Campus, Hinxton           -{ }-
Cambridge CB10 1SD, United Kingdom              -{ }-

From cjfields at illinois.edu Tue Aug 25 10:07:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 25 Aug 2009 09:07:52 -0500
Subject: [Bioperl-l] Modern BioPerl vs. Ensembl
In-Reply-To: <20090825134319.GE12422@qux.windows.ebi.ac.uk>
References: <19091.11362.190209.844074@already.dhcp.gene.com>
    <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu>
    <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk>
    <20090825134319.GE12422@qux.windows.ebi.ac.uk>
Message-ID: <9D26C8FA-6D74-42C2-A2BD-4EFF529DA05A@illinois.edu>

Andreas,

Thanks for the response, been waiting for something a bit more official
for a while now. We can definitely help you patch these as needed when
problems arise, just let us know, or file a bug report listing issues.

Scanning through, there will be a couple of future trouble spots:

1) We are very likely deprecating Bio::Species in favor of Bio::Taxon
(that may be relatively easy to map, as Bio::Species now delegates to
Bio::Taxon and similar anyway).

2) We will be refactoring Bio::SimpleAlign/LocatableSeq. There are too
many corner cases where assumptions are made. We'll try to stick with
the current API, but there may be a few delegating methods.

More significantly, we're also planning a significant restructuring of
bioperl prior to 1.7, basically splitting it into several (more easily
maintainable) parts. The exact nature of these is still a bit fuzzy (we
have to sort out dependencies) but we do plan on making a bundle package
to assemble a complete old-style 'monolithic' bioperl, just a bit more
customizable. It's very likely the versioning scheme will stay the same
for the core (root) set of modules, but the others may end up having
their own versioning for monitoring dependencies.

chris

On Aug 25, 2009, at 8:43 AM, Andreas Kähäri wrote:

> [cut]
>>
>> It would be nice if someone at Ensembl could compile a list of
>> BioPerl dependencies. At least that would give a feel for the scope
>> of the problem...
>>
>> Will
>
> Hi Will, and list,
>
> These are the BioPerl modules that the Ensembl Core API "use" or
> otherwise directly call (scanned our current HEAD code):
>
> Bio::Annotation::DBLink
>     in Bio::EnsEMBL::DBEntry
>
> Bio::Tools::CodonTable
>     in Bio::EnsEMBL::Utils::TranscriptAlleles
>     in Bio::EnsEMBL::PredictionTranscript
>     in Bio::EnsEMBL::Transcript.pm
>
> Bio::LocatableSeq
>     in Bio::EnsEMBL::DnaDnaAlignFeature
>
> Bio::PrimarySeqI
>     in Bio::EnsEMBL::Slice
>
> Bio::Root::IO
>     in Bio::EnsEMBL::Utils::Converter
>
> Bio::Root::Root
>     in Bio::EnsEMBL::Utils::EasyArgv
>
> Bio::Seq
>     in Bio::EnsEMBL::Utils::PolyA
>     in Bio::EnsEMBL::Intron
>     in Bio::EnsEMBL::Exon
>     in Bio::EnsEMBL::Transcript
>     in Bio::EnsEMBL::Translation
>     in Bio::EnsEMBL::Utils::TranscriptAlleles
>
> Bio::SeqFeature::FeaturePair
>     in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair
>
> Bio::SeqFeature::Generic
>     in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair
>
> Bio::SeqFeatureI
>     in Bio::EnsEMBL::SeqFeatureI
>
> Bio::SimpleAlign
>     in Bio::EnsEMBL::DnaDnaAlignFeature
>
> Bio::Species
>     in Bio::EnsEMBL::DBSQL::MetaContainer
>
>
> I have not looked at the other Ensembl APIs (Variation, FuncGen,
> Compara, Web, Pipeline, etc.), and I might possibly have missed
> references to some BioPerl modules. I have also not indicated
> the relative importance of any of these modules (clearly Bio::Seq
> is central, but I don't know how widely the code that accesses
> Bio::SeqFeature::Generic is used) or investigated if any of the
> references to BioPerl modules occur in deprecated code.
>
> As far as I know, there are currently no plans to get rid of these
> dependencies. Or there might be, only they are not very far up the
> priority list right now. I would be happy to look at conservative
> patches, but can not promise snappy response times.
>
>
> Regards,
> Andreas
>
> --
> Andreas Kähäri, Ensembl Software Developer      -{ }-
> European Bioinformatics Institute (EMBL-EBI)    -{ }-
> Wellcome Trust Genome Campus, Hinxton           -{ }-
> Cambridge CB10 1SD, United Kingdom              -{ }-
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From acpatel at usa.net Mon Aug 24 23:54:01 2009
From: acpatel at usa.net (Anand C. Patel)
Date: Mon, 24 Aug 2009 22:54:01 -0500
Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> Message-ID: <9BA4272D-E7A1-4530-B8D8-B6156823BFDB@usa.net> I preloaded the NCBI taxonomy into the biosql database using the provided script before adding the sequences from genbank format text file (downloaded directly from genbank) using the script provided by bioperl-db, which would be what created the Bio::Species objects (I'd assume) from the text files, prior to inserting them into the database. Hope this helps, Anand On Aug 24, 2009, at 3:27 PM, Chris Fields wrote: > > On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: >>>> Hilmar Lapp wrote: >>>>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>>>> This points to a problem in Bio::Species::scientific_name(), >>>>>>> given that binomial() is correct. Could you file this as a bug >>>>>>> report? >>>>>> >>>>>> What code creates the Bio::Species object here? I suspect this >>>>>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>>>> I see. Any pointer to what would tell me what I need to change >>>>> or is everything in the Bio::Species POD? >>>> >>>> ... I won't guarantee the perfection of the POD ;) >>>> >>>> >>>>> BTW what the Bioperl-db code does is instantiate the blank >>>>> object and then populate it through its accessors (mostly the >>>>> classification() array). 
If what it has been doing in the past >>>>> is now considered incorrect, at least it doesn't raise any >>>>> warning that would alert one to that ... >>>> >>>> Yuh... If you point out the code that creates the Bio::Species I >>>> can look into it for you and suggest what needs changing and why >>>> it doesn't work (or if it's a bug in Bio::Species). I can't >>>> remember things clearly right now, though classification() I >>>> guess was supposed to be backwards compatible. >>> Sendu, I think it's related to this: >>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 >>> Bio::DB::BioSQL::SpeciesAdaptor and >>> Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules >>> in question i think. >> >> Ah, yes, well there you go then. So it is a classification() issue. >> Judging by what I said in that bug, looks like the db code needs to >> be changed to put the full scientific name in the first element it >> passes to classification. > > > Yup. I believe the only blocking issue with implementing it was > potential backwards-compat problems with databases loaded using old > behavior and then being updated post-1.5.2 (new behavior). I would > think this only affects sequence data loaded w/o taxonomy preloaded, > but I'm not sure. > > I suggest, if you can fix it, go ahead make the necessary change. > We can then post a big warning to BioSQL and here about the problem, > something along the lines of 'bioperl-db in svn may be backwards > incompatible with species information loaded in previous versions; > it may eat your first born' or similar. It's an absolutely > necessary fix, and may effectively kill a bunch of other db/species- > related bugs. > > chris > From dan.bolser at gmail.com Tue Aug 25 11:16:14 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 25 Aug 2009 16:16:14 +0100 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? 
Message-ID: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Hi, Can some one set $wgEnableMWSuggest on the BioPerl wiki please? http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest I generally find this a great feature to have on any MW install. Can we also create a page (usually "BioPerl:Configuration" (or '$wgSiteName:Configuration')) to report details of the specific MW configuration settings used on the wiki? This is also a good place for people to request configuration changes to tweak the way the wiki works. Cheers, Dan. From jason at bioperl.org Tue Aug 25 13:17:44 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 25 Aug 2009 10:17:44 -0700 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: Can you send sysadmin request mail to the helpdesk - support at open-bio.org so mauricio or someone can have it in the queue. [aside] I've had to stop doing OBF sysadmin work so we are definitely looking for someone to help with the ALL VOLUNTEER team of now just Mauricio and Chris Dagdigian who do mediawiki and sysadmin support. We've reached a bit of crunch where there are lots of things to tweak and customize for the various flavors of MW installs that the projects want but we don't have enough dedicated admins to really support this. Most of us have gotten into these projects to support our own bioinformatics programming not sysadmin tasks so there is a bit of gap here. Some of us (me) were not trained as sysadmin but jumped in and figured out how to help and do it - and learned valuable life skills... =) We're discussing plans to upgrade the machines in the future which would improve performance and reliability we hope and also use this opportunity to streamline the MW installs to be a more easily maintained wikifarm. 
[/aside]

-jason

On Aug 25, 2009, at 8:16 AM, Dan Bolser wrote:

> Hi,
>
> Can some one set $wgEnableMWSuggest on the BioPerl wiki please?
>
> http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest
>
>
> I generally find this a great feature to have on any MW install. Can
> we also create a page (usually "BioPerl:Configuration" (or
> '$wgSiteName:Configuration')) to report details of the specific MW
> configuration settings used on the wiki? This is also a good place for
> people to request configuration changes to tweak the way the wiki
> works.
>
>
> Cheers,
> Dan.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From awitney at sgul.ac.uk Tue Aug 25 09:45:59 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 25 Aug 2009 14:45:59 +0100
Subject: [Bioperl-l] Modern BioPerl vs. Ensembl
In-Reply-To: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk>
References: <19091.11362.190209.844074@already.dhcp.gene.com>
    <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu>
    <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk>
Message-ID: <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk>

>
> It would be nice if someone at Ensembl could compile a list of
> BioPerl dependencies. At least that would give a feel for the scope
> of the problem...

I just downloaded

* ensembl
* ensembl-compara
* ensembl-variation
* ensembl-functgenomics

from their website and did a regex on the files for

/^use (Bio::.+);/

which reveals (filtering out Bio::EnsEMBL::*):

Bio::AlignIO
Bio::Annotation::DBLink
Bio::Das::ProServer::SourceAdaptor
Bio::Das::ProServer::SourceAdaptor::Transport::generic
Bio::Index::Fastq
Bio::LocatableSeq
Bio::Location::Simple
Bio::MAGE::Experiment::Experiment
Bio::MAGE::XMLUtils
Bio::Perl
Bio::PrimarySeq
Bio::PrimarySeqI
Bio::Root::Root
Bio::Root::RootI
Bio::Search::HSP::EnsemblHSP
Bio::Seq
Bio::SeqFeature::FeaturePair
Bio::SeqFeature::Generic
Bio::SeqFeatureI
Bio::SeqIO
Bio::SimpleAlign
Bio::Species
Bio::Tools::CodonTable
Bio::Tools::Run::Phylo::PAML::Codeml
Bio::TreeIO

does that help? (I have the list broken down by which module/script
contains which if that helps also)

cheers

adam

From hartzell at alerce.com Tue Aug 25 16:22:20 2009
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 25 Aug 2009 13:22:20 -0700
Subject: [Bioperl-l] code review on LocatableSeq performance fix.
Message-ID: <19092.18428.494334.482303@already.dhcp.gene.com>


[For better or worse] I use pairs of locatable seq's to represent
alignments between cDNAs (spliced mRNA) and genomic sequence. I end up
using column_from_residue_number a lot to map features back and forth
between the coordinate system.

My sequences tend to be fairly long, and the current implementation of
column_from_residue_number (which splits the sequences into arrays of
individual characters) performs very badly on them.

I've included below a small variation on a patch that I've been using
for a while (when I pulled it up to the current bioperl-live I changed
a couple of regexps to use $GAP_SYMBOLS and $RESIDUE_SYMBOLS). It
passes the t/Seq/LocatableSeq.t tests and Works For Me (tm).

Instead of creating whopping big arrays and then looping over them it
breaks the sequence down into runs of residues/gaps and strides across
them.
It also unwinds the strandedness test and avoids the cute trick of
using an anonymous sub (which saves a couple of lines in the source
file but adds *significant* overhead every time around the loop).

All hail Devel::NYTProf.

Chris et al.'s comments about the mysteries and vagaries of
Bio::LocatableSeq make me leery of just committing it.

Anyone want to comment on it?

g.

Index: Bio/LocatableSeq.pm
===================================================================
--- Bio/LocatableSeq.pm (revision 16001)
+++ Bio/LocatableSeq.pm (working copy)
@@ -423,27 +423,47 @@
         unless $resnumber =~ /^\d+$/ and $resnumber > 0;
 
     if ($resnumber >= $self->start() and $resnumber <= $self->end()) {
-        my @residues = split //, $self->seq;
-        my $count = $self->start();
-        my $i;
-        my ($start,$end,$inc,$test);
-        my $strand = $self->strand || 0;
-        # the following bit of "magic" allows the main loop logic to be the
-        # same regardless of the strand of the sequence
-        ($start,$end,$inc,$test)= ($strand == -1)?
-            (scalar(@residues-1),0,-1,sub{$i >= $end}) :
-                (0,scalar(@residues-1),1,sub{$i <= $end});
+        my @chunks;
+        my $column_incr;
+        my $current_column;
+        my $current_residue = $self->start - 1;
+        my $seq = $self->seq;
+        my $strand = $self->strand || 0;
 
-        for ($i=$start; $test->(); $i+= $inc) {
-            if ($residues[$i] ne '.' and $residues[$i] ne '-') {
-                $count == $resnumber and last;
-                $count++;
-            }
-        }
-        # $i now holds the index of the column.
-        # The actual column number is this index + 1
+        if ($strand == -1) {
+#           @chunks = reverse $seq =~ m/[^\.\-]+|[\.\-]+/go;
+            @chunks = reverse $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go;
+            $column_incr = -1;
+            $current_column = (CORE::length $seq) + 1;
+        }
+        else {
+#           @chunks = $seq =~ m/[^\.\-]+|[\.\-]+/go;
+            @chunks = $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go;
+            $column_incr = 1;
+            $current_column = 0;
+        }
 
-        return $i+1;
+        while (my $chunk = shift @chunks) {
+#           if ($chunk =~ m|^[\.\-]|o) {
+            if ($chunk =~ m|^[$GAP_SYMBOLS]|o) {
+                $current_column += $column_incr * CORE::length($chunk);
+            }
+            else {
+                if ($current_residue + CORE::length($chunk) < $resnumber) {
+                    $current_column += $column_incr * CORE::length($chunk);
+                    $current_residue += CORE::length($chunk);
+                }
+                else {
+                    if ($strand == -1) {
+                        $current_column -= $resnumber - $current_residue;
+                    }
+                    else {
+                        $current_column += $resnumber - $current_residue;
+                    }
+                    return $current_column;
+                }
+            }
+        }
     }
 
     $self->throw("Could not find residue number $resnumber");

From hartzell at alerce.com Tue Aug 25 17:07:43 2009
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 25 Aug 2009 14:07:43 -0700
Subject: [Bioperl-l] Modern BioPerl vs. Ensembl
In-Reply-To: <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk>
References: <19091.11362.190209.844074@already.dhcp.gene.com>
    <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu>
    <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk>
    <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk>
Message-ID: <19092.21151.457226.192791@already.dhcp.gene.com>

Adam Witney writes:
 > >
 > > It would be nice if someone at Ensembl could compile a list of
 > > BioPerl dependencies. At least that would give a feel for the scope
 > > of the problem...
> > I just downloaded > > * ensembl > * ensembl-compara > * ensembl-variation > * ensembl-functgenomics > > from their website and did a regex on the files for > > /^use (Bio::.+);/ > > which reveals (filtering out Bio::EnsEMBL::*): > > Bio::AlignIO > Bio::Annotation::DBLink > Bio::Das::ProServer::SourceAdaptor > Bio::Das::ProServer::SourceAdaptor::Transport::generic > Bio::Index::Fastq > Bio::LocatableSeq > Bio::Location::Simple > Bio::MAGE::Experiment::Experiment > Bio::MAGE::XMLUtils > Bio::Perl > Bio::PrimarySeq > Bio::PrimarySeqI > Bio::Root::Root > Bio::Root::RootI > Bio::Search::HSP::EnsemblHSP > Bio::Seq > Bio::SeqFeature::FeaturePair > Bio::SeqFeature::Generic > Bio::SeqFeatureI > Bio::SeqIO > Bio::SimpleAlign > Bio::Species > Bio::Tools::CodonTable > Bio::Tools::Run::Phylo::PAML::Codeml > Bio::TreeIO > > does that help? (I have the list broken down by which module/script > contains which if that helps also) What would be most useful to me would be to understand where they *need* to use release 1.2.3. Is there something magical about their use of e.g. Bio::Seq? It's worth noting that your technique won't pick up various modules that are loaded on demand by e.g. Bio::SearchIO. g. From maj at fortinbras.us Wed Aug 26 07:39:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 26 Aug 2009 07:39:40 -0400 Subject: [Bioperl-l] code review on LocatableSeq performance fix. In-Reply-To: <19092.18428.494334.482303@already.dhcp.gene.com> References: <19092.18428.494334.482303@already.dhcp.gene.com> Message-ID: <55514878273F4E3F8D9E438FD2F3AB7D@NewLife> I think it's great. column_from_residue_number doesn't have any secret side effects, and the patch preserves nice integer in, nice integer out, and input and output both are 1-origin indices as far as I can tell.
I say go for it- MAJ ----- Original Message ----- From: "George Hartzell" To: "bioperl-l List" Sent: Tuesday, August 25, 2009 4:22 PM Subject: [Bioperl-l] code review on LocatableSeq performance fix. > > [For better or worse] I use pairs of locatable seq's to represent > alignments between cDNAs (spliced mRNA) and genomic sequence. > > I end up using column_from_residue_number a lot to map features back > and forth between the coordinate system. > > My sequences tend to be fairly long, and the current implementation of > column_from_residue_number (which splits the sequences into arrays of > individual characters) performs very badly on them. > > I've included below a small variation on a patch that I've been using > for a while (when I pulled it up to the current bioperl-live I changed > a couple of regexps to use $GAP_SYMBOLS and $RESIDUE_SYMBOLS). It > passes the t/Seq/LocatableSeq.t tests and Works For Me (tm). > > Instead of creating whopping big arrays and then looping over them it > breaks the sequence down into runs of residues/gaps and strides across > them. It also unwinds the strandedness test and avoids the cute trick > of using an anonymous sub (which saves a couple of lines in the source > file but adds *signficant* overhead every time around the loop). > > All hail Devel::NYTProf. > > Chris et al.'s comments about the mysteries and vagaries of > Bio::LocatableSeq makes me leary of just committing it. > > Anyone want to comment on it? > > g. 
> > Index: Bio/LocatableSeq.pm > =================================================================== > --- Bio/LocatableSeq.pm (revision 16001) > +++ Bio/LocatableSeq.pm (working copy) > @@ -423,27 +423,47 @@ > unless $resnumber =~ /^\d+$/ and $resnumber > 0; > > if ($resnumber >= $self->start() and $resnumber <= $self->end()) { > - my @residues = split //, $self->seq; > - my $count = $self->start(); > - my $i; > - my ($start,$end,$inc,$test); > - my $strand = $self->strand || 0; > - # the following bit of "magic" allows the main loop logic to be the > - # same regardless of the strand of the sequence > - ($start,$end,$inc,$test)= ($strand == -1)? > - (scalar(@residues-1),0,-1,sub{$i >= $end}) : > - (0,scalar(@residues-1),1,sub{$i <= $end}); > + my @chunks; > + my $column_incr; > + my $current_column; > + my $current_residue = $self->start - 1; > + my $seq = $self->seq; > + my $strand = $self->strand || 0; > > - for ($i=$start; $test->(); $i+= $inc) { > - if ($residues[$i] ne '.' and $residues[$i] ne '-') { > - $count == $resnumber and last; > - $count++; > - } > - } > - # $i now holds the index of the column. 
> - # The actual column number is this index + 1 > + if ($strand == -1) { > +# @chunks = reverse $seq =~ m/[^\.\-]+|[\.\-]+/go; > + @chunks = reverse $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; > + $column_incr = -1; > + $current_column = (CORE::length $seq) + 1; > + } > + else { > +# @chunks = $seq =~ m/[^\.\-]+|[\.\-]+/go; > + @chunks = $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; > + $column_incr = 1; > + $current_column = 0; > + } > > - return $i+1; > + while (my $chunk = shift @chunks) { > +# if ($chunk =~ m|^[\.\-]|o) { > + if ($chunk =~ m|^[$GAP_SYMBOLS]|o) { > + $current_column += $column_incr * CORE::length($chunk); > + } > + else { > + if ($current_residue + CORE::length($chunk) < $resnumber) { > + $current_column += $column_incr * CORE::length($chunk); > + $current_residue += CORE::length($chunk); > + } > + else { > + if ($strand == -1) { > + $current_column -= $resnumber - $current_residue; > + } > + else { > + $current_column += $resnumber - $current_residue; > + } > + return $current_column; > + } > + } > + } > } > > $self->throw("Could not find residue number $resnumber"); > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From tuco at pasteur.fr Wed Aug 26 10:59:24 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 26 Aug 2009 16:59:24 +0200 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis Message-ID: <4A954DCC.4050200@pasteur.fr> Hi, I am playing with Bio::Restriction::* objects and find it very useful. Especially I am filtering output for blunt and cohesive enzymes. However, there's an exception thrown when I use 'cutters' method from B::R::Analysis : ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Bad end parameter (34). 
End must be less than the total length of sequence (total=7) STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 STACK: Bio::PrimarySeq::subseq /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388 STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891 STACK: Bio::Restriction::Analysis::_cuts /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788 STACK: Bio::Restriction::Analysis::cut /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366 STACK: Bio::Restriction::Analysis::cutters /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681 STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion lib/Bio/Restriction/Analysis/blunt.pm:86 STACK: Bio::Restriction::Analysis::blunt::cut_in_frames lib/Bio/Restriction/Analysis/blunt.pm:65 STACK: ./check_phase.pl:213 ----------------------------------------------------------- The problem with this enzyme is that the cut site is over the enzyme recognition site (from Rebase withrefm.907): <1>BceSI <2> <3>SSAAGCG(27/27) <4> <5>Bacillus cereus <6>ATCC 10987 <7> <8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol. Lett., vol. 202, pp. 189-193. Xu, S.-Y., Unpublished observations. For this enzyme, here are the values stored into B::R::Enzyme object ($e): $e->site => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN $e->cut => 34 $e->string => SSAAGCG $e->seq->seq => SSAAGCG So my question is, wouldn't be faire to set B::PrimarySeq::seq with value of $e->site when such enzyme are seen in the source file. NOTE from B::R::Analysis::_enzymes_sites (commented): # The following should not be an exception, both Type I and Type III # enzymes cut outside of their recognition sequences #if ($site < 0 || $site > length($enz->string)) { # $self->throw("This is (probably) not your fault.\nGot a cut site of $site and a # sequence of ".$enz->string); # } And this is exactly the problem I'm facing! 
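To make the numbers in the report above concrete, here is a minimal standalone sketch of the arithmetic. It does not use Bio::Restriction at all; the helper split_at_cut is hypothetical, and reading the REBASE entry SSAAGCG(27/27) as "cut 27 bases past the end of the 7-base recognition sequence" is an assumption that happens to agree with the reported $e->cut of 34.

```perl
use strict;
use warnings;

my $recognition = 'SSAAGCG';    # the value reported for $e->string
my $downstream  = 27;           # ASSUMED meaning of (27/27) in the REBASE entry

# Cut position counted from the first base of the recognition sequence.
my $cut = length($recognition) + $downstream;

# Padding with N makes the cut position exist inside the string,
# which mirrors the value reported for $e->site.
my $padded = $recognition . ('N' x $downstream);

# Hypothetical helper: split a site string after position $pos.
# Returns an empty list when the position lies outside the string.
sub split_at_cut {
    my ($site, $pos) = @_;
    return if $pos < 0 || $pos > length $site;
    return (substr($site, 0, $pos), substr($site, $pos));
}

my @bare_parts   = split_at_cut($recognition, $cut);    # empty: 34 > 7
my @padded_parts = split_at_cut($padded, $cut);         # two parts, second is ''

printf "cut=%d bare_ok=%d padded_ok=%d\n",
    $cut, (@bare_parts ? 1 : 0), (@padded_parts ? 1 : 0);
# prints: cut=34 bare_ok=0 padded_ok=1
```

The bare 7-base string cannot be split at position 34, while the N-padded string can, which is the same mismatch the exception is complaining about.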
In _enzymes_sites the code is trying to subseq our sequence to get before and after seq as : $beforeseq=$enz->seq->subseq(1, $site); $afterseq=$enz->seq->subseq($site+1, $enz->seq->length); and this throws an error as the cutting site is far over (pos 34) the enzyme know recognition site SSAAGCG (length=7). Has anybody a clue on how to fix/patch it? Thanks for any reply Regards Emmanuel -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From cjfields at illinois.edu Wed Aug 26 11:20:59 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 Aug 2009 10:20:59 -0500 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis In-Reply-To: <4A954DCC.4050200@pasteur.fr> References: <4A954DCC.4050200@pasteur.fr> Message-ID: <07222470-41ED-4E17-9383-65A7D02CE9E1@illinois.edu> What version of Bioperl are you using? Mark Jensen did some refactoring of this code after the 1.6.0 release that should appear in 1.6.1; I'll be working on the first alpha for that release starting Friday. chris On Aug 26, 2009, at 9:59 AM, Emmanuel Quevillon wrote: > Hi, > > I am playing with Bio::Restriction::* objects and find it very useful. > Especially I am filtering output for blunt and cohesive enzymes. > However, there's an exception thrown when I use 'cutters' method > from B::R::Analysis : > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bad end parameter (34). 
End must be less than the total length > of sequence (total=7) > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::PrimarySeq::subseq > /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388 > STACK: Bio::Restriction::Analysis::_enzyme_sites > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891 > STACK: Bio::Restriction::Analysis::_cuts > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788 > STACK: Bio::Restriction::Analysis::cut > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366 > STACK: Bio::Restriction::Analysis::cutters > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681 > STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion > lib/Bio/Restriction/Analysis/blunt.pm:86 > STACK: Bio::Restriction::Analysis::blunt::cut_in_frames > lib/Bio/Restriction/Analysis/blunt.pm:65 > STACK: ./check_phase.pl:213 > ----------------------------------------------------------- > > The problem with this enzyme is that the cut site is over the enzyme > recognition site (from Rebase withrefm.907): > > <1>BceSI > <2> > <3>SSAAGCG(27/27) > <4> > <5>Bacillus cereus > <6>ATCC 10987 > <7> > <8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol. > Lett., vol. 202, pp. 189-193. > Xu, S.-Y., Unpublished observations. > > > For this enzyme, here are the values stored into B::R::Enzyme object > ($e): > > $e->site => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN > $e->cut => 34 > $e->string => SSAAGCG > $e->seq->seq => SSAAGCG > > > So my question is, wouldn't be faire to set B::PrimarySeq::seq with > value of $e->site when such enzyme are seen in the source file. 
> > NOTE from B::R::Analysis::_enzymes_sites (commented): > > # The following should not be an exception, both Type I and Type > III > # enzymes cut outside of their recognition sequences > #if ($site < 0 || $site > length($enz->string)) { > # $self->throw("This is (probably) not your fault.\nGot a cut > site of $site and a # sequence of ".$enz->string); > # } > > And this is exactly the problem I'm facing! > In _enzymes_sites the code is trying to subseq our sequence to get > before and after seq as : > > $beforeseq=$enz->seq->subseq(1, $site); > $afterseq=$enz->seq->subseq($site+1, $enz->seq->length); > > and this throws an error as the cutting site is far over (pos 34) > the enzyme know recognition site SSAAGCG (length=7). > > Has anybody a clue on how to fix/patch it? > > Thanks for any reply > > Regards > > Emmanuel > > -- > ------------------------- > Emmanuel Quevillon > Biological Software and Databases Group > Institut Pasteur > +33 1 44 38 95 98 > tuco at_ pasteur dot fr > ------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robert.bradbury at gmail.com Wed Aug 26 11:38:44 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Wed, 26 Aug 2009 11:38:44 -0400 Subject: [Bioperl-l] Generalized reciprocal blast Message-ID: I would like to know whether or not anyone has attempted to create a "generalized" reciprocal blast component for BioPerl? One sees papers all the time where they discuss running reciprocal blasts to compare a new species to an old "standard" species or a set of species or running an all-to-all set of comparisons to match up all of the "known" proteins from species and determine which are outliers (and therefore "novel"). 
There are also accumulating merged sets in NCBI HomoloGene (which seems to cover some strict subset of perhaps a dozen "well sequenced" genomes) and Ensembl (which seems to be working with a much larger set of 40-50 genomes, some of which may be somewhat incomplete and are certainly poorly "explored"). I have, I believe, seen code "fragments" from various authors, perhaps some on the BioPerl list, which perform some major subset of a typical "reciprocal blast". Now what I am looking for is a relatively generalizable some-to-some reciprocal blast utility. I want to be able to specify the genes (or gene family), e.g. some of the ~150 known DNA repair genes. It would be helpful to also specify how "tolerant" the blast "true reciprocal" criteria are. There are some genes where there is a very strict 1-to-1 relationship across many genomes. But for genes which involve relatively standard domains, e.g. "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for example it's more like 5-to-5 -- and it would be really nice to be able to specify the strictness or quality level [1] for "matching" genes (and even which genes are to be excluded because they are known to be false homologues). Then to top this off I want to be able to combine known public databases (e.g. HomoloGene / UniGene / Ensembl) with perhaps local private databases or database subsets (e.g. emerging or specialized genomes). The goal here of course is to determine the precise phylogenetic relationships between all of the DNA repair genes and how there may be gain / loss / evolution of function that can be related to species characteristics (size, longevity, etc.). Is there a generalized reciprocal blast component in BioPerl? Or is it a "build-it-yourself" situation (that I have to believe has been built probably a few dozen times by various researchers / organizations / companies)? Thanks, Robert Bradbury 1.
This would be handled in BioPerl with a customizable user function which could be tailored to handle specific cases -- for example a function which when handed a set of 100 potential "matches" could go through those 100 matches, identify common domains, and then "re-rate" matches based on considerations such as the type and number of common domains, domains being in the same order, etc. I.e. criteria which may be difficult to completely generalize across entire genomes but are fairly obvious if you are looking at a graphical replication of a gene set in HomoloGene. From jason at bioperl.org Wed Aug 26 11:55:04 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 26 Aug 2009 08:55:04 -0700 Subject: [Bioperl-l] Generalized reciprocal blast In-Reply-To: References: Message-ID: Robert - BioPerl has traditionally been a toolkit for building these types of pipelines and not intended to necessarily be a place for larger systems. That said, BRH is a pretty easy algorithm that could be applied with the tools in place; the main issue is what kind of lookup table you want to use for establishing the BRH. Hashes are okay, but I think BDB or SQLite end up being more scalable and allow for persistence. Really, I would use something like OrthoMCL rather than reciprocal BLAST to identify families anyways. It uses BioPerl under the hood for parsing - though it suffers from some pretty inefficient management of the lookup table for the BRH part of the algorithm - it can be run on your own customized datasets to integrate public and private data. You might also find better luck in building good alignments for the key members of your target gene family of interest and then using a profile HMM (or even just the new HMMER3 jackhmmer or phmmer which don't require a MSA) to identify the full set of homologs in all the databases.
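As a rough illustration of the hash-based best reciprocal hit (BRH) lookup mentioned earlier in this reply, here is a minimal sketch in plain Perl. Everything in it is an assumption made for illustration: the gene identifiers are invented, the helper reciprocal_best_hits is hypothetical, and the two hashes stand in for whatever "best hit per query" table one would distill from a pair of BLAST reports.

```perl
use strict;
use warnings;

# Hypothetical input: the single best hit of each query in the other proteome.
# All identifiers are made up for this sketch.
my %best_a_to_b = (geneA1 => 'geneB7', geneA2 => 'geneB3', geneA3 => 'geneB3');
my %best_b_to_a = (geneB7 => 'geneA1', geneB3 => 'geneA3');

# A pair is a best reciprocal hit when each member is the other's best hit.
sub reciprocal_best_hits {
    my ($a_to_b, $b_to_a) = @_;
    my %brh;
    for my $query (sort keys %$a_to_b) {
        my $hit  = $a_to_b->{$query};
        my $back = $b_to_a->{$hit};
        $brh{$query} = $hit if defined $back && $back eq $query;
    }
    return \%brh;
}

my $brh = reciprocal_best_hits(\%best_a_to_b, \%best_b_to_a);
print "$_ <=> $brh->{$_}\n" for sort keys %$brh;
# prints:
# geneA1 <=> geneB7
# geneA3 <=> geneB3
```

For genome-scale tables the in-memory hashes could presumably be swapped for an on-disk store such as BDB or SQLite, along the lines suggested above; a "tolerance" setting of the kind Robert asks for would go into how the best-hit tables are built, not into this lookup.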
If this is the only set of families you care about it is a lot less computational work to go through and pull these out with an HMM or HMMER search and build trees from these results rather than dealing with the computational time of the all-vs-all DB searches that you are proposing. -jason On Aug 26, 2009, at 8:38 AM, Robert Bradbury wrote: > I would like to know whether or not anyone has attempted to create a > "generalized" reciprocal blast component for BioPerl? > > One sees papers all the time where they discuss running reciprocal > blasts to > compare a new species to an old "standard" species or a set of > species or > running an all-to-all set of comparisons to match up all of the > "known" > proteins from species and determine which are outliers (and therefore > "novel"). There are also accumulating merged sets in NCBI > HomoloGene (which > seems to be a some strict subset (perhaps a dozen) "well sequenced" > genomes) > and Ensembl (which seems to be working with a much larger set of 40-50 > genomes some of which may be somewhat incomplete and are certainly > poorly > "explored". > > I have, I believe, seen code "fragments" from various authors, > perhaps some > on the BioPerl list, which perform some major subset of a typical > "reciprocal blast". > > Now what I am looking for is a relatively generalizable some-to-some > reciprocal blast utility. I want to be able to specify the genes > (or gene > family), e.g. some of the ~150 known DNA repair genes. It would be > helpful > to also specify how "tolerant" the blast "true reciprocal" criteria > are. > There are some genes where there is a very strict 1-to-1 > relationship across > many genomes. But for genes which involve relatively standard > domains, e.g. 
> "helicase" domains, the 1-to-1 relationship becomes cloudy -- in > mammals for > example its more like 5-to-5 and it would be really nice to be able to > specify the strictness or quality level [1] for "matching" genes > (and even > which genes are to be excluded because they are known to be false > homologues). > > Then to top this off I want to be able to combine known public e.g. > (HomoloGene / Uniigene / Ensembl) databases with perhaps local private > databases or database subsets (e.g. emerging or specialized genomes). > > The goal here of course to determine the precise phylogenetic > relationships > between all of the DNA repair genes and how there may be gain / loss / > evolution of function that can be related to species characteristics > (size, > longevity, etc.). > > Is there a generalized reciprocal blast component in BioPerl? Or is > it a > "build-it-yourself" situation (that I have to believe has been built > probably a few dozen times by various researchers / organizations / > companies)? > > Thanks, > Robert Bradbury > > 1. This would be handled in BioPerl with a customizable user > function which > could be tailored to handle specific cases -- for example a function > which > when handed a set of 100 potential "matches" could go through those > 100 > matches, identify common domains, and then "re-rate" matches based on > considerations such as the type and number of common domains, > domains being > in the same order, etc. I.e. criteria which may be difficult to > completely > generalize across entire genomes but are fairly obvious if you are > looking > at a graphical replication of a gene set in HomoloGene. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From maj at fortinbras.us Wed Aug 26 11:20:41 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 26 Aug 2009 11:20:41 -0400 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis In-Reply-To: <4A954DCC.4050200@pasteur.fr> References: <4A954DCC.4050200@pasteur.fr> Message-ID: Hi Emmanuel-- This may be fixed in the latest version of Bio::Restriction, which is not available in the standard 1.6 distribution. I suggest you try replacing the Bio/Restriction directory in your distribution with the current bioperl-live modules. You can get these by using Subversion: $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk/Bio/Restriction ./Restriction If you're brave, better might be to obtain the latest trunk and reinstall; $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live $ cd bioperl-live $ perl Build.PL $ ./Build $ ./Build test $ ./Build install Please update the list with your progress- cheers Mark ----- Original Message ----- From: "Emmanuel Quevillon" To: Sent: Wednesday, August 26, 2009 10:59 AM Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis > Hi, > > I am playing with Bio::Restriction::* objects and find it very useful. > Especially I am filtering output for blunt and cohesive enzymes. > However, there's an exception thrown when I use 'cutters' method > from B::R::Analysis : > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bad end parameter (34). 
End must be less than the total length > of sequence (total=7) > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::PrimarySeq::subseq > /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388 > STACK: Bio::Restriction::Analysis::_enzyme_sites > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891 > STACK: Bio::Restriction::Analysis::_cuts > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788 > STACK: Bio::Restriction::Analysis::cut > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366 > STACK: Bio::Restriction::Analysis::cutters > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681 > STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion > lib/Bio/Restriction/Analysis/blunt.pm:86 > STACK: Bio::Restriction::Analysis::blunt::cut_in_frames > lib/Bio/Restriction/Analysis/blunt.pm:65 > STACK: ./check_phase.pl:213 > ----------------------------------------------------------- > > The problem with this enzyme is that the cut site is over the enzyme > recognition site (from Rebase withrefm.907): > > <1>BceSI > <2> > <3>SSAAGCG(27/27) > <4> > <5>Bacillus cereus > <6>ATCC 10987 > <7> > <8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol. > Lett., vol. 202, pp. 189-193. > Xu, S.-Y., Unpublished observations. > > > For this enzyme, here are the values stored into B::R::Enzyme object > ($e): > > $e->site => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN > $e->cut => 34 > $e->string => SSAAGCG > $e->seq->seq => SSAAGCG > > > So my question is, wouldn't be faire to set B::PrimarySeq::seq with > value of $e->site when such enzyme are seen in the source file. 
> > NOTE from B::R::Analysis::_enzymes_sites (commented): > > # The following should not be an exception, both Type I and Type III > # enzymes cut outside of their recognition sequences > #if ($site < 0 || $site > length($enz->string)) { > # $self->throw("This is (probably) not your fault.\nGot a cut > site of $site and a # sequence of ".$enz->string); > # } > > And this is exactly the problem I'm facing! > In _enzymes_sites the code is trying to subseq our sequence to get > before and after seq as : > > $beforeseq=$enz->seq->subseq(1, $site); > $afterseq=$enz->seq->subseq($site+1, $enz->seq->length); > > and this throws an error as the cutting site is far over (pos 34) > the enzyme know recognition site SSAAGCG (length=7). > > Has anybody a clue on how to fix/patch it? > > Thanks for any reply > > Regards > > Emmanuel > > -- > ------------------------- > Emmanuel Quevillon > Biological Software and Databases Group > Institut Pasteur > +33 1 44 38 95 98 > tuco at_ pasteur dot fr > ------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed Aug 26 12:03:59 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 26 Aug 2009 12:03:59 -0400 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: re:aside -- I can help with this; I promise not to break anything. cheers MAJ ----- Original Message ----- From: "Jason Stajich" To: "Dan Bolser" Cc: "BioPerl List" Sent: Tuesday, August 25, 2009 1:17 PM Subject: Re: [Bioperl-l] $wgEnableMWSuggest on the wiki please? > Can you send sysadmin request mail to the helpdesk - support at open-bio.org > so mauricio or someone can have it in the queue. 
> > [aside] > I've had to stop doing OBF sysadmin work so we are definitely looking > for someone to help with the ALL VOLUNTEER team of now just Mauricio > and Chris Dagdigian who do mediawiki and sysadmin support. > > We've reached a bit of crunch where there are lots of things to tweak > and customize for the various flavors of MW installs that the projects > want but we don't have enough dedicated admins to really support > this. Most of us have gotten into these projects to support our own > bioinformatics programming not sysadmin tasks so there is a bit of gap > here. Some of us (me) were not trained as sysadmin but jumped in and > figured out how to help and do it - and learned valuable life > skills... =) > > We're discussing plans to upgrade the machines in the future which > would improve performance and reliability we hope and also use this > opportunity to streamline the MW installs to be a more easily > maintained wikifarm. > > [/aside] > > -jason > On Aug 25, 2009, at 8:16 AM, Dan Bolser wrote: > >> Hi, >> >> Can some one set $wgEnableMWSuggest on the BioPerl wiki please? >> >> http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest >> >> >> I generally find this a great feature to have on any MW install. Can >> we also create a page (usually "BioPerl:Configuration" (or >> '$wgSiteName:Configuration')) to report details of the specific MW >> configuration settings used on the wiki? This is also a good place for >> people to request configuration changes to tweak the way the wiki >> works. >> >> >> Cheers, >> Dan. 
> > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org From David.Messina at sbc.su.se Wed Aug 26 12:25:21 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 Aug 2009 18:25:21 +0200 Subject: [Bioperl-l] Generalized reciprocal blast In-Reply-To: References: Message-ID: <628aabb70908260925q25039506nab6e1c661f704e2a@mail.gmail.com> Hi Robert, Just to add another comment on this: The problem of identifying orthologs is quite a bit trickier than it looks, in part due to the many-to-many relationships you noted. There is a whole body of literature on this topic -- here's a recent review that includes OrthoMCL that Jason mentioned and others: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000262 (disclaimer: I work in a lab that offers one of the many attempts to solve this problem) So I would say that although it is possible to make a customizable function as you describe, there are several existing approaches (read: downloadable code you can run on your data) that would probably give better results. Dave From hsa_rim at yahoo.co.in Wed Aug 26 15:56:38 2009 From: hsa_rim at yahoo.co.in (shafeeq rim) Date: Thu, 27 Aug 2009 01:26:38 +0530 (IST) Subject: [Bioperl-l] Latest Cytoband files Message-ID: <484629.15190.qm@web94612.mail.in2.yahoo.com> Hi, Can anybody tell me how I can get the latest cytoband files with stain information for Homo sapiens, Mus musculus and others? I am using version 36.3 of RefSeq for human and version 36.1 of RefSeq for Mus musculus. Thanks
From cjfields at illinois.edu Wed Aug 26 16:36:31 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 Aug 2009 15:36:31 -0500 Subject: [Bioperl-l] Next-Gen and the next point release - updates Message-ID: All, I just pushed one very key bit for nextgen sequence analysis to svn, mainly parsing of all three FASTQ variants. These can be called by using: # grabs the FASTQ parser, specifies the Illumina variant my $in = Bio::SeqIO->new(-format => 'fastq-illumina', -file => 'mydata.fq'); # same, explicitly specifies the Illumina variant my $in = Bio::SeqIO->new(-format => 'fastq', -variant => 'illumina', -file => 'mydata.fq'); # simple 'fastq' format defaults to 'sanger' variant my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>mydata.fq'); FASTQ works for both input and output. As mentioned before, the next_dataset() method also exists for getting simple hashrefs, see the module documentation for more. This was one of the few remaining blockers for the 1.6.1 point release. I'll run a clean checkout of main trunk to test, then work on merging everything over from trunk starting Friday and push out 1.6.0_1 (first alpha) beginning of next week to get some CPAN Tester information. If everything looks fine the final point release will follow soon after. Cheers! chris From rmb32 at cornell.edu Wed Aug 26 16:56:20 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 26 Aug 2009 13:56:20 -0700 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: References: Message-ID: <4A95A174.3070706@cornell.edu> Hurray! You rock Chris! R From lsbrath at gmail.com Wed Aug 26 17:08:06 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Wed, 26 Aug 2009 17:08:06 -0400 Subject: [Bioperl-l] rendering graphics from genbank files. Message-ID: <69367b8f0908261408g6750c1d2we3409a016fe186b7@mail.gmail.com> Hi, I am running into problems rendering the 5'UTR and 3'UTR features in the graphic.
I get an error message saying that these are string literals. Better yet, how do I add the 5'UTR and 3'UTR regions to the CDS feature when the only features in my GenBank file are mRNA, CDS, and gene? What I want is to display the gene structure. I am using the last template provided in the BioPerl Graphics HOWTO.

Mgavi

From biopython at maubp.freeserve.co.uk Wed Aug 26 17:16:08 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 26 Aug 2009 22:16:08 +0100
Subject: [Bioperl-l] Next-Gen and the next point release - updates
In-Reply-To:
References:
Message-ID: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com>

On Wed, Aug 26, 2009 at 9:36 PM, Chris Fields wrote:
> All,
>
> I just pushed one very key bit for nextgen sequence analysis to svn, mainly
> parsing of all three FASTQ variants.  These can be called by using:
>
>   # grabs the FASTQ parser, specifies the Illumina variant
>   my $in = Bio::SeqIO->new(-format    => 'fastq-illumina',
>                            -file      => 'mydata.fq');
>
>   # same, explicitly specifies the Illumina variant
>   my $in = Bio::SeqIO->new(-format    => 'fastq',
>                            -variant   => 'illumina',
>                            -file      => 'mydata.fq');
>
>   # simple 'fastq' format defaults to 'sanger' variant
>   my $out = Bio::SeqIO->new(-format    => 'fastq',
>                             -file      => '>mydata.fq');
>
> FASTQ works for both input and output.  As mentioned before, the
> next_dataset() method also exists for getting simple hashrefs, see the
> module documentation for more.
>
> This was one of the few remaining blockers for the 1.6.1 point release.
> ...  If everything looks fine the final point release will follow soon after.

It is looking much better than yesterday - nice work :)
However, there are a few rough edges still.

===========================
Evil wrapping
===========================
Chris - Did you get the zip file of FASTQ examples I sent off list?
One of these was the evil_wrapping.fastq file already in Biopython CVS/git (under a new name). This is intended as a real torture test, with line-wrapped quality strings where plenty of the lines start with "+" or "@" characters. BioPerl doesn't like this file at all, but I have not dug into why.

===========================
Sanger To Illumina 1.3+
===========================
When mapping a Sanger FASTQ file with very high scores to Illumina, these don't get the maximum value imposed (ASCII 126, tilde), e.g.

$ ./biopython_sanger2illumina < sanger_93.fastq
/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:676: UserWarning: Data loss - max PHRED quality 62 in Illumina FASTQ
  warnings.warn("Data loss - max PHRED quality 62 in Illumina FASTQ")
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@

But, with bioperl-live SVN,

$ ./bioperl_sanger2illumina < sanger_93.fastq

--------------------- WARNING ---------------------
MSG: Quality values not found for illumina:63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93
---------------------------------------------------
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@

You are using "@" (ASCII 64), which in this context means a PHRED score of zero.

===========================
Sanger To Solexa
===========================
Likewise when mapping a Sanger FASTQ file with very high scores to Solexa FASTQ, these don't get the maximum value imposed (ASCII 126, tilde).
For example,

$ ./biopython_sanger2solexa < sanger_93.fastq
/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764: UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ
  warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ")
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;;

But,

$ ./bioperl_sanger2solexa < sanger_93.fastq

--------------------- WARNING ---------------------
MSG: Quality values not found for solexa:0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93
---------------------------------------------------
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><<

i.e. You've mapped the high value scores to "<", ASCII 60, thus Solexa -4 (an odd thing to happen - getting the lowest score wouldn't surprise me so much).

Furthermore, notice that PHRED scores 0 and 1 have both been mapped to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning Solexa -5.

===========================

Still, things are looking up :)

Peter

From maj at fortinbras.us Wed Aug 26 17:03:13 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 26 Aug 2009 17:03:13 -0400
Subject: [Bioperl-l] Next-Gen and the next point release - updates
In-Reply-To: <4A95A174.3070706@cornell.edu>
References: <4A95A174.3070706@cornell.edu>
Message-ID: <1E03634D20424F659F417AE7F5D26039@NewLife>

+1

----- Original Message -----
From: "Robert Buels"
To: "Chris Fields"
Cc: "BioPerl List"
Sent: Wednesday, August 26, 2009 4:56 PM
Subject: Re: [Bioperl-l] Next-Gen and the next point release - updates

> Hurray! You rock Chris!
> > R > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From sac at bioperl.org Wed Aug 26 18:33:16 2009 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 26 Aug 2009 15:33:16 -0700 Subject: [Bioperl-l] MGED meeting in Phoenix, AZ, Oct 5-8 Message-ID: <8f200b4c0908261533y74c42b1aif662ef13a8fe6711@mail.gmail.com> The MGED Society's annual meeting is of potential interest to anyone working with functional genomics data sets, or interested in best practices for analyzing and annotating their functional genomics experiments. The meeting topic is "Next-Gen Sequencing and Translational Genomics" and as usual, they've got a great line-up of speakers (included below). It's in Phoenix, AZ Oct 5-8, early registration ends on 5 Sep. (Note that MGED has expanded its reach beyond just microarrays.) For more information on registration and abstract submission, go to * http://www.mgedmeeting.org* For hotel accommodations, go to * http://www.starwoodmeeting.com/StarGroupsWeb/res?id=0903232443&key=42DE2* Keynotes *Hank Greely* Deane F. and Kate Edelman Johnson Professor of Law Stanford Law School *Elaine Mardis* Associate Professor, Genetics, Molecular Microbiology Washington University in St. 
Louis School of Medicine *Daniel Von Hoff* Director, Clinical Translational Research Division Translational Genomics Research Institute (TGen) Plenary Speakers: *Steven Brenner* Associate Professor, Plant and Microbial Biology University of California, Berkeley *Lynda Chin* Associate Professor, Dermatology Dana Farber Cancer Institute, Harvard Medical School *David Craig* Associate Director, Neurogenomics Division Translational Genomics Research Institute (TGen) *Michael Eisen* Scientist, Lawrence Berkeley National Lab and Associate Professor Department of Molecular and Cellular Biology, University of California, Berkeley *Gad Getz* Head of Cancer Genome Analysis at the Broad Institute of MIT and Harvard *Mathieu Lupien* Assistant Professor, Genetics Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center *Joanna Mountain* Senior Director, Research 23andMe, Inc. *Dana Pe'er* Assistant Professor, Biology and Computer Science Columbia University Biological Sciences *John Quackenbush* Professor of Computational Biology & Bioinformatics, Biostatistics Dana Farber Cancer Institute, Harvard School of Public Health *Cole Trapnell* Ph. D. Student, Computer Science University of Maryland, College Park From cjfields at illinois.edu Wed Aug 26 22:52:13 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 Aug 2009 21:52:13 -0500 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> References: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> Message-ID: On Aug 26, 2009, at 4:16 PM, Peter wrote: > It is looking much better than yesterday - nice work :) > However, there are a few rough edges still. Not unexpected, actually. > =========================== > Evil wrapping > =========================== > Chris - Did you get the zip file of FASTQ examples I sent off list? 
> One of these was the evil_wrapping.fastq file already in Biopython CVS/git
> (under a new name). This is intended as a real torture test, with line
> wrapped quality strings where plenty of the lines start with "+" or "@"
> characters.
> Bioperl doesn't like this file at all - but I have not dug into why.

Now fixed; I've saved this as very_tricky.fastq, but it's the same file.

> ===========================
> Sanger To Illumina 1.3+
> ===========================
> When mapping a Sanger FASTQ file with very high scores to Illumina,
> these don't get the maximum value imposed (ASCII 126, tilde). e.g.

...

Yes, I know where that one is going wrong. Fixed now for bounds for the above. Partly related to the below.

> ===========================
> Sanger To Solexa
> ===========================
> Likewise when mapping a Sanger FASTQ file with very high scores to
> Solexa FASTQ, these don't get the maximum value imposed (ASCII 126,
> tilde). For example,
>
> $ ./biopython_sanger2solexa < sanger_93.fastq
> /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764:
> UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ
> warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ")
> @Test PHRED qualities from 93 to 0 inclusive
> ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
> +
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;;
>
> But,
>
> $ ./bioperl_sanger2solexa < sanger_93.fastq
>
> --------------------- WARNING ---------------------
> MSG: Quality values not found for solexa:0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93
> ---------------------------------------------------
> @Test PHRED qualities from 93 to 0 inclusive
> ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
> +
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><<
>
> i.e. You've mapped the high value scores to "<", ASCII 60, thus
> Solexa -4 (an odd thing to happen - getting the lowest score wouldn't
> surprise me so much).

This one is fixed, it was the same bounding issue as above.

> Furthermore, notice that PHRED scores 0 and 1 have both been mapped
> to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning
> Solexa -5.

The two conversions to solexa are still failing. I'm not sure, but I think it's something fairly simple; I can't work on it until Friday, though (got too many other things on my plate ATM). If I get stumped I'll post a message.

> ===========================
>
> Still, things are looking up :)
>
> Peter

Yes they are, much more so than previously. I'll add these to the tests.

chris

From tuco at pasteur.fr Thu Aug 27 04:28:41 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Thu, 27 Aug 2009 10:28:41 +0200
Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis
In-Reply-To:
References: <4A954DCC.4050200@pasteur.fr>
Message-ID: <4A9643B9.7000709@pasteur.fr>

Mark A. Jensen wrote:
> Hi Emmanuel--
> This may be fixed in the latest version of Bio::Restriction, which is not
> available in the standard 1.6 distribution. I suggest you try replacing the
> Bio/Restriction directory in your distribution with the current
> bioperl-live modules. You can get these by using Subversion:
>
> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk/Bio/Restriction ./Restriction

Hi Mark,

Thanks for pointing me to this svn repo. I've just updated the Bio::Restriction::* part to test it, and I don't get any errors anymore. I just need to continue working on this with my ideas. I'll let you know if I encounter any other problem.
Cheers
Emmanuel

>
> If you're brave, better might be to obtain the latest trunk and reinstall;
>
> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
> $ cd bioperl-live
> $ perl Build.PL
> $ ./Build
> $ ./Build test
> $ ./Build install
>
> Please update the list with your progress-
> cheers
> Mark
>> --
>> -------------------------
>> Emmanuel Quevillon
>> Biological Software and Databases Group
>> Institut Pasteur
>> +33 1 44 38 95 98
>> tuco at_ pasteur dot fr
>> -------------------------
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------

From dan.bolser at gmail.com Thu Aug 27 06:34:00 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 27 Aug 2009 11:34:00 +0100
Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please?
In-Reply-To:
References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com>
Message-ID: <2c8757af0908270334kcb3dfc4w17553e65f7e0e4b5@mail.gmail.com>

2009/8/25 Jason Stajich :
> Can you send sysadmin request mail to the helpdesk - support at open-bio.org so
> mauricio or someone can have it in the queue.

OK.

> [aside]
> I've had to stop doing OBF sysadmin work so we are definitely looking for
> someone to help with the ALL VOLUNTEER team of now just Mauricio and Chris
> Dagdigian who do mediawiki and sysadmin support.
>
> We've reached a bit of crunch where there are lots of things to tweak and
> customize for the various flavors of MW installs that the projects want but
> we don't have enough dedicated admins to really support this.
> Most of us

I know how you feel!

> have gotten into these projects to support our own bioinformatics
> programming not sysadmin tasks so there is a bit of gap here. Some of us
> (me) were not trained as sysadmin but jumped in and figured out how to help
> and do it - and learned valuable life skills... =)
>
> We're discussing plans to upgrade the machines in the future which would
> improve performance and reliability we hope and also use this opportunity to
> streamline the MW installs to be a more easily maintained wikifarm.

Sounds like a good idea. There are also extensions that put more of the MW config on the website itself (restricted to admins of course).

Dan.

From hsa_rim at yahoo.co.in Thu Aug 27 07:14:03 2009
From: hsa_rim at yahoo.co.in (shafeeq rim)
Date: Thu, 27 Aug 2009 16:44:03 +0530 (IST)
Subject: [Bioperl-l] Mapping of genome with cytoband
Message-ID: <29549.68962.qm@web94610.mail.in2.yahoo.com>

Hi,

I need gene, mRNA, CDS, STS, and exon files mapped to cytobands, say for version 37.1 of the NCBI data. I am checking the .gbs and .gbk files, but the genes and other features do not span the whole chromosome. For chromosome 1, for example, when I use the gene coordinates from the .gbk/.gbs files, the gene locations cover only half of the ideogram graph.

Thanks

From biopython at maubp.freeserve.co.uk Thu Aug 27 07:55:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 27 Aug 2009 12:55:55 +0100
Subject: [Bioperl-l] Next-Gen and the next point release - updates
In-Reply-To:
References: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com>
Message-ID: <320fb6e00908270455y2a80907chfae8007df60e72e2@mail.gmail.com>

On Thu, Aug 27, 2009 at 3:52 AM, Chris Fields wrote:
>
> On Aug 26, 2009, at 4:16 PM, Peter wrote:
>
>> It is looking much better than yesterday - nice work :)
>> However, there are a few rough edges still.
>
> Not unexpected, actually.
>
>> ===========================
>> Evil wrapping
>> ===========================
>> Chris - Did you get the zip file of FASTQ examples I sent off list? One of
>> these was the evil_wrapping.fastq file already in Biopython CVS/git (under
>> a new name). This is intended as a real torture test, with line wrapped
>> quality strings where plenty of the lines start with "+" or "@"
>> characters.
>> Bioperl doesn't like this file at all - but I have not dug into why.
>
> Now fixed; I've saved this as very_tricky.fastq, but it's the same file.

Looks good.

>> ===========================
>> Sanger To Illumina 1.3+
>> ===========================
>> When mapping a Sanger FASTQ file with very high scores to Illumina,
>> these don't get the maximum value imposed (ASCII 126, tilde). e.g.
>
> ...
>
> Yes, I know where that one is going wrong.  Fixed now for bounds for the
> above.  Partly related to the below.

Looks good.

>> ===========================
>> Sanger To Solexa
>> ===========================
>> Likewise when mapping a Sanger FASTQ file with very high scores to
>> Solexa FASTQ, these don't get the maximum value imposed (ASCII 126,
>> tilde). For example,
>> ...
>> i.e. You've mapped the high value scores to "<", ASCII 60, thus Solexa -4
>> (an odd thing to happen - getting the lowest score wouldn't surprise me so
>> much).
>
> This one is fixed, it was the same bounding issue as above.

Yes, the high score truncation looks good.

>> Furthermore, notice that PHRED scores 0 and 1 have both been mapped
>> to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning Solexa -5.
>
> The two conversions to solexa are still failing.  I'm not sure but I think
> it's something fairly simple, but I can't work on it until Friday (got too
> many other things on my plate ATM).  If I get stumped I'll post a message.

Actually it's not just PHRED 0 and 1 that look wrong, all of the low scores are messed up. I could repeat this using the sanger_93.fastq file, but to avoid email line wrapping here I'm using a smaller example file with PHRED scores in the range 40 to 0 only:

$ cat sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!

Biopython:

$ python ./biopython_sanger2solexa.py < sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;;

BioPerl SVN (with Chris' latest fixes):

$ ./bioperl_sanger2solexa.pl < sanger_faked.fastq

--------------------- WARNING ---------------------
MSG: Data loss for solexa: following values exceed max 62
0
---------------------------------------------------
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFDCA?=~

The last ten characters are wrong (i.e. PHRED scores 0 to 9, which is precisely the range where the PHRED/Solexa mapping is non-trivial). Also note that the data loss warning is misleading (0 is less than 62).

Plus you get exactly the same problems with Illumina to Solexa. This should narrow it down - the bug is in mapping PHRED scores (from either Sanger or Illumina 1.3+ files) to the Solexa encoding.
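
For reference, the PHRED to Solexa mapping discussed above can be sketched in a few lines of plain Perl. This is only an illustration under stated assumptions, not the BioPerl or Biopython code: the helper names phred_to_solexa and solexa_fastq_char are made up for this example, the floor of -5 and the cap of 62 are taken from the outputs and warnings quoted above, and rounding to the nearest integer is assumed.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper (not a BioPerl function): convert a PHRED score to a
# Solexa score using Q_solexa = 10 * log10(10**(Q_phred/10) - 1).
# PHRED 0 has no finite Solexa equivalent, so it is pinned to -5, the
# lowest score seen in the Biopython output above (assumption).
sub phred_to_solexa {
    my ($phred) = @_;
    return -5 if $phred <= 0;
    my $solexa = 10 * log(10 ** ($phred / 10) - 1) / log(10);
    return $solexa < -5 ? -5 : $solexa;
}

# Hypothetical helper: encode a PHRED score as one Solexa FASTQ character
# (ASCII offset 64), rounding to the nearest integer and capping at 62 so
# the result never goes past "~" (ASCII 126).
sub solexa_fastq_char {
    my ($phred) = @_;
    my $q = floor(phred_to_solexa($phred) + 0.5);
    $q = 62 if $q > 62;
    return chr($q + 64);
}

# Encode PHRED 40 down to 0, the range used in sanger_faked.fastq.
my $line = join '', map { solexa_fastq_char($_) } reverse 0 .. 40;
print "$line\n";
```

If those assumptions hold, this should print the same quality line as the Biopython run above; the two encodings only diverge for PHRED scores below 10, which is where the skipped characters appear.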

Peter

From sanjaysingh765 at gmail.com Thu Aug 27 09:59:13 2009
From: sanjaysingh765 at gmail.com (sanjay singh)
Date: Thu, 27 Aug 2009 19:29:13 +0530
Subject: [Bioperl-l] query about libwww-perl collection
Message-ID:

Hello,

I want to use the libwww-perl collection to query BLINK with multiple queries. It works very well for a single query, but how can I use it for multiple queries? Please help me out.

Regards,
Sanjay

--
Happy moments, praise God.
Difficult moments, seek God.
Quiet moments, worship God.
Painful moments, trust God.
Every moment, thank God

Sanjay Kumar Singh
Bose Institute
93\1,A.P.C.Road
Kolkata-700 009
West Bengal
India

From bosborne11 at verizon.net Thu Aug 27 11:10:30 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 27 Aug 2009 11:10:30 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To:
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
Message-ID: <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net>

Mark,

Sorry, I'm a bit late here. I took a look at the Documentation Project page; it is well-reasoned. However, I didn't see any list of action items there. You do talk at the end about soliciting comments, and you've already done this, and a user survey. A survey is not necessary; the issues are well understood already. More to the point, you understand them and just as in coding, "the one doing the work wins the argument".

Here's what my own list of action items would look like:

- Merge FAQ and Scrapbook
-- FAQ is unused or underused and contains code snippets
-- Too much information or too many sections is as bad as too little

- Write Align/AlignIO HOWTO
-- This is the "missing HOWTO"

- Use Deobfuscator links to reveal method documentation
-- Most notably in SeqIO HOWTO
-- Does Deobfuscator have a bug or two that need to be fixed? I use it, it seems to work but I've heard a rumor...

- Condense and streamline installation documents
-- Remove outdated
-- Still too many pages and too much text
-- There are incorrectly labelled links taking you to the wrong place
-- Remove any text or page that duplicates information in an INSTALL file, link to this file instead

- Seriously prune the Main Page
-- Wikis encourage a proliferation of pages and links, the Main Page is a great example of far too much information
-- Remove many redundant or little used links
-- Try to prettify, in any way possible - we have created, sadly, the world's ugliest Main Page!

- Revise the SeqIO HOWTO
-- The first HOWTO, and it looks like it
-- Link this HOWTO to all the Format pages (Category:Formats)

- Feature-Annotation HOWTO
-- Write script that annotates every single SeqIO format, showing where each bit of text ends up
-- This script runs automatically when you open the HOWTO or click its link, always up-to-date
-- Probably trickier than I think!

- The "Random Page" exercise
-- Spend some time clicking this link, you will certainly find things to merge and delete. You will also find nice documentation that you didn't know existed and is probably never read!

The objective is to create documentation that has a single starting point for at least 50% of the questions asked on the mailing list. We've achieved this for certain topics, like SearchIO. In the old days you'd get a query a week about doing something with Blast and we'd repeat something written the previous week, week after week. Then we wrote some HOWTOs so the answer to just about any question on Features or SearchIO was answered by "See the HOWTO". Again, one starting page for every single reasonably general question, like "See the Installation page". Not "Starting on the Main Page you could click on Getting Bioperl or Getting Started or Quick Start or Installing Bioperl or Installation or Downloads or ...." (you get the idea).

Brian O.

On Aug 15, 2009, at 3:53 PM, Mark A.
Jensen wrote: > > ----- Original Message ----- From: "Hilmar Lapp" > ... >> As for the FASTA example, I can understand - I've heard repeatedly >> from people that one of the things that they are missing is >> documentation for every SeqIO format we support (such as GenBank, >> UniProt, FASTA, etc) about where to find a particular piece of the >> format in the object model. > .... > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help create > our list of action items. > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Aug 27 13:38:45 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 27 Aug 2009 10:38:45 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations Message-ID: <4A96C4A5.9090406@cornell.edu> Hi all, Recently a user came into #bioperl looking to truncate an annotated sequence (leaving the region between e.g. 150 to 250 nt), and have the annotations from the original sequence be remapped onto the new truncated sequence. Poking through code, I came across an undocumented function trunc() that from the comments looks like it was written by Jason as part of a master plan to implement this very functionality. Just wondering, what's the status of that? 
Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From rmb32 at cornell.edu Thu Aug 27 13:40:41 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 27 Aug 2009 10:40:41 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <4A96C4A5.9090406@cornell.edu> References: <4A96C4A5.9090406@cornell.edu> Message-ID: <4A96C519.3020001@cornell.edu> Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 Rob Robert Buels wrote: > Hi all, > > Recently a user came into #bioperl looking to truncate an annotated > sequence (leaving the region between e.g. 150 to 250 nt), and have the > annotations from the original sequence be remapped onto the new > truncated sequence. > > Poking through code, I came across an undocumented function trunc() that > from the comments looks like it was written by Jason as part of a master > plan to implement this very functionality. > > Just wondering, what's the status of that? > > Rob > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Thu Aug 27 14:20:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 27 Aug 2009 13:20:42 -0500 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <4A96C519.3020001@cornell.edu> References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> Message-ID: <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> It's not implemented completely. As Jason mentioned in the bug report, it was meant to be part of an overall system to truncate sequences with remapped features, but the implementation in place is substandard. It's open for implementation if anyone wants to take it up. 
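
To make the task concrete, the coordinate arithmetic such an implementation would need is roughly the following. This is a plain-Perl sketch and not BioPerl code: features are modelled as simple hashes rather than Bio::SeqFeature objects, remap_features is a hypothetical helper, and features that straddle the window are assumed to be clipped (and flagged) rather than dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: given a 1-based inclusive window [$start, $end] on the
# original sequence, return copies of the overlapping features with their
# coordinates shifted onto the truncated sequence. Features are plain hashes
# with 'start' and 'end' keys (an assumption made for this sketch).
sub remap_features {
    my ($start, $end, @features) = @_;
    my @remapped;
    for my $f (@features) {
        # Skip features that lie entirely outside the window.
        next if $f->{end} < $start || $f->{start} > $end;

        # Clip to the window, then shift so the window starts at position 1.
        my $clipped_start = $f->{start} < $start ? $start : $f->{start};
        my $clipped_end   = $f->{end}   > $end   ? $end   : $f->{end};
        my $was_clipped   = ($f->{start} < $start || $f->{end} > $end) ? 1 : 0;

        push @remapped, {
            %$f,
            start   => $clipped_start - $start + 1,
            end     => $clipped_end   - $start + 1,
            clipped => $was_clipped,
        };
    }
    return @remapped;
}

# Example: keep the region 150..250 of an annotated sequence.
my @features = (
    { tag => 'gene',         start => 100, end => 300 },
    { tag => 'CDS',          start => 160, end => 240 },
    { tag => 'misc_feature', start => 10,  end => 50  },
);

for my $f (remap_features(150, 250, @features)) {
    printf "%-4s %d..%d%s\n", $f->{tag}, $f->{start}, $f->{end},
        $f->{clipped} ? ' (clipped)' : '';
}
```

Split (join) locations, strand, and sub-features would all need the same treatment in a real implementation, which is presumably where most of the work lies.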
I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal with this in a more elegant and lightweight way, and is probably the direction I would take. YMMV. chris On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: > Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 > > Rob > > Robert Buels wrote: >> Hi all, >> Recently a user came into #bioperl looking to truncate an annotated >> sequence (leaving the region between e.g. 150 to 250 nt), and have >> the annotations from the original sequence be remapped onto the new >> truncated sequence. >> Poking through code, I came across an undocumented function trunc() >> that from the comments looks like it was written by Jason as part >> of a master plan to implement this very functionality. >> Just wondering, what's the status of that? >> Rob > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Aug 27 14:41:28 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 27 Aug 2009 11:41:28 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> Message-ID: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Yeah one thought that we batted around at a hackathon many moons ago had been to use Bio::DB::SeqFeature in a lightweight way under the hood to represent sequences in layers more rather than the arbitrary data model that is setup by focusing on handling GenBank records. 
A lot of the architecture development (that is like 10-15 years old now!) was initially just focused on round-tripping the sequence files. We more recently felt like a new model was more appropriate. With the fast SQLite implementation that Lincoln has put in for DB::SeqFeature we could in theory map every sequence into a SQLite DB and then have the power of the interface. Some more bells and whistles might be needed but the basic API is respected AFAIK and it prevents needing to store whole sequences in memory. The SeqIO->DB::SeqFeature loading would need some finessing so that as parsed the sequence object could be updated efficiently. Actually this might also help reduce the number of objects needed to be created by basically efficiently serializing sequences into the DB on parsing (and with some simple caching this could make for pretty fast system). Since disk is basically not a limitation now could be an interesting experiment? Maybe it is too out there, but if not it could be something major enough that it has to go in a bioperl-2/ bioperl-ng. It sort of assumes the data model of Bio::DB::SeqFeature is adequate for all the messiness of sequence data formats and one problem for some people has been the seq file format => GFF in order to load it into a SeqFeature DB for Gbrowse... So I don't know what are the boundary cases here. Certainly for FASTA it should be straightforward. -jason On Aug 27, 2009, at 11:20 AM, Chris Fields wrote: > It's not implemented completely. As Jason mentioned in the bug > report, it was meant to be part of an overall system to truncate > sequences with remapped features, but the implementation in place is > substandard. It's open for implementation if anyone wants to take > it up. > > I should point out, though, in my opinion Bio::DB::GFF/SeqFeature > deal with this in a more elegant and lightweight way, and is > probably the direction I would take. YMMV. 
> > chris > > On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: > >> Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 >> >> Rob >> >> Robert Buels wrote: >>> Hi all, >>> Recently a user came into #bioperl looking to truncate an >>> annotated sequence (leaving the region between e.g. 150 to 250 >>> nt), and have the annotations from the original sequence be >>> remapped onto the new truncated sequence. >>> Poking through code, I came across an undocumented function >>> trunc() that from the comments looks like it was written by Jason >>> as part of a master plan to implement this very functionality. >>> Just wondering, what's the status of that? >>> Rob >> >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From lsbrath at gmail.com Thu Aug 27 15:04:36 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 27 Aug 2009 15:04:36 -0400 Subject: [Bioperl-l] rendering the 5' & 3' UTR in a graphic Message-ID: <69367b8f0908271204p7f153be1p6673faac931b646d@mail.gmail.com> Hello, I am able to render all of the features except the 5' & 3' UTR. 
This is how the features part of the Genbank file looks:

FEATURES             Location/Qualifiers
     source          1..185000
                     /note="locus_tag=Nbl1"
                     /organism="Mus musculus"
     gene            142646..153328
                     /note="locus_tag=Nbl1"
                     /gene="ENSMUSG00000041120"
                     /note="neuroblastoma, suppression of tumorigenicity 1 [Source:MGI;Acc:MGI:104591]"
     5'UTR           142646..150000
                     /note="Nbl1"
     mRNA            join(142646..142794,149973..150167,150269..150380,
                     152019..153328)
                     /gene="ENSMUSG00000041120"
                     /note="transcript_id=ENSMUST00000042844"
     CDS             join(150001..150167,150269..150380,152019..152276)
                     /db_xref="CCDS:CCDS18839.1"
                     /db_xref="MGI:Nbl1"
                     /db_xref="Vega_mouse_transcript:OTTMUST00000022949"
                     /protein_id="ENSMUSP00000045608"
                     /gene="ENSMUSG00000041120"
                     /note="transcript_id=ENSMUST00000042844"
     misc_feature    150001..152276
                     /note="deletion"
     3'UTR           152277..153328
                     /gene="Nbl1"
ORIGIN -
        1 GACCAGAGCC ACTCGCTAGG AGTCACACCG AGCCTGGGGG TCCGAAGGGA ACAGCATCAA

Here is the code:

# file: embl2picture.pl
# This is code example 6 in the Graphics-HOWTO
# Author: Lincoln Stein

use strict;
#use lib "$ENV{HOME}/projects/bioperl-live";
use Bio::Graphics;
use Bio::SeqIO;

use constant USAGE =><<END;
   Render a GenBank/EMBL entry into drawable form.
   Return as a GIF or PNG image on standard output.

   File must be in embl, genbank, or another SeqIO-
   recognized format.  Only the first entry will be
   rendered.

Example to try:
   embl2picture.pl factor7.embl | display -

END

my $file = shift or die USAGE;
my $io = Bio::SeqIO->new(-file=>$file) or die USAGE;
my $seq = $io->next_seq or die USAGE;
my $wholeseq = Bio::SeqFeature::Generic->new(
    -start        => 1,
    -end          => $seq->length,
    -display_name => $seq->display_name
);

# script reads the features from the sequence object by calling all_SeqFeatures()
my @features = $seq->all_SeqFeatures;

# sorts each feature by its primary tag into a hash
# of array references named %sorted_features
my %sorted_features;
my %want = map {$_ =>1} qw/source CDS gene utr5prime utr3prime mRNA misc_feature/;

for my $f (@features) {
    #get cds, primer_bind, and genes features only
    my $tag = $f->primary_tag;
    # create a hash of $f keys and $tag values
    #push @{$sorted_features{$tag}},$f if ($tag =~ /CDS|gene|mRNA|source|misc_feature|5'UTR|3'UTR/);
    push @{$sorted_features{$tag}},$f if ($want{$tag});
}

# we create the Bio::Graphics::Panel object.
# As in previous examples, we specify the width of the image,
# as well as some extra white space to pad out the left and right borders.
my $panel = Bio::Graphics::Panel->new(
    -length    => $seq->length,
    -key_style => 'between',
    -width     => 400,
    -pad_left  => 10,
    -pad_right => 10,
);

# We now add two tracks, one for the scale
# and the other for the sequence as a whole.
$panel->add_track($wholeseq,
    -glyph   => 'arrow',
    -bump    => 0,
    -double  => 1,
    -tick    => 2,
    -bgcolor => 'blue',
    -label   => 1,
);

=cut
$panel->add_track($wholeseq,
    -glyph   => 'generic',
    -bgcolor => 'blue',
    -label   => 1,
);
=cut

# Locate primary tag of "CDS" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the CDS feature type from the %sorted_features associative array.
if ($sorted_features{CDS}) {
    $panel->add_track($sorted_features{CDS},
        -glyph       => 'transcript2',
        -bgcolor     => 'orange',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => 'CDS',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{'CDS'};
}

# Locate primary tag of "mRNA" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the mRNA feature type from the %sorted_features associative array.
if ($sorted_features{mRNA}) {
    $panel->add_track($sorted_features{mRNA},
        -glyph       => 'transcript2',
        -bgcolor     => 'red',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => 'mRNA',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{'mRNA'};
}

#=cut
# Locate primary tag of "5'UTR" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the 5'UTR feature type from the %sorted_features associative array.
if ($sorted_features{utr5prime}) {
    $panel->add_track($sorted_features{utr5prime},
        -glyph       => 'transcript2',
        -bgcolor     => '',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => 'utr5prime',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{utr5prime};
}

=cut
# Locate primary tag of "3'UTR" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the 3'UTR feature type from the %sorted_features associative array.
if ($sorted_features{3\'UTR}) {
    $panel->add_track($sorted_features{'3\'UTR'},
        -glyph       => 'transcript2',
        -bgcolor     => '',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => '3\'UTR',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{'3\'UTR'};
}
=cut

# general case
# Create a track for each feature type. In order to distinguish the tracks by color,
# we initialize an array of 9 color names and simply cycle through them
my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
my $idx = 0;
for my $tag (sort keys %sorted_features) {
    my $features = $sorted_features{$tag};
    $panel->add_track($features,
        -glyph       => 'generic',
        -bgcolor     => $colors[$idx++ % @colors],
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => "${tag}s",
        -bump        => +1,
        -height      => 8,
        # -description option to point to a subroutine
        # that will generate more informative description strings.
        -description => \&generic_description,
    );
}

binmode(STDOUT);
print $panel->png;
exit 0;

sub gene_label {
    my $feature = shift;
    my @notes;
    foreach (qw(product gene)) {
        @notes = eval {$feature->get_tag_values($_)};
        last;
    }
    $notes[0];
}

sub gene_description {
    my $feature = shift;
    my @notes;
    foreach (qw(note)) {
        # Notice that we place calls to get_tag_values() inside eval{} blocks
        # in order to avoid having an exception raised if the feature does not
        # have a tag with the desired value.
        @notes = eval{$feature->get_tag_values($_)};
        last;
    }
    return unless @notes;
    substr($notes[0],30) = '...' if length $notes[0] > 30;
    $notes[0];
}

sub generic_description {
    my $feature = shift;
    my $description;
    foreach ($feature->get_all_tags) {
        my @values = $feature->get_tag_values($_);
        $description .= $_ eq 'note' ? "@values" : "$_=@values; ";
    }
    $description =~ s/; $//; # get rid of last
    $description;
}

sub fp_utr{
    my $five_prime_utr = '5\'UTR';
    return $five_prime_utr;
}

This is how the image currently looks:

Any ideas why I am unable to render the 5' & 3' UTR features?
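
One possible explanation, sketched under a single assumption: that primary_tag returns the feature key exactly as it is written in the GenBank file (5'UTR and 3'UTR). If that holds, the %want lookup in the script can never match the UTR features, because it lists utr5prime and utr3prime instead. The snippet below does not need BioPerl; it only replays the filtering step on the feature keys from the table in the message. The names kept_tags, %want_original and %want_fixed are illustrative and are not part of the posted script.

```perl
use strict;
use warnings;

# Feature keys as they appear in the FEATURES table quoted in the message.
my @tags = ('source', 'gene', "5'UTR", 'mRNA', 'CDS', 'misc_feature', "3'UTR");

# Return the tags that are present in a lookup hash, keeping input order.
sub kept_tags {
    my ($want, @tags) = @_;
    return grep { $want->{$_} } @tags;
}

# The lookup as posted: the UTR keys are spelled utr5prime/utr3prime.
my %want_original = map { $_ => 1 }
    qw/source CDS gene utr5prime utr3prime mRNA misc_feature/;

# Hypothetical fix (an assumption, not a confirmed answer from the list):
# spell the keys the way they appear in the file.
my %want_fixed = map { $_ => 1 }
    ('source', 'CDS', 'gene', "5'UTR", "3'UTR", 'mRNA', 'misc_feature');

print "original: @{[ kept_tags(\%want_original, @tags) ]}\n";
print "fixed:    @{[ kept_tags(\%want_fixed, @tags) ]}\n";
```

If the assumption is right, the special-case block for the 5' UTR would also need the quoted key, for example $sorted_features{"5'UTR"}, since nothing is ever stored under utr5prime. Printing $f->primary_tag for each feature is a quick way to check what the parser actually returns.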
From jorvis at gmail.com Thu Aug 27 15:23:05 2009
From: jorvis at gmail.com (Joshua Orvis)
Date: Thu, 27 Aug 2009 15:23:05 -0400
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
Message-ID:

I should weigh in here since I am the above-mentioned 'user' who posed
the question in #bioperl.

To clarify, to train one particular gene finder I need to take a full
genbank file with annotation for a whole genome and create separate gbk
records, one for each gene. Each record will then contain the gene,
exon coordinates for the CDS and sequence for the gene.

I can iterate through the features of the full record and do the math
myself for each spliced coordinate, making/writing individual records
as I go, but thought I would see if BioPerl had any mechanism to
extract a region of an annotated record and treat the starting base of
that extraction as position 1, recoordinating all the other features
that were present. Then I could just iterate through the features of
the whole entry, extracting regions for each gene as I see them.

Hopefully this makes sense.

Joshua

From jason at bioperl.org Thu Aug 27 16:00:24 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 27 Aug 2009 13:00:24 -0700
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To:
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
Message-ID:

So when I did this for the
retraining of AUGUSTUS I loaded all my gene models in Bio::DB::GFF as
GFF3 and then just extracted each locus I needed +/- some surrounding
sequence context and wrote it out as genbank file. There might have
been one or two problems collapsing the features back into Genbank's
concept of a CDS as a single-feature rather than individual, but I
just make a split-location and added the sub-pieces to it. It was only
a few lines of code to do it right - the flatten/unflatten being one
of the most annoying parts maybe we could work out to streamline.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From cjfields at illinois.edu Thu Aug 27 16:19:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 27 Aug 2009 15:19:56 -0500
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
Message-ID:

On Aug 27, 2009, at 1:41 PM, Jason Stajich wrote:

> Yeah one thought that we batted around at a hackathon many moons ago
> had been to use Bio::DB::SeqFeature in a lightweight way under the
> hood to represent sequences in layers more rather than the arbitrary
> data model that is setup by focusing on handling GenBank records. A
> lot of the architecture development (that is like 10-15 years old
> now!) was initially just focused on round-tripping the sequence
> files. We more recently felt like a new model was more appropriate.
> With the fast SQLite implementation that Lincoln has put in for
> DB::SeqFeature we could in theory map every sequence into a SQLite
> DB and then have the power of the interface.
>
> Some more bells and whistles might be needed but the basic API is
> respected AFAIK and it prevents needing to store whole sequences in
> memory. The SeqIO->DB::SeqFeature loading would need some finessing
> so that as parsed the sequence object could be updated efficiently.

Exactly my thought. Probably worth pushing the FeatureHolderI interface
into something like a SeqFeature::Collection. What about annotation?
Maybe add that to the 'source' feature? Also makes me think Seq needs
to be RangeI (or potentially locatable to another sequence).
Bio::DB::SF::Segment is.

I'm thinking the old way of doing it (parsing a file) is still
possible, but underneath would be a Bio::Index or similar, and the
returned Bio::Seq would have a backend
Bio::Index/Bio::SeqFeature::Collection database (the latter maybe
being lazily implemented).

> Actually this might also help reduce the number of objects needed to
> be created by basically efficiently serializing sequences into the
> DB on parsing (and with some simple caching this could make for
> pretty fast system). Since disk is basically not a limitation now
> could be an interesting experiment?

Yes.

> Maybe it is too out there, but if not it could be something major
> enough that it has to go in a bioperl-2/bioperl-ng. It sort of
> assumes the data model of Bio::DB::SeqFeature is adequate for all
> the messiness of sequence data formats and one problem for some
> people has been the seq file format => GFF in order to load it into
> a SeqFeature DB for Gbrowse... So I don't know what are the boundary
> cases here. Certainly for FASTA it should be straightforward.
>
> -jason

Well, one could possibly test something like this on a branch, or with
their own Bio::Seq, or in Biome ;> Just sayin'....

chris

From maj at fortinbras.us Thu Aug 27 20:58:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 27 Aug 2009 20:58:34 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net>
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net>
Message-ID: <4C2E185C74CF449495BC8FDC26419702@NewLife>

Thanks Brian; these are really valuable insights and suggestions. Of
course, the "todo list" is not "mine", but the community's (otherwise,
I would have used Post-its), and I have added your action items to it.

My thinking about a survey is twofold. Intermittent users may, likely
will, have different issues than the usual suspects here on the list,
or they will put those issues in a different way--likely with more
expression of affect, which I personally think is key. It seems to me
that documentation is the public face of this project, and hearing
visceral reactions from "the public" will help us (or me) prioritize.
The other fold is, this kind of data is better acquired a) actively,
rather than passively ("Please respond to this thread") and
b) anonymously.
Obviously, it can't be active in the sense of spamming, but we could
reduce the energy barrier by providing something clickable with a few
textboxes to the list.

cheers
MAJ

----- Original Message -----
From: Brian Osborne
To: Mark A. Jensen
Cc: BioPerl List ; Chris Fields
Sent: Thursday, August 27, 2009 11:10 AM
Subject: Re: [Bioperl-l] on BP documentation

Mark,

Sorry, I'm a bit late here. I took a look at the Documentation Project
page, it is well-reasoned. However, I didn't see any list of action
items there. You do talk at the end about soliciting comments, and
you've already done this, and a user survey. A survey is not
necessary, the issues are well understood already. More to the point,
you understand them and just as in coding, "the one doing the work
wins the argument". Here's what my own list of action items would look
like:

- Merge FAQ and Scrapbook
-- FAQ is unused or underused and contains code snippets
-- Too much information or too many sections is as bad as too little

- Write Align/AlignIO HOWTO
-- This is the "missing HOWTO"

- Use Deobfuscator links to reveal method documentation
-- Most notably in SeqIO HOWTO
-- Does Deobfuscator have a bug or two that need to be fixed? I use
   it, it seems to work but I've heard a rumor...

- Condense and streamline installation documents
-- Remove outdated
-- Still too many pages and too much text
-- There are incorrectly labelled links taking you to the wrong place
-- Remove any text or page that duplicates information in an INSTALL
   file, link to this file instead

- Seriously prune the Main Page
-- Wikis encourage a proliferation of pages and links, the Main Page
   is a great example of far too much information
-- Remove many redundant or little used links
-- Try to prettify, in any way possible - we have created, sadly, the
   world's ugliest Main Page!

- Revise the SeqIO HOWTO
-- The first HOWTO, and it looks like it
-- Link this HOWTO to all the Format pages (Category:Formats)

- Feature-Annotation HOWTO
-- Write script that annotates every single SeqIO format, showing
   where each bit of text ends up
-- This script runs automatically when you open the HOWTO or click its
   link, always up-to-date
-- Probably trickier than I think!

- The "Random Page" exercise
-- Spend some time clicking this link, you will certainly find things
   to merge and delete. You will also find nice documentation that you
   didn't know existed and is probably never read!

The objective is to create documentation that has a single starting
point for at least 50% of the questions asked on the mailing list.
We've achieved this for certain topics, like SearchIO. In the old days
you'd get a query a week about doing something with Blast and we'd
repeat something written the previous week, week after week. Then we
wrote some HOWTOs so the answer to just about any question on Features
or SearchIO was answered by "See the HOWTO". Again, one starting page
for every single reasonably general question, like "See the
Installation page". Not "Starting on the Main Page you could click on
Getting Bioperl or Getting Started or Quick Start or Installing
Bioperl or Installation or Downloads or ...." (you get the idea).

Brian O.

On Aug 15, 2009, at 3:53 PM, Mark A. Jensen wrote:

----- Original Message -----
From: "Hilmar Lapp"
...
As for the FASTA example, I can understand - I've heard repeatedly
from people that one of the things that they are missing is
documentation for every SeqIO format we support (such as GenBank,
UniProt, FASTA, etc) about where to find a particular piece of the
format in the object model.
....

This is the right thread for list lurkers to contribute their betes
noires such as this one. I encourage ALL to post these issues and help
create our list of action items.
MAJ
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bosborne11 at verizon.net Thu Aug 27 22:00:01 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 27 Aug 2009 22:00:01 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <4C2E185C74CF449495BC8FDC26419702@NewLife>
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net> <4C2E185C74CF449495BC8FDC26419702@NewLife>
Message-ID: <047387CF-C3AD-4E2E-8FB8-091AB23D5FEE@verizon.net>

Mark,

As you wish. As I said, the one who does the work calls the shots,
this is not a democracy. The fundamental problem is, and I speak with
some experience here, that detailed examination of documentation is of
so little interest that participation in the survey will be limited
("the usual suspects"), and the results will be skewed. You're not
going to get reactions from "the public", the thousands of Bioperl
users. But, if you feel comfortable with the notion that a survey will
justify your actions, do it. But honestly, I know that you already
know what to do.

Brian O.

On Aug 27, 2009, at 8:58 PM, Mark A. Jensen wrote:

> My thinking about a survey is twofold. Intermittent users may,
> likely will, have different issues than the usual suspects here on
> the list, or they will put those issues in a different way--likely
> with more expression of affect, which I personally think is key. It
> seems to me that documentation is the public face of this project,
> and hearing visceral reactions from "the public" will help us (or
> me) prioritize. The other fold is, this kind of data is better
> acquired a) actively, rather than passively ("Please respond to this
> thread") and b) anonymously. Obviously, it can't be active in the
> sense of spamming, but we could reduce the energy barrier by
> providing something clickable with a few textboxes to the list.

From David.Messina at sbc.su.se Fri Aug 28 04:40:47 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 28 Aug 2009 10:40:47 +0200
Subject: [Bioperl-l] on BP documentation
Message-ID: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se>

> - Use Deobfuscator links to reveal method documentation
> -- Most notably in SeqIO HOWTO

Do you mean to click on a method name in a HOWTO and open up the
Deobfuscator view of that method's documentation? I like that.

> -- Does Deobfuscator have a bug or two that need to be fixed? I use
> it, it seems to work but I've heard a rumor...

It's true -- sometimes the Deobfuscator claims that a method isn't
documented when it is.

Mark, I can commit to fixing this. It's long overdue, so I'm happy to
use your doc push as an impetus.

Dave

From maj at fortinbras.us Fri Aug 28 07:31:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 28 Aug 2009 07:31:05 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se>
References: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se>
Message-ID:

Dave-- thanks for stepping up-
MAJ

----- Original Message -----
From: "Dave Messina"
To: "Brian Osborne"
Cc: "Mark A. Jensen" ; "BioPerl List" ; "Chris Fields"
Sent: Friday, August 28, 2009 4:40 AM
Subject: Re: [Bioperl-l] on BP documentation

>
>> - Use Deobfuscator links to reveal method documentation
>> -- Most notably in SeqIO HOWTO
>
> Do you mean to click on a method name in a HOWTO and open up the Deobfuscator
> view of that method's documentation? I like that.
>
>
>> -- Does Deobfuscator have a bug or two that need to be fixed? I use it, it
>> seems to work but I've heard a rumor...
>
> It's true -- sometimes the Deobfuscator claims that a method isn't documented
> when it is.
>
> Mark, I can commit to fixing this.
It's long overdue, so I'm happy to use > your doc push as an impetus. > > > Dave > > > From fgarret at ub.edu Fri Aug 28 12:37:54 2009 From: fgarret at ub.edu (Filipe Garrett) Date: Fri, 28 Aug 2009 18:37:54 +0200 Subject: [Bioperl-l] splice alignment Message-ID: <4A9807E2.4080608@ub.edu> Hi all, I need to analyse the 1st, 2nd and 3rd positions of an alignment separately. I've been through BioPerl pages but couldn't find no direct way to do it. The closest I fond was "slice" (AlignI) but it just extracts a contiguous subsequence. Is there any subroutine that does the job? Or maybe a more generic one, so we can select the columns to be extracted; eg: @aln_pos = qw/1,4,7,10,13,14,17,20/; $aln_1 = $aln->get_pos(@aln_pos); thanks in adv, FG -- Filipe G. Vieira Departament de Genetica Universitat de Barcelona Av. Diagonal, 645 08028 Barcelona SPAIN Phone: +34 934 035 306 Fax: +34 934 034 420 fgarret at ub.edu http://www.ub.edu/molevol/ From mmorley at mail.med.upenn.edu Fri Aug 28 17:18:28 2009 From: mmorley at mail.med.upenn.edu (Michael Morley) Date: Fri, 28 Aug 2009 17:18:28 -0400 Subject: [Bioperl-l] How to plot coverage using Bio::DB::Sam and Bio::Graphics? Message-ID: <4A9849A4.7060702@mail.med.upenn.edu> Have a few questions some perhaps too simple which I know I should have been able to find the answers but have eluded me. Problem: What I want to do visualize coverage (Illumina RNA-seq) across a gene for 40 or so samples. I thought about gbrowse but what I was hoping to was to use Bio::Graphics and created a few PNGs of the genes I'm interested in, nothing too fancy. My current attempt: So I've used Bio::DB::Sam (thank you LDS!!,great package) as following.. Works perfect. 
my $features = $sam->features(-type=>'coverage',-seq_id=>$chrom,-start=>$genomest,-end=>$genomest); Then I tried this: $panel->add_track($features, -glyph => 'xyplot', -graph_type=>'histogram', ); After poking at the return of '-type=converge', I don't think this is possible directly but any ideas how I can do it? The coverage is too deep in the region to plot every sequence in the alignment, I was able to do it just was not useful. One last question.. I also would like to plot the gene model as well. If I simply grab the genbank file for refseq NM###, the features only have exon,cds,etc and coordinates based off the mRNA seq. So how does one get the genomic info and then create the track for a gene/transcript as you would see in gbrowse? Any help I'd greatly appreciate it! -Michael From roy.chaudhuri at gmail.com Sat Aug 29 09:22:53 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Sat, 29 Aug 2009 23:22:53 +1000 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Message-ID: <1372eece0908290622mc21f297w503225242d82ada9@mail.gmail.com> Hi Joshua, A couple of years ago I did implement (in a fairly hacky way) a trunc_with_features method that does exactly this. It was incorporated into Bio::SeqUtils and is still there as far as I know. Maybe it would be suitable for your purposes? Roy. 2009/8/28 Joshua Orvis : > I should weigh in here since I am the above-mentioned 'user' who posed the > question in #bioperl. > > To clarify, to train one particular gene finder I need to take a full > genbank file with annotation for a whole genome and create separate gbk > records, one for each gene. ?Each record will then contain the gene, exon > coordinates for the CDS and sequence for the gene. 
>
> I can iterate through the features of the full record and do the math myself
> for each spliced coordinate, making/writing individual records as I go, but
> thought I would see if BioPerl had any mechanism to extract a region of an
> annotated record and treat the starting base of that extraction as position
> 1, recoordinating all the other features that were present.  Then I could
> just iterate through the features of the whole entry, extracting regions for
> each gene as I see them.
>
> Hopefully this makes sense.
>
> Joshua
>
> On Thu, Aug 27, 2009 at 2:41 PM, Jason Stajich wrote:
>
>>
>> Yeah one thought that we batted around at a hackathon many moons ago had
>> been to use Bio::DB::SeqFeature in a lightweight way under the hood to
>> represent sequences in layers more rather than the arbitrary data model that
>> is setup by focusing on handling GenBank records.  A lot of the architecture
>> development (that is like 10-15 years old now!) was initially just focused
>> on round-tripping the sequence files. We more recently felt like a new model
>> was more appropriate.  With the fast SQLite implementation that Lincoln has
>> put in for DB::SeqFeature we could in theory map every sequence into a
>> SQLite DB and then have the power of the interface.
>>
>> Some more bells and whistles might be needed but the basic API is respected
>> AFAIK and it prevents needing to store whole sequences in memory.  The
>> SeqIO->DB::SeqFeature loading would need some finessing so that as parsed
>> the sequence object could be updated efficiently.
>>
>> Actually this might also help reduce the number of objects needed to be
>> created by basically efficiently serializing sequences into the DB on
>> parsing (and with some simple caching this could make for pretty fast
>> system).  Since disk is basically not a limitation now could be an
>> interesting experiment?  Maybe it is too out there, but if not it could be
>> something major enough that it has to go in a bioperl-2/bioperl-ng.  It
>> sort of assumes the data model of Bio::DB::SeqFeature is adequate for all
>> the messiness of sequence data formats and one problem for some people has
>> been the seq file format => GFF in order to load it into a SeqFeature DB for
>> Gbrowse... So I don't know what are the boundary cases here.  Certainly for
>> FASTA it should be straightforward.
>>
>> -jason
>>
>> On Aug 27, 2009, at 11:20 AM, Chris Fields wrote:
>>
>>> It's not implemented completely.  As Jason mentioned in the bug report, it
>>> was meant to be part of an overall system to truncate sequences with
>>> remapped features, but the implementation in place is substandard.  It's
>>> open for implementation if anyone wants to take it up.
>>>
>>> I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal
>>> with this in a more elegant and lightweight way, and is probably the
>>> direction I would take.  YMMV.
>>>
>>> chris
>>>
>>> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote:
>>>
>>>> Looks like bug 1572 is related to this:
>>>> http://bugzilla.open-bio.org/show_bug.cgi?id=1572
>>>>
>>>> Rob
>>>>
>>>> Robert Buels wrote:
>>>>
>>>>> Hi all,
>>>>> Recently a user came into #bioperl looking to truncate an annotated
>>>>> sequence (leaving the region between e.g. 150 to 250 nt), and have the
>>>>> annotations from the original sequence be remapped onto the new truncated
>>>>> sequence.
>>>>> Poking through code, I came across an undocumented function trunc() that
>>>>> from the comments looks like it was written by Jason as part of a master
>>>>> plan to implement this very functionality.
>>>>> Just wondering, what's the status of that?
>>>>> Rob >>>>> >>>> >>>> >>>> -- >>>> Robert Buels >>>> Bioinformatics Analyst, Sol Genomics Network >>>> Boyce Thompson Institute for Plant Research >>>> Tower Rd >>>> Ithaca, NY ?14853 >>>> Tel: 503-889-8539 >>>> rmb32 at cornell.edu >>>> http://www.sgn.cornell.edu >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From adlai at refenestration.com Sun Aug 30 12:16:41 2009 From: adlai at refenestration.com (adlai burman) Date: Sun, 30 Aug 2009 18:16:41 +0200 Subject: [Bioperl-l] Install on host server Message-ID: Hey there, I have an embarrassingly silly question. I have BioPerl set up and working on my computer. Does anyone here know if there is a standard way to ask one's hosting server to install BioPerl so you can use it within a web page? Barring that, is there a standard way to set it up for your own domain on a hosting server that knows nothing about BioPerl? Thanks, Adlai From ymc at yahoo.com Mon Aug 31 02:10:10 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Sun, 30 Aug 2009 23:10:10 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? Message-ID: <472878.20951.qm@web30402.mail.mud.yahoo.com> Hi Chris I added a check for LocatableSeq in dpAlign.pm. 
It will now create an Bio::Seq object internally to copy the sequence in LocatableSeq but taking out all the gaps. This should make it behave properly. I commited the updated Bio/Tools/dpAlign.pm to SVN. In dpAlign.pm, I also added a note saying what will happen if you supplied LocatableSeq to the functions in this module. With regard to that warning, I think the person who reported the bug misused the instantiator of LocatableSeq. He/she can't use the length of the sequence with gaps as the "end". The "end" should be the length without gaps. Let me know if you have any questions or concerns. Have a great day! Yee Man --- On Wed, 8/19/09, Yee Man Chan wrote: > From: Yee Man Chan > Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? > To: "Chris Fields" > Cc: "Robert Buels" , "BioPerl List" > Date: Wednesday, August 19, 2009, 8:01 PM > I noticed that the $qalseq is a > LocatableSeq with gaps. I don't think my program was written > to support LocatableSeq with gaps. If I removed the gaps, > then I would have the scores agree with each other which > should be the desired outcome. > > --------------------- WARNING --------------------- > MSG: In sequence ABC|9986984 residue count gives end value > 104. > Overriding value [101] with value 104 for > Bio::LocatableSeq::end(). > TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTTCGGGTCCGGCCCGAA > --------------------------------------------------- > Getting score for ABC|9944760 -> ABC|9986984 > = 291 > Getting score for ABC|9986984 -> ABC|9944760 > = 291 > > Do you think I should check for this LocatableSeq type and > give an error or should I remove the gaps if this is a > LocatableSeq? > > Yee Man > > > --- On Wed, 8/19/09, Chris Fields > wrote: > > > From: Chris Fields > > Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for > CPAN, was Re:? Problems with Bioperl-ext package on > WinVista? 
> > To: "Yee Man Chan" > > Cc: "Robert Buels" , > "BioPerl List" > > Date: Wednesday, August 19, 2009, 7:49 AM > > I'll have a look.? It's probably > > something that hasn't been updated to deal with > > LocatableSeq's pathological end point checking. > > > > chris > > > > On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote: > > > > > > > > I tried that sample script that reportedly caused > the > > dpAlign "bug" but I can't reproduced it. All I get is > a > > warning from LocatableSeq. > > > ------------------------------------------- > > > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl > > "-Iblib/lib" "-Iblib/arch" > > "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl > > > > > > --------------------- WARNING > --------------------- > > > MSG: In sequence ABC|9944760 residue count gives > end > > value 101. > > > Overriding value [104] with value 101 for > > Bio::LocatableSeq::end(). > > > > > > TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA > > > > --------------------------------------------------- > > > Getting score for ABC|9944760 -> ABC|9986984 > > > = 300 > > > Getting score for ABC|9986984 -> ABC|9944760 > > > = 303 > > > ------------------------------------------ > > > > > > Does the test script crash in your machine? > > > > > > Yee Man > > > > > > --- On Tue, 8/18/09, Chris Fields > > wrote: > > > > > >> From: Chris Fields > > >> Subject: Re: Packaging Bio::Ext::HMM for > CPAN, was > > Re: [Bioperl-l] Problems with Bioperl-ext package on > > WinVista? > > >> To: "Robert Buels" > > >> Cc: "Yee Man Chan" , > > "BioPerl List" > > >> Date: Tuesday, August 18, 2009, 10:28 PM > > >> On Aug 18, 2009, at 11:37 PM, Robert > > >> Buels wrote: > > >> > > >>> Yee Man Chan wrote: > > >>>> Is it going to be an arrangement > similar > > to > > >> bioconductor? If so, I suppose then it makes > > sense. 
But you > > >> might want to develop scripts to > automatically > > download and > > >> install new modules to make it user > friendly. > > >>> Yes, we are probably going to make a > > Task::BioPerl or > > >> something similar. > > >>> > > >>>> What do you mean by Bio-Ext is going > away? > > I > > >> notice quite many people using dpAlign. So > if > > Bio-Ext is > > >> going away, then at least dpAlign should > become > > another spin > > >> off. > > >>> By going away, I meant that everything > in > > there is > > >> going to be spinned off.? Except modules > that > > are no > > >> longer maintainable, if there are any in > there. > > >>> > > >>> Rob > > >> > > >> dpAlign could become another spinoff, yes, if > it's > > used > > >> (and works fine).? The problematic code > dealt > > with pSW, > > >> alignment statistics, and staden io_lib > support > > (the latter > > >> which is fairly bit rotted now): > > >> > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329 > > >> > > >> dpAlign has it's own bug: > > >> > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384 > > >> > > >> chris > > >> > > > > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > From tuco at pasteur.fr Mon Aug 31 10:13:41 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Mon, 31 Aug 2009 16:13:41 +0200 Subject: [Bioperl-l] Can't add track to Panel Bio::Graphics Message-ID: <4A9BDA95.2020109@pasteur.fr> Hi, I'm trying to create png image using Bio::Graphics. I followed the Howto available at bioperl.org. I'm stacked when trying to add new track to my panel. 
So far I can create the panel and add 2 tracks, but then, probably through
some mistake of mine, I can't add more tracks to the panel.

Here is the code.

my $panel = Bio::Graphics::Panel->new(
    -length     => $self->seq()->length(),
    -width      => 800,
    -pad_top    => 5,
    -pad_bottom => 5,
    -pad_left   => 5,
    -pad_right  => 5,
    #-key_style => 'between',
);

my $bsg = Bio::SeqFeature::Generic->new(
    -start        => 1,
    -seq          => $self->seq()->seq(),
    -end          => $self->seq()->length(),
    -display_name => $self->seq()->id(). " (".$self->seq->length()." na)",
);
$bsg->attach_seq($self->seq());

#Display the reference sequence
############
#### Those 2 tracks are well displayed on the final image
###########
$panel->add_track($bsg, -glyph => 'dna', -label => 1);
$panel->add_track($bsg, -glyph => 'arrow', -tick => 2, -fgcolor => 'black');

#Build, if present, the single cut
if(keys %$spositions){
    #Create the special track for the single cut
    my $strack = $panel->add_track(
        -glyph     => 'crossbox',
        -label     => 1,
        -fgcolor   => 'red',
        -key       => 'Single cut',
        -connector => 'dashed',
    );
    foreach my $enz (sort { $a cmp $b } keys %{$spositions->{$strand}}){
        my $bsfg = Bio::SeqFeature::Generic->new(
            -display_name => $enz,
            -start => $spositions->{$strand}->{$enz}->{$enz}->start(),
            -end   => $spositions->{$strand}->{$enz}->{$enz}->start());
        my $bsfg2 = Bio::SeqFeature::Generic->new(
            -display_name => $enz,
            -start => $spositions->{$strand}->{$enz}->{$enz}->end(),
            -end   => $spositions->{$strand}->{$enz}->{$enz}->end());
        $strack->add_feature($bsfg);
        $strack->add_feature($bsfg2);
    }
}

#Build, if present, the double cut
if(keys %$dpositions){
    my $dtrack = $panel->add_track(
        -glyph     => 'crossbox',
        -label     => 1,
        -key       => 'Double cut',
        -connector => 'dashed',
    );
    foreach my $couple (sort { $a cmp $b } keys %{$dpositions->{$strand}}){
        foreach my $cc_enz (sort { $a cmp $b } keys %{$dpositions->{$strand}->{$couple}}){
            my $bsfg = Bio::SeqFeature::Generic->new(
                -display_name => $couple,
                -start => $dpositions->{$strand}->{$couple}->{$cc_enz}->start(),
                -end   => $dpositions->{$strand}->{$couple}->{$cc_enz}->start());
            my $bsfg2 = Bio::SeqFeature::Generic->new(
                -display_name => $cc_enz,
                -start => $dpositions->{$strand}->{$couple}->{$cc_enz}->end(),
                -end   => $dpositions->{$strand}->{$couple}->{$cc_enz}->end());
            $dtrack->add_feature($bsfg);
            $dtrack->add_feature($bsfg2);
        }
    }
}
print $panel->png();

Can somebody tell me what I'm missing or doing wrong?

Thanks for any help

Regards

Emmanuel

--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------

From marcelo011982 at gmail.com Mon Aug 31 14:12:58 2009
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Mon, 31 Aug 2009 15:12:58 -0300
Subject: [Bioperl-l] Genbank code from Blast results
In-Reply-To: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>
References: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>
Message-ID: <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com>

done:

#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => 'Rpp2Blast.txt');
...
while( my $result = $in->next_result ) {
    while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            #EXTRACT THE GENBANK CODE NUMBER FROM DESCRIPTION
            #----------------------------------------------
            my $accGB = $hit->description();
            $accGB =~ m/(gb=.*?\s)/;
            #----------------------------------------------

            print MYFILE
            ...
                $1,"\t" ,          # GenBank accession number
            ...
                $hsp->hit->end, "\t","\n";
            ...
        }
    }
}

On Tue, Aug 18, 2009 at 3:34 PM, Marcelo Iwata wrote:

> hi all..
> I was doing a script that take some information of the results of blastn
> files.
> Everythig was ok, but i have some dificult to pic the Genbank code number
> (the 'gb' below).
> I tried
>
> $obj->each_accession_number
> $hit->name
>
> And some variation of this.
> > > > ------------------------------ > >gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h > segment 1 gmrtDrNS01 > Glycine max cDNA 3', mRNA sequence /clone_end=3' > /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853 > Length = 853 > > Score = 1336 bits (674), Expect = 0.0 > Identities = 793/832 (95%), Gaps = 8/832 (0%) > Strand = Plus / Minus > > > Query: 294858 aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt > 294917 > |||||||||||| |||||| ||||||||||||||||| |||||||||||||||||||| > Sbjct: 853 aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc > 794 > ---------------------------------------- > > > But, i still don't get it. > > thank you > with regards > Miwata > From jason at bioperl.org Mon Aug 31 15:49:08 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 31 Aug 2009 12:49:08 -0700 Subject: [Bioperl-l] Genbank code from Blast results In-Reply-To: <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com> References: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com> <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com> Message-ID: <4DBC8ED9-6D98-414A-A361-3FAB3EEE955C@bioperl.org> if you run blastall with -I T (show GI's in defline) you will also be able to get the genbank identifier out with $hit->ncbi_gi through some automagic parsing of the ID line -jason On Aug 31, 2009, at 11:12 AM, Marcelo Iwata wrote: > done: > > #!/usr/bin/perl -w > use strict; > use Bio::SearchIO; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => 'Rpp2Blast.txt'); > ... > while( my $result = $in->next_result ) { > while( my $hit = $result->next_hit ) { > while( my $hsp = $hit->next_hsp ) { > #EXTRACT THE GENBANK CODE NUMBER FROM DESCRIPTION > #---------------------------------------------- > my $accGB = $hit->description(); > $accGB =~ m/(gb=.*?\s)/; > #---------------------------------------------- > > > print MYFILE > ... > > $1,"\t" , #numero de acesso ao genbank > ... 
> $hsp->hit->end, "\t","\n"; > ... > > } > } > } > > > > On Tue, Aug 18, 2009 at 3:34 PM, Marcelo Iwata >wrote: > >> hi all.. >> I was doing a script that take some information of the results of >> blastn >> files. >> Everythig was ok, but i have some dificult to pic the Genbank code >> number >> (the 'gb' below). >> I tried >> >> $obj->each_accession_number >> $hit->name >> >> And some variation of this. >> >> >> >> ------------------------------ >>> gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water >>> stressed 5h >> segment 1 gmrtDrNS01 >> Glycine max cDNA 3', mRNA sequence /clone_end=3' >> /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853 >> Length = 853 >> >> Score = 1336 bits (674), Expect = 0.0 >> Identities = 793/832 (95%), Gaps = 8/832 (0%) >> Strand = Plus / Minus >> >> >> Query: 294858 >> aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt >> 294917 >> |||||||||||| |||||| ||||||||||||||||| >> |||||||||||||||||||| >> Sbjct: 853 >> aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc >> 794 >> ---------------------------------------- >> >> >> But, i still don't get it. >> >> thank you >> with regards >> Miwata >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From Russell.Smithies at agresearch.co.nz Mon Aug 31 17:43:25 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 1 Sep 2009 09:43:25 +1200 Subject: [Bioperl-l] Mapping of genome with cytoband In-Reply-To: <29549.68962.qm@web94610.mail.in2.yahoo.com> References: <29549.68962.qm@web94610.mail.in2.yahoo.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB81F183@exchsth.agresearch.co.nz> Have you tried getting the data from UCSC (or the test site: http://genome-test.cse.ucsc.edu ) If you use Galaxy to get the data then convert to gff, it may save a bit of work. 
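For the "convert to gff" step, a minimal Perl sketch follows. It assumes the band data was downloaded as a plain tab-separated UCSC cytoBand table with the columns chrom, chromStart, chromEnd, name and gieStain, and that the starts are 0-based as is usual for UCSC tables. The subroutine name, the 'UCSC' source tag and the 'chromosome_band' feature type are choices made for this illustration only, and the two example rows are just sample input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert one line of a cytoBand table into one GFF3 line.
# Assumed columns (tab-separated): chrom, chromStart, chromEnd, name, gieStain.
# The start is assumed to be 0-based, so 1 is added to get a 1-based GFF3 start.
sub cytoband_to_gff3 {
    my ($line) = @_;
    chomp $line;
    my ($chrom, $start, $end, $name, $stain) = split /\t/, $line;
    return join "\t",
        $chrom, 'UCSC', 'chromosome_band',      # seqid, source, type (assumed labels)
        $start + 1, $end,                       # 1-based inclusive coordinates
        '.', '.', '.',                          # score, strand, phase not used
        "ID=$chrom$name;Name=$name;gieStain=$stain";
}

# Illustrative input rows only; real data would be read from the table file.
my @example = (
    "chr1\t0\t2300000\tp36.33\tgneg",
    "chr1\t2300000\t5400000\tp36.32\tgpos25",
);

print "##gff-version 3\n";
print cytoband_to_gff3($_), "\n" for @example;
```

With the bands in GFF, genes and bands can be compared in the same 1-based coordinate system, which makes it easier to see whether the gene coordinates really stop half way along the chromosome.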

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shafeeq rim
> Sent: Thursday, 27 August 2009 11:14 p.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Mapping of genome with cytoband
>
> Hi,
>
> I need gene , mrna , cds , sts and exon files as per the mapping with
> cytobands.Lets say for 37.1 version NCBI data. I am checking with the .gbs and
> .gbk files but the genes and other features are not coming across the whole
> chromosome.i.e, for chromosome 1 suppose. When I use the gene coordinates from
> .gbk / .gbs files the locations on chromosome 1 genes show only half way on
> the ideogram graph.
>
> Thanks
>
>
>
> See the Web's breaking stories, chosen by people like you. Check out
> Yahoo! Buzz. http://in.buzz.yahoo.com/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From maj at fortinbras.us Sat Aug 1 00:35:04 2009
From: maj at fortinbras.us (Mark A.
Jensen) Date: Sat, 1 Aug 2009 00:35:04 -0400 Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl In-Reply-To: References: Message-ID: <99E27D08408340B9B0611751A17DF266@NewLife> Sorry, I cut off the last script. The entire thing follows: /usr/local/bin/conv-ASMake.sh : #!/usr/bin/sed -f #converting an ActiveState PERL Makefile to run under cygwin make: s/^DIRFILESEP = ^\\/DIRFILESEP = \// s/^NOOP = rem/NOOP = :/ # -or- NOOP = echo -n # byebye volume s/C:/\/cygdrive\/c/ # sed to convert directory \ to / s/\([\)0-9a-zA-Z.]\)\\\([\(0-9a-zA-Z]\)/\1\/\2/g # convert full perl s/\/usr\/bin\/perl/\/cygdrive\/c\/Perl\/bin\/perl/ # a key conversion for DOC_INSTALL action /^DESTINSTALLVENDORHTMLDIR/ a\ DECYGDESTINSTALLARCHLIB = $(subst /cygdrive/c,c:,$(DESTINSTALLARCHLIB)) # --- MakeMaker tools_other section: # let cygwin do native linux commands /^MAKE/ c\ MAKE = make /^CHMOD/ c\ CHMOD = chmod /^CP/ c\ CP = cp /^MV/ c\ MV = mv /^NOOP/ c\ NOOP = : /^RM_F/ c\ RM_F = rm -f /^RM_RF/ c\ RM_RF = rm -rf /^TEST_F[^I]/ c\ TEST_F = test -f /^TOUCH/ c\ TOUCH = touch /^TEST_S/ c\ TEST_S = test -s /^DEV_NULL/ c\ DEV_NULL = > /dev/null 2>&1 /^ECHO[^_]/ c\ ECHO = echo /^ECHO_N/ c\ ECHO_N = echo -n # override OS-specific File::Spec /^MOD_INSTALL/ c\ MOD_INSTALL = $(ABSPERLRUN) -MExtUtils::Install -e "use File::Spec::Cygwin;@File::Spec::ISA=('File::Spec::Cygwin');" -e "map { s[/cygdrive/c][] } @ARGV;install({@ARGV}, '$(VERBINST)', 0, '$(UNINST)');" -- /^FIXIN/ c\ FIXIN = $(PERLRUN) "-MExtUtils::MY" -e "MY->fixin(shift)" # remove cygwin volume prefix for doc installs /Appending installation info to/ s/DESTIN/DECYGDESTIN/ /perllocal\.pod/ s/DESTIN/DECYGDESTIN/ /NOECHO) \$(MKPATH/ s/DESTIN/DECYGDESTIN/ #end conv-ASMake.sh ----- Original Message ----- From: "Jonathan Cline" To: Cc: Sent: Friday, July 31, 2009 11:24 PM Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl >I recently mentioned working on Bio::Robotics for Tecan. 
Vendors > being MS-Win specific, the vendor software allows third-party software > communication through a named pipe (the literal filename is > "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific > and this pseudo-pipe is opened with sysopen() ). This is broken under > cygwin-perl due to cygwin's method of handling paths -- the sysopen > fails. However it works under ActiveState Perl and communication > through the named pipe (to the robot hardware) is OK. The standard > workaround is usually to use cygwin bash, and force the PATH to use > ActiveState perl. (Typical MS Windows incompatibility problem.) The > issue is: Perl module libraries for CPAN work under cygwin-perl > (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN > module use, or "make test", result in a bad list of incompatibility > problems. Yet ActiveState Perl is required for communicating to the > vendor application (unless there is some workaround to raw filesystem > access in cygwin-perl that I haven't found in 2 days of working this). > The stand-alone scripts I have work fine to access the named pipe > (using ActiveState Perl) since the standalone scripts have no module > INC dependencies, no CPAN module test harness, etc etc. > > This isn't specifically a Bio:: issue, though if anyone has > suggestions please email. I could try msys and see if it handles the > named-pipe-special-file better, if msys has an msys-perl distribution. 
> > -- > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jncline at gmail.com Sun Aug 2 23:32:20 2009 From: jncline at gmail.com (Jonathan Cline) Date: Sun, 02 Aug 2009 22:32:20 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> Message-ID: <4A765A44.7030902@gmail.com> Smithies, Russell wrote: > I "acquired" an old Biomek 1000 that I'm thinking of modernising. It was originally controlled by a monstrously large but slow pc (IBM Value Point Model 466DX2 computer with Microsoft Windows* Version 3.1) > My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) and use software like mach3 www.machsupport.com along with G-code to control it. > I come from an engineering background so it seemed like the easy way to me :-) > > Now I just need a bit of free time to get it working... > > --Russell > > > I agree, that's probably the best way to go. It's hard to know what amount of s/w processing was done on the host PC vs. the embedded controller. If you were able to connect directly to the robot hardware with serial port(s) or whatever it's using, it would be tough to find out the comm protocol unless someone has already reverse engineered it (which is doubtful). Also from what I have seen online, attempting to run the old software under virtual machine is unpredictable due to timing differences in the serial port communication. So removal of the old electronics is probably the best bet. If it has one arm, then it's much easier. 
As for robots with working workstation software, it seems the annoyance factor is that while the scripting languages are powerful (for GUI scripting that is), they are still relatively low level. Bio types with a bit of CS seem to immediately turn to visual basic, labview, or even excel spreadsheets and macros, in order to provide a higher level abstraction for the workstation software. To me, it seems natural that there should be a "protocol compiler" which takes biology protocols as input, and gives robot instructions as output (google "protolexer"). The huge bottleneck of course is that everyone's robotics work tables and equipment are somewhat unique to their needs. ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >> Sent: Thursday, 30 July 2009 2:07 p.m. >> To: bioperl-l at lists.open-bio.org >> Cc: Jonathan Cline >> Subject: [Bioperl-l] Bio::Robotics namespace discussion >> >> I am writing a module for communication with biology robotics, as >> discussed recently on #bioperl, and I invite your comments. >> >> Currently this mode talks to a Tecan genesis workstation robot ( >> http://images.google.com/images?q=tecan genesis ). Other vendors are >> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the >> 'net with the exception of some visual basic and labview scripts which I >> have found. There are some computational biologists who program for >> robots via high level s/w, but these scripts are not distributed as OSS. >> >> With Tecan, there is a datapipe interface for hardware communication, as >> an added $$ option from the vendor. I haven't checked other vendors to >> see if they likewise have an open communication path for third party >> software. 
By allowing third-party communication, then naturally the
>> next step is to create a socket client-server; especially as the robot
>> vendor only support MS Win and using the local machine has typical
>> Microsoft issues (like losing real time communication with the hardware
>> due to GUI animation, bad operating system stability, no unix except
>> cygwin, etc).
>>
>>
>> On Namespace:
>>
>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many
>> s/w modules already called 'robots' (web spider robots, chat bots, www
>> automate, etc) so I chose the longer name "robotics" to differentiate
>> this module as manipulating real hardware. Bio::Robotics is the
>> abstraction for generic robotics and Bio::Robotics::(vendor) is the
>> manufacturer-specific implementation. Robot control is made more
>> complex due to the very configurable nature of the work table (placement
>> of equipment, type of equipment, type of attached arm, etc). The
>> abstraction has to be careful not to generalize or assume too much. In
>> some cases, the Bio::Robotics modules may expand to arbitrary equipment
>> such as thermocyclers, tray holders, imagers, etc - that could be a
>> future roadmap plan.
>>
>> Here is some theoretical example usage below, subject to change. At
>> this time I am deciding how much state to keep within the Perl module.
>> By keeping state, some robot programming might be simplified (avoiding
>> deadlock or tracking tip state). In general I am aiming for a more
>> "protocol friendly" method implementation.
>>
>>
>> To use this software with locally-connected robotics hardware:
>>
>>     use Bio::Robotics;
>>
>>     my $tecan = Bio::Robotics->new("Tecan") || die;
>>     $tecan->attach() || die;
>>     $tecan->home();
>>     $tecan->pipette(tips => "1", from => "rack1");
>>     $tecan->pipette(aspirate => "1", dispense => "1", from => "sampleTray", to => "DNATray");
>>     ...
>>
>> To use this software with remote robotics hardware over the network:
>>
>>     # On the local machine, run:
>>     use Bio::Robotics;
>>
>>     my @connected_hardware = Bio::Robotics->query();
>>     my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in @connected_hardware\n";
>>     $tecan->attach() || die;
>>     $tecan->configure("my work table configuration file") || die;
>>     # Run the server and process commands
>>     while (1) {
>>         $error = $tecan->server(passwordplaintext => "0xd290");
>>         if ($tecan->lastClientCommand() =~ /^shutdown/) {
>>             last;
>>         }
>>     }
>>     $tecan->detach();
>>     exit(0);
>>
>>     # On the remote machine (the client), run:
>>     use Bio::Robotics;
>>
>>     my $server = "heavybio.dyndns.org:8080";
>>     my $password = "0xd290";
>>     my $tecan = Bio::Robotics->new("Tecan");
>>     $tecan->connect($server, $mypassword) || die;
>>     $tecan->home();
>>     $tecan->pipette(tips => "1", from => "rack200");
>>     $tecan->pipette(aspirate => "1", dispense => "1",
>>                     from => "sampleTray A1", to => "DNATray A2",
>>                     volume => "45", liquid => "Buffer");
>>     $tecan->pipette(drop => "1");
>>     ...
>>     $tecan->disconnect();
>>     exit(0);
>>
>>
>>
>> --
>>
>> ## Jonathan Cline
>> ## jcline at ieee.org
>> ## Mobile: +1-805-617-0223
>> ########################
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited.
If you have received this message in error, please notify the > sender immediately. > ======================================================================= > From dan.bolser at gmail.com Tue Aug 4 08:03:00 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 4 Aug 2009 13:03:00 +0100 Subject: [Bioperl-l] problem with t/LocalDB/SeqFeature.t when host ne localhost In-Reply-To: References: <2c8757af0907310513q24bec4b0k7bec06b09e069b07@mail.gmail.com> Message-ID: <2c8757af0908040503oe2a258dkac4311bb099dc3ac@mail.gmail.com> 2009/7/31 Chris Fields : > Dan, > > Can you file this as a BioPerl bug? ?I'm planning on driving towards > releasing 1.6.1 alpha1 soon (next few weeks) and I would like to get this > one fixed. http://bugzilla.open-bio.org/show_bug.cgi?id=2899 Dan. From dan.bolser at gmail.com Tue Aug 4 08:14:02 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 4 Aug 2009 13:14:02 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID: <2c8757af0908040514w198085cfgf4a1adc344095f36@mail.gmail.com> 2009/4/27 Heikki Lehvaslaiho : > Dan, > > Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live. > > Test and extend, > > ? ?-Heikki Thanks for help with this. I finally got round to looking at the code (after several others had done the same). I have messed with the code a bit, and added a 'mask_below_threshold' method [1] and some tests to go with it (including some extra tests) [2]. Cheers, Dan. [1] http://bugzilla.open-bio.org/show_bug.cgi?id=2897 [2] http://bugzilla.open-bio.org/show_bug.cgi?id=2898 > 2009/4/27 Heikki Lehvaslaiho : >> Dan, >> >> I'll take your code and put it into bioperl-live rewritten the way I >> suggested and add few tests. 
>> >> That should get you started, >> >> ? -Heikki >> >> 2009/4/27 Dan Bolser : >>> Hi Heikki, >>> >>> Thanks very much for the advice on how to better implement the clear >>> range method within the Bio::Seq::Quality object. I can understand the >>> logic of what you have written, and it all sounds reasonable. The only >>> problem is that I am very inexperienced with working on object >>> oriented Perl (my 'one man' projects to date have never really >>> required me to think beyond scripts, and its been years since I >>> actually tried to code objects in Perl). >>> >>> To be specific, when you say, "Lets add a method that sets the >>> threshold and stores it internally as $self->_threshold", ignoring any >>> other functionality, what would that method look like? in particular, >>> how would $self->_threshold be implemented? >>> >>> I think once I see that detail, I can go ahead and try to code what >>> you suggested. >>> >>> >>> Similarly (Chris), where would I put the tests / how would they be implemented? >>> >>> >>> Thanks again for the feedback. >>> >>> All the best, >>> Dan. >>> >>> >>> >>> 2009/4/27 Heikki Lehvaslaiho : >>>> Dan, >>>> >>>> It looks like your method does two different things: >>>> >>>> 1. Returns the longest subsequence above the threshold >>>> 2. Analyses the the sequence for the number of ranges the current >>>> threshold creates. >>>> >>>> Why not separate these functions? >>>> >>>> Lets add a method that sets the threshold and stores it internally as >>>> $self->_threshold. Setting it to a new values should trigger emptying >>>> all the caches (see below.) >>>> >>>> Lets have two more public methods: >>>> >>>> 1. get_clean_range() - optional argument 'threshold' >>>> >>>> It returns the longest clean subseq. >>>> >>>> 2. count_clean_ranges() -again optional argument 'threshold' >>>> >>>> This returns the number of ranges detected. 
>>>>
>>>> Both methods first call the public method threshold() if the argument
>>>> has been given and then an internal method _find_clean_ranges(). That
>>>> method calculates all the ranges and stores them internally (as
>>>> $self->_clean_ranges->[...]). The number of ranges is also stored
>>>> (e.g. $self->_number_of_ranges). These internal values form the cache
>>>> that needs to be emptied whenever any of the critical values of the
>>>> object changes: threshold, quality or seq. Create an internal method
>>>> $self->_clear_cache that does that.
>>>>
>>>> Now the new quality object does not get created until you call
>>>> get_clean_range(), which accesses the cached values (or creates them if
>>>> they are not there).
>>>>
>>>> This design allows you to have no extra penalty for adding more
>>>> methods that act on cached values. For example, it might be a sensible
>>>> thing to do at some point to look at all the ranges that are longer
>>>> than some length. Then you could write in your program:
>>>>
>>>>
>>>> $qual->threshold(10);
>>>> if ($qual->count_clean_ranges == 1) {
>>>>   my $newqual = $qual->get_clean_range();
>>>>   # do your analysis
>>>> } elsif ($qual->count_clean_ranges == 0) {
>>>>   # do some reporting and logging
>>>> } else {  # more than one range
>>>>   my @quals = $qual->get_all_clean_ranges($min_length);
>>>>   # do some more work and possibly select the best one(s)
>>>> }
>>>>
>>>>
>>>> Yours,
>>>>
>>>>    -Heikki
>>>>
>>>> 2009/4/24 Chris Fields :
>>>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla.  If
>>>>> possible, tests don't hurt either!
>>>>>
>>>>> chris
>>>>>
>>>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>>>>
>>>>>> It's a bit rough and ready, but it does what I need...
>>>>>>
>>>>>>
>>>>>> =head2 get_clear_range
>>>>>>
>>>>>> Title    : get_clear_range
>>>>>> Usage    : $subobj = $obj->get_clear_range();
>>>>>>            $subobj = $obj->get_clear_range(20);
>>>>>> Function : Get the clear range using the given quality score as a
>>>>>>            cutoff or a default value of 13.
>>>>>>
>>>>>> Returns  : a new Bio::Seq::Quality object
>>>>>> Args     : a minimum quality value, optional, default = 13
>>>>>>
>>>>>> =cut
>>>>>>
>>>>>> sub get_clear_range
>>>>>> {
>>>>>>   my $self = shift;
>>>>>>   my $qual = $self->qual;
>>>>>>   my $minQual = shift || 13;
>>>>>>
>>>>>>   my (@ranges, $rangeFlag);
>>>>>>
>>>>>>   for(my $i=0; $i<@$qual; $i++){
>>>>>>        ## Are we currently within a clear range or not?
>>>>>>        if(defined($rangeFlag)){
>>>>>>            ## Did we just leave the clear range?
>>>>>>            if($qual->[$i]<$minQual){
>>>>>>                ## Log the range
>>>>>>                push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>                ## and reset the range flag.
>>>>>>                $rangeFlag = undef;
>>>>>>            }
>>>>>>            ## else nothing changes
>>>>>>        }
>>>>>>        else{
>>>>>>            ## Did we just enter a clear range?
>>>>>>            if($qual->[$i]>=$minQual){
>>>>>>                ## Better set the range flag!
>>>>>>                $rangeFlag = $i;
>>>>>>            }
>>>>>>            ## else nothing changes
>>>>>>        }
>>>>>>   }
>>>>>>   ## Did we exit the last clear range?
>>>>>>   if(defined($rangeFlag)){
>>>>>>        my $i = scalar(@$qual);
>>>>>>        ## Log the range
>>>>>>        push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>   }
>>>>>>
>>>>>>   unless(@ranges){
>>>>>>        die "There is no clear range... I don't know what to do here!\n";
>>>>>>   }
>>>>>>
>>>>>>   print "there are ", scalar(@ranges), " clear ranges\n";
>>>>>>
>>>>>>   my $sum; map {$sum += $_->[2]} @ranges;
>>>>>>
>>>>>>   print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>>>>        "quality scores above the given threshold\n";
>>>>>>
>>>>>>   for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>>>>        if($_->[2]/$sum < 0.5){
>>>>>>            warn "not so much a clear range as a clear chunk...\n";
>>>>>>        }
>>>>>>        print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>>>>
>>>>>>        return Bio::Seq::QualityDB->new( -seq => $self->subseq(  $_->[0]+1, $_->[1]+1),
>>>>>>                                         -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>>>>>>                                         );
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>>>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>>>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>>>>> in (apart from all the debugging output that I spit out).
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Dan.
>>>>>>
>>>>>>
>>>>>> 2009/4/24 Dan Bolser :
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I couldn't find out how to get the 'clear range' from a
>>>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>>>>> this method be a part of the Bio::Seq::Quality class?
>>>>>>>
>>>>>>> In the latter case I'm on my way to an implementation, but I am not
>>>>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>>>>> I take the time to finish that off.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dan.
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>    -Heikki
>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>> cell: +27 (0)714328090
>>>> Sent from Claremont, WC, South Africa
>>>>
>>>
>>
>>
>>
>> --
?-Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> > > > > -- > ? ?-Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > From dan.bolser at gmail.com Tue Aug 4 12:32:31 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 4 Aug 2009 17:32:31 +0100 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> Message-ID: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> 2009/7/28 shalabh sharma : > Hi All, ? ? ? ? ?I have some protein sequences (around 100) i need to find > overall percentage similarity between them. > How i can do that? Tried using blast? You can download that. Try asking in irc://irc.freenode.net/#bioinformatics Dan. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shameer at ncbs.res.in Tue Aug 4 12:43:40 2009 From: shameer at ncbs.res.in (K. Shameer) Date: Tue, 4 Aug 2009 22:13:40 +0530 (IST) Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> Message-ID: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> Hello Shalabh, You may try ALISTAT. Available as a part of SQUID library from Prof. Sean Eddy. Make an alignment of your 100 sequences and use alignment as input of ALISTAT. ftp://selab.janelia.org/pub/software/squid/ Best, Khader Shameer > 2009/7/28 shalabh sharma : >> Hi All, ? ? ? ? ?I have some protein sequences (around 100) i need to >> find >> overall percentage similarity between them. 
>> How i can do that? > > Tried using blast? > > You can download that. > > > Try asking in irc://irc.freenode.net/#bioinformatics > > Dan. > > >> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shalabh.sharma7 at gmail.com Tue Aug 4 13:36:34 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 4 Aug 2009 13:36:34 -0400 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> Message-ID: <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com> Hi All, thanks a lot. @Khader Shameer, ALISTAT is what i was looking for. But still it gives you the average identity, what i need exactly is the average similarity. Thanks Shalabh Sharma On Tue, Aug 4, 2009 at 12:43 PM, K. Shameer wrote: > Hello Shalabh, > > You may try ALISTAT. Available as a part of SQUID library from Prof. Sean > Eddy. Make an alignment of your 100 sequences and use alignment as input > of ALISTAT. ftp://selab.janelia.org/pub/software/squid/ > > Best, > Khader Shameer > > > 2009/7/28 shalabh sharma : > >> Hi All, I have some protein sequences (around 100) i need to > >> find > >> overall percentage similarity between them. > >> How i can do that? > > > > Tried using blast? > > > > You can download that. > > > > > > Try asking in irc://irc.freenode.net/#bioinformatics > > > > Dan. 
> > > > > >> > >> Thanks > >> Shalabh > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > From shalabh.sharma7 at gmail.com Wed Aug 5 09:31:21 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 5 Aug 2009 09:31:21 -0400 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com> <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com> Message-ID: <9fcc48c70908050631q1a080b74x12e81985b455332e@mail.gmail.com> Hi, Thanks for the reply. I used clustalW for the MSA. Also i was just wondering that what if i use smith Waterman (EMBOSS' water) and pass the same library as query sequences and reference library, then just parse it and calculate average similarity.Is this right approach? Thanks Shalabh On Wed, Aug 5, 2009 at 3:10 AM, Dan Bolser wrote: > 2009/8/4 shalabh sharma : > > Hi All, thanks a lot. > > @Khader Shameer, ALISTAT is what i was looking for. But still it gives > you > > the average identity, what i need exactly is the average similarity. > > The problem is that identity is well defined. Similarity is more > vague, and at least depends on a particular alignment scoring matrix. > How did you align your sequences? > > Dan. 
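
To make the identity/similarity distinction above concrete, here is a minimal pure-Perl sketch for one pair of already-aligned protein sequences. It is only an illustration: pair_stats() is a hypothetical helper (not a BioPerl or SQUID function), and the residue groups in @GROUPS are an assumed, hand-picked stand-in for "similar" pairs. A real analysis would take those from whichever scoring matrix was used for the alignment, so the similarity figure changes with that choice while the identity figure does not.

```perl
use strict;
use warnings;

# ASSUMPTION: illustrative groups of "similar" residues only. In practice
# these would be derived from a scoring matrix (e.g. pairs scoring > 0).
my @GROUPS = qw(ILVM FYW KRH DE NQ ST AG);
my %group_of;
for my $i (0 .. $#GROUPS) {
    $group_of{$_} = $i for split //, $GROUPS[$i];
}

# Hypothetical helper: takes two aligned sequences of equal length and
# returns (percent identity, percent similarity), counted over the
# columns in which neither sequence has a gap ('-').
sub pair_stats {
    my ($s1, $s2) = map { uc($_) } @_;
    die "aligned sequences must have equal length\n"
        unless length($s1) == length($s2);

    my ($cols, $ident, $similar) = (0, 0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $x = substr($s1, $i, 1);
        my $y = substr($s2, $i, 1);
        next if $x eq '-' || $y eq '-';    # skip gapped columns
        $cols++;
        if ($x eq $y) {
            $ident++;
            $similar++;                    # identical residues are also similar
        }
        elsif (defined $group_of{$x} && defined $group_of{$y}
               && $group_of{$x} == $group_of{$y}) {
            $similar++;
        }
    }
    return (0, 0) unless $cols;            # no comparable columns
    return (100 * $ident / $cols, 100 * $similar / $cols);
}

my ($id, $sim) = pair_stats('ILKD', 'IVR-');
printf "identity %.1f%%, similarity %.1f%%\n", $id, $sim;
# prints "identity 33.3%, similarity 100.0%"
```

For around 100 sequences one would loop over all pairs of rows in the multiple alignment and average the second value; which columns to count in the denominator (for example, whether to include gapped ones) is another choice that should be stated alongside the result.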
> > >> > Try asking in irc://irc.freenode.net/#bioinformatics > >> > > > ;-) > From michael.watson at bbsrc.ac.uk Wed Aug 5 09:50:35 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 5 Aug 2009 14:50:35 +0100 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank Message-ID: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> Hi I want to download GSS sequences using Bio::DB::GenBank. When I specify db => 'nucleotide', it gets the 3000 or so that Entrez reports are in nucleotide, but there are another ~30000 in GSS that I want, but when I try db => 'GSS' or db => 'gss' nothing comes down. I'm using bioperl 1.5.1. Any clues? Mick From rmb32 at cornell.edu Wed Aug 5 11:28:46 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 05 Aug 2009 08:28:46 -0700 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <4A79A52E.7000104@cornell.edu> I think you're looking for the -db => 'nucgss' option. I'll add a better listing of this (undocumented) options to the Bio::DB::Query::GenBank docs. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu michael watson (IAH-C) wrote: > Hi > > I want to download GSS sequences using Bio::DB::GenBank. > > When I specify db => 'nucleotide', it gets the 3000 or so that Entrez reports are in nucleotide, but there are another ~30000 in GSS that I want, but when I try db => 'GSS' or db => 'gss' nothing comes down. > > I'm using bioperl 1.5.1. > > Any clues? 
> > Mick > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hartzell at alerce.com Wed Aug 5 12:16:04 2009 From: hartzell at alerce.com (George Hartzell) Date: Wed, 5 Aug 2009 09:16:04 -0700 Subject: [Bioperl-l] Job opening at Genentech [SSF, CA]. Message-ID: <19065.45124.4999.922147@already.dhcp.gene.com> I have an opening in my group in the Bioinformatics department at Genentech [South San Francisco, CA]. At the moment (for the next year or so) our main focus is rebuilding and extending a system for collecting, processing, and disseminating information about mutations and variations (think web interfaces, relational databases, alignments, workflows/pipelines). In the future we'll pick up projects related to next-gen sequencing (Me too!!! In the future, what isn't related to next-gen?), data integration, and/or lab-specific projects. First and foremost I'm looking for someone who's sharp and who enjoys computers, biology, and technology; someone who gets excited about picking up new tools but who also has a sense of responsibility and restraint. I'm looking for someone who's familiar with several languages and tools; modern Perl complemented with C is my first choice these days, supplemented with R and (when necessary) anything from the rest of the programming language bestiary. There's a fair amount of Java flying around here too so familiarity with it and the JVM world will help. Relational databases are part of the picture: Oracle for the big stuff; SQLite, Postgresql, and MySQL play niche roles. I generally interact with them via ORM's, lately it's been Rose::DB::Object on the Perl side though I've been convinced to take another look at DBIx::Class. Most of my web apps use CGI::Application, as fastcgi's, mod_perl, or simple CGI scripts, but (as with ORM's) I may take another look at Catalyst. 
I'm looking for someone who's interested in building real software. We'll be putting together a set of tools and data that need to hang together and evolve for at least 4-5 years. Deploy and run won't cut it. Requirements will change, so it's important to me that we build things so they're as modular and flexible as possible. Testing, source control, and documentation matter. A strong candidate will have an understanding of basic bioinformatics concepts and the ability to pick up new biology and computer science concepts as necessary. At the junior end of the spectrum I'd expect a bachelor's degree + 3 years of experience, at the upper end would a masters + 5 years (or a PhD interested in moving towards the production side of the house). I can imagine running through one or more detail oriented interview questions that drilled down (or took of on a tangent) from the following: - What's the difference between Smith-Waterman, blast, sim4, gmap, and/or bowtie alignment algorithms or tools? Which would you use when, and why? - Why is Moose better than Class::Accessor? (yes, it's Perl centered, but it could spin out into any language [e.g. why is Java better than Perl?]). What's a MOP? Who cares? - CVS, subversion, git, mercurial. You've already picked one? Which one? Why? Why not? - XML or JSON or YAML. Pick one for moving data back and forth in an Ajax based interface. Why? Would it also work well in other contexts? - How would you store information about positional features on a genome so that you could get fast random access? How would your solution tie into a larger data context? Genentech's a great place to work: solid salaries, great benefits, Bay Area location (who could ask for more?). We're open source friendly and with the arrival Robert Gentleman (our new Director, of Bioconductor/R fame) likely to become more so. The recent Roche acquisition hasn't changed life much, it seems to mostly be a source of opportunities for those of us in Research. 
If you know anyone who fits the bill, have them drop me a note. Thanks! g. From hilgert at cshl.edu Wed Aug 5 16:27:28 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Wed, 5 Aug 2009 16:27:28 -0400 Subject: [Bioperl-l] Bio::SeqIO issue Message-ID: Is my impression correct that Bio::SeqIO just assumes that sequences are being submitted in FASTA format? In our experience, implementing Bio::SeqIO led to the first line of files being cut off, regardless of whether the files were indeed fasta files or files that only contained sequence. Which, in the latter, led to sequence submissions that had the first line of nucleotides removed. Has anyone tried to write a fix for this? Thanks, Uwe - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe Hilgert, Ph.D. Dolan DNA Learning Center Cold Spring Harbor Laboratory V: (516) 367-5185 E: hilgert at cshl.edu F: (516) 367-5182 W: http://www.dnalc.org From cjfields at illinois.edu Wed Aug 5 17:04:14 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 Aug 2009 16:04:14 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: Message-ID: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > Is my impression correct that Bio::SeqIO just assumes that sequences > are > being submitted in FASTA format? No. See: http://www.bioperl.org/wiki/HOWTO:SeqIO SeqIO tries to guess at the format using the file extension, and if one isn't present makes use of Bio::Tools::GuessSeqFormat. It's possible that the extension is causing the problem, or that GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to guessing). In any case, it's always advisable to explicitly indicate the format when possible. Relevant lines: return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i; ... 
     return 'raw'        if /\.(txt)$/i;

> In our experience, implementing
> Bio::SeqIO led to the first line of files being cut off, regardless of
> whether the files were indeed fasta files or files that only contained
> sequence.

Files that only contain sequence are 'raw'.  Ones in FASTA are 'fasta'.

> Which, in the latter, led to sequence submissions that had the
> first line of nucleotides removed. Has anyone tried to write a fix for
> this?

This sounds like a bug, but we have very little to go on beyond your
description.  What version of bioperl are you using, OS, etc?  What
does your data look like?  File extension?

chris

> Thanks,
>
> Uwe
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> Uwe Hilgert, Ph.D.
> Dolan DNA Learning Center
> Cold Spring Harbor Laboratory
>
> V: (516) 367-5185
> E: hilgert at cshl.edu
> F: (516) 367-5182
> W: http://www.dnalc.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Kevin.M.Brown at asu.edu Wed Aug 5 17:03:04 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 5 Aug 2009 14:03:04 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B40624DA61@EX02.asurite.ad.asu.edu>

SeqIO is just a base framework for reading and writing files. If you want
it to read FASTA format, tell it so when you create the object:

$seqio = Bio::SeqIO->new(-format=>'fasta');

This tells the program to use Bio::SeqIO::fasta for the object. Look at
the docs for the various formats that Bio::SeqIO supports.
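
Since the files in question have no extension and may be either FASTA or bare sequence, one cautious option is to look at the file before choosing the -format value. The following is a minimal sketch under that assumption; sniff_format() is a hypothetical helper written for this example, not part of BioPerl, and it only distinguishes the two cases discussed in this thread ('fasta' versus 'raw').

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): inspect the first
# non-blank line of a file and return the Bio::SeqIO format name that
# matches it. A leading '>' is taken to mean FASTA; anything else is
# treated as bare sequence ('raw'). Returns undef for an empty file.
sub sniff_format {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $format;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;                 # skip blank lines
        $format = $line =~ /^>/ ? 'fasta' : 'raw';
        last;
    }
    close $fh;
    return $format;
}

# Intended use, assuming BioPerl is installed and $path is your file:
#
#   use Bio::SeqIO;
#   my $format = sniff_format($path) or die "$path is empty\n";
#   my $in = Bio::SeqIO->new(-file => $path, -format => $format);
#   while (my $seq = $in->next_seq) {
#       print $seq->length, "\n";
#   }
```

The point of the check is simply that a header-less file is never handed to the FASTA parser, which expects the first line of each record to be a description line. Note that, as far as I recall, the 'raw' format treats every line as a separate sequence, so multi-line bare sequence may need to be joined first.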
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hilgert, Uwe Sent: Wednesday, August 05, 2009 1:27 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::SeqIO issue Is my impression correct that Bio::SeqIO just assumes that sequences are being submitted in FASTA format? In our experience, implementing Bio::SeqIO led to the first line of files being cut off, regardless of whether the files were indeed fasta files or files that only contained sequence. Which, in the latter, led to sequence submissions that had the first line of nucleotides removed. Has anyone tried to write a fix for this? Thanks, Uwe - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe Hilgert, Ph.D. Dolan DNA Learning Center Cold Spring Harbor Laboratory V: (516) 367-5185 E: hilgert at cshl.edu F: (516) 367-5182 W: http://www.dnalc.org _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Aug 5 17:37:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 Aug 2009 16:37:52 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> Message-ID: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Uwe, Please keep replies on the list. It's very possible that's the issue; IIRC the fasta parser pulls out the full sequence in chunks (based on local $/ = "\n>") and splits the header off as the first line in that chunk. You could probably try leaving the format out and letting SeqIO guess it, or passing the file into Bio::Tools::GuessSeqFormat directly, but it's probably better to go through the files and add a file extension that corresponds to the format. chris On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > Thanks, Chris. 
The files have no extension, but we indicate what > format > to use, like in the manual: > > $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > > I wonder now whether this could exactly cause the problem: as we are > telling that input files are in fasta format they are being treated as > such (=remove first line) - regardless of whether they really are > fasta? > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > Uwe Hilgert, Ph.D. > Dolan DNA Learning Center > Cold Spring Harbor Laboratory > > C: (516) 857-1693 > V: (516) 367-5185 > E: hilgert at cshl.edu > F: (516) 367-5182 > W: http://www.dnalc.org > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Wednesday, August 05, 2009 5:04 PM > To: Hilgert, Uwe > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > >> Is my impression correct that Bio::SeqIO just assumes that sequences >> are >> being submitted in FASTA format? > > No. See: > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > SeqIO tries to guess at the format using the file extension, and if > one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > possible that the extension is causing the problem, or that > GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to > guessing). In any case, it's always advisable to explicitly indicate > the format when possible. > > Relevant lines: > > return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > i; > ... > return 'raw' if /\.(txt)$/i; > >> In our experience, implementing >> Bio::SeqIO led to the first line of files being cut off, regardless >> of >> whether the files were indeed fasta files or files that only >> contained >> sequence. > > Files that only contain sequence are 'raw'. Ones in FASTA are > 'fasta'. > >> Which, in the latter, led to sequence submissions that had the >> first line of nucleotides removed. 
Has anyone tried to write a fix >> for >> this? > > This sounds like a bug, but we have very little to go on beyond your > description. What version of bioperl are you using, OS, etc? What > does your data look like? File extension? > > chris > >> Thanks, >> >> Uwe >> >> >> >> >> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> >> Uwe Hilgert, Ph.D. >> >> Dolan DNA Learning Center >> >> Cold Spring Harbor Laboratory >> >> >> >> V: (516) 367-5185 >> >> E: hilgert at cshl.edu >> >> F: (516) 367-5182 >> >> W: http://www.dnalc.org >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Kevin.M.Brown at asu.edu Wed Aug 5 17:45:03 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 5 Aug 2009 14:45:03 -0700 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Message-ID: <1A4207F8295607498283FE9E93B775B40624DA9B@EX02.asurite.ad.asu.edu> I'm not sure, but I think the module is fasta, not Fasta. So it should be -format=>'fasta', unless you're on a case-insensitive system that is forgiving the capital... Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Chris Fields > Sent: Wednesday, August 05, 2009 2:38 PM > To: Hilgert, Uwe > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and > splits the > header off as the first line in that chunk. 
You could probably try > leaving the format out and letting SeqIO guess it, or passing > the file > into Bio::Tools::GuessSeqFormat directly, but it's probably > better to > go through the files and add a file extension that > corresponds to the > format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > > > Thanks, Chris. The files have no extension, but we indicate what > > format > > to use, like in the manual: > > > > $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > > > > I wonder now whether this could exactly cause the problem: as we are > > telling that input files are in fasta format they are being > treated as > > such (=remove first line) - regardless of whether they really are > > fasta? > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Uwe Hilgert, Ph.D. > > Dolan DNA Learning Center > > Cold Spring Harbor Laboratory > > > > C: (516) 857-1693 > > V: (516) 367-5185 > > E: hilgert at cshl.edu > > F: (516) 367-5182 > > W: http://www.dnalc.org > > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Wednesday, August 05, 2009 5:04 PM > > To: Hilgert, Uwe > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > > > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > > > >> Is my impression correct that Bio::SeqIO just assumes that > sequences > >> are > >> being submitted in FASTA format? > > > > No. See: > > > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > > > SeqIO tries to guess at the format using the file extension, and if > > one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > > possible that the extension is causing the problem, or that > > GuessSeqFormat guessing wrong (it's apt to do that, as it's > forced to > > guessing). In any case, it's always advisable to > explicitly indicate > > the format when possible. 
> > > > Relevant lines: > > > > return 'fasta' if > /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > > i; > > ... > > return 'raw' if /\.(txt)$/i; > > > >> In our experience, implementing > >> Bio::SeqIO led to the first line of files being cut off, > regardless > >> of > >> whether the files were indeed fasta files or files that only > >> contained > >> sequence. > > > > Files that only contain sequence are 'raw'. Ones in FASTA are > > 'fasta'. > > > >> Which, in the latter, led to sequence submissions that had the > >> first line of nucleotides removed. Has anyone tried to > write a fix > >> for > >> this? > > > > This sounds like a bug, but we have very little to go on beyond your > > description. What version of bioperl are you using, OS, etc? What > > does your data look like? File extension? > > > > chris > > > >> Thanks, > >> > >> Uwe > >> > >> > >> > >> > >> > >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > >> > >> Uwe Hilgert, Ph.D. > >> > >> Dolan DNA Learning Center > >> > >> Cold Spring Harbor Laboratory > >> > >> > >> > >> V: (516) 367-5185 > >> > >> E: hilgert at cshl.edu > >> > >> F: (516) 367-5182 > >> > >> W: http://www.dnalc.org > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Wed Aug 5 18:53:56 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 5 Aug 2009 18:53:56 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Message-ID: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> I don't think that can be the problem. 
If anything, providing the format ought to be better in terms of result than not providing it? Uwe - I'd like you to go back to Chris' initial questions that you haven't answered yet: "What version of bioperl are you using, OS, etc? What does your data look like?" I'd add to that, can you show us your full script, or a smaller code snippet that reproduces the problem. I suspect that either something in your script is swallowing the line, or that the line endings in your data file are from a different OS than the one you're running the script on. (Or that you are running a very old version of BioPerl, which is entirely possible if you installed through CPAN.) -hilmar On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and splits > the header off as the first line in that chunk. You could probably > try leaving the format out and letting SeqIO guess it, or passing > the file into Bio::Tools::GuessSeqFormat directly, but it's probably > better to go through the files and add a file extension that > corresponds to the format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >> Thanks, Chris. The files have no extension, but we indicate what >> format >> to use, like in the manual: >> >> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >> >> I wonder now whether this could exactly cause the problem: as we are >> telling that input files are in fasta format they are being treated >> as >> such (=remove first line) - regardless of whether they really are >> fasta? >> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> Uwe Hilgert, Ph.D. 
>> Dolan DNA Learning Center >> Cold Spring Harbor Laboratory >> >> C: (516) 857-1693 >> V: (516) 367-5185 >> E: hilgert at cshl.edu >> F: (516) 367-5182 >> W: http://www.dnalc.org >> >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Wednesday, August 05, 2009 5:04 PM >> To: Hilgert, Uwe >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >> >>> Is my impression correct that Bio::SeqIO just assumes that sequences >>> are >>> being submitted in FASTA format? >> >> No. See: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> SeqIO tries to guess at the format using the file extension, and if >> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >> possible that the extension is causing the problem, or that >> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >> guessing). In any case, it's always advisable to explicitly indicate >> the format when possible. >> >> Relevant lines: >> >> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >> i; >> ... >> return 'raw' if /\.(txt)$/i; >> >>> In our experience, implementing >>> Bio::SeqIO led to the first line of files being cut off, >>> regardless of >>> whether the files were indeed fasta files or files that only >>> contained >>> sequence. >> >> Files that only contain sequence are 'raw'. Ones in FASTA are >> 'fasta'. >> >>> Which, in the latter, led to sequence submissions that had the >>> first line of nucleotides removed. Has anyone tried to write a fix >>> for >>> this? >> >> This sounds like a bug, but we have very little to go on beyond your >> description. What version of bioperl are you using, OS, etc? What >> does your data look like? File extension? >> >> chris >> >>> Thanks, >>> >>> Uwe >>> >>> >>> >>> >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> >>> Uwe Hilgert, Ph.D. 
>>> >>> Dolan DNA Learning Center >>> >>> Cold Spring Harbor Laboratory >>> >>> >>> >>> V: (516) 367-5185 >>> >>> E: hilgert at cshl.edu >>> >>> F: (516) 367-5182 >>> >>> W: http://www.dnalc.org >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Wed Aug 5 19:12:52 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 5 Aug 2009 19:12:52 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <8FAB8756AD944534B49F2C4356CB6D92@NewLife> If these items were included in a Bugzilla report, that would be most convenient (= most likely to get looked carefully) and is the best place for us to keep track of these kinds of issues-- http://bugzilla.bioperl.org/ cheers MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Chris Fields" Cc: "BioPerl List" Sent: Wednesday, August 05, 2009 6:53 PM Subject: Re: [Bioperl-l] Bio::SeqIO issue >I don't think that can be the problem. If anything, providing the > format ought to be better in terms of result than not providing it? > > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, > etc? What does your data look like?" 
I'd add to that, can you show us > your full script, or a smaller code snippet that reproduces the problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing >> the file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >>> guessing). In any case, it's always advisable to explicitly indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>>
>>> chris
>>>
>>>> Thanks,
>>>>
>>>> Uwe
>>>>
>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>>
>>>> Uwe Hilgert, Ph.D.
>>>> Dolan DNA Learning Center
>>>> Cold Spring Harbor Laboratory
>>>>
>>>> V: (516) 367-5185
>>>> E: hilgert at cshl.edu
>>>> F: (516) 367-5182
>>>> W: http://www.dnalc.org
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Thu Aug 6 00:43:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 23:43:45 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID:

The SeqIO::fasta parser sets:

    local $/ = "\n>";

then splits the resulting chunks of data (each corresponding to a full
FASTA-formatted sequence) into two pieces:

    my ($top,$sequence) = split(/\n/,$entry,2);

If there is no description line (e.g. the file is all raw sequence
data), these lines would read in the whole file as a single chunk and
then split off the first line as if it were the header.

chris

On Aug 5, 2009, at 5:53 PM, Hilmar Lapp wrote:

> I don't think that can be the problem.
If anything, providing the > format ought to be better in terms of result than not providing it? > > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, > etc? What does your data look like?" I'd add to that, can you show > us your full script, or a smaller code snippet that reproduces the > problem. > > I suspect that either something in your script is swallowing the > line, or that the line endings in your data file are from a > different OS than the one you're running the script on. (Or that you > are running a very old version of BioPerl, which is entirely > possible if you installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls >> out the full sequence in chunks (based on local $/ = "\n>") and >> splits the header off as the first line in that chunk. You could >> probably try leaving the format out and letting SeqIO guess it, or >> passing the file into Bio::Tools::GuessSeqFormat directly, but it's >> probably better to go through the files and add a file extension >> that corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being >>> treated as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>> to >>> guessing). In any case, it's always advisable to explicitly >>> indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a >>>> fix for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>>
>>> chris
>>>
>>>> Thanks,
>>>>
>>>> Uwe
>>>>
>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>>
>>>> Uwe Hilgert, Ph.D.
>>>> Dolan DNA Learning Center
>>>> Cold Spring Harbor Laboratory
>>>>
>>>> V: (516) 367-5185
>>>> E: hilgert at cshl.edu
>>>> F: (516) 367-5182
>>>> W: http://www.dnalc.org
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>

From cjfields at illinois.edu Thu Aug 6 01:12:13 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Aug 2009 00:12:13 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <8FAB8756AD944534B49F2C4356CB6D92@NewLife>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife>
Message-ID: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>

Just to confirm: the following is using bioperl-live on my macbook pro
(perl 5.10.0, 64bit). We need to decide if this is a legit bug or a user
issue (if it's the former, we can easily add an exception indicating
lack of a header). Note that 'raw' also fails for the raw example below
(doesn't appear to remove newlines).

-c

cjfields4:fasta cjfields$ cat raw_v_fasta.pl
#!/usr/bin/perl -w

use strict;
use warnings;
use IO::String;
use Bio::SeqIO;
use Test::More qw(no_plan);

my %seq;

$seq{raw} = <<RAW;
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
RAW

$seq{fasta} = <<FASTA;
>CATH_RAT
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
FASTA

my %newdata;
for my $input (sort keys %seq) {
    my $fh = IO::String->new($seq{$input});
    my $seq = Bio::SeqIO->new(-format => 'fasta',
                              -fh => $fh)->next_seq;
    $newdata{$input} = $seq->seq;
}
is($newdata{raw}, $newdata{fasta}, 'format');

cjfields4:fasta cjfields$ perl raw_v_fasta.pl
not ok 1 - format
# Failed test 'format'
# at raw_v_fasta.pl line 36.
# got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
# expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
1..1
# Looks like you failed 1 test of 1.

On Aug 5, 2009, at 6:12 PM, Mark A.
Jensen wrote: > If these items were included in a Bugzilla report, that would be > most convenient (= most likely to get looked carefully) > and is the best place for us to keep track of these kinds of > issues-- http://bugzilla.bioperl.org/ > cheers MAJ > ----- Original Message ----- From: "Hilmar Lapp" > To: "Chris Fields" > Cc: "BioPerl List" > Sent: Wednesday, August 05, 2009 6:53 PM > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? What does your data look like?" I'd add to that, can you show >> us your full script, or a smaller code snippet that reproduces the >> problem. >> I suspect that either something in your script is swallowing the >> line, or that the line endings in your data file are from a >> different OS than the one you're running the script on. (Or that >> you are running a very old version of BioPerl, which is entirely >> possible if you installed through CPAN.) >> -hilmar >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls >>> out the full sequence in chunks (based on local $/ = "\n>") and >>> splits the header off as the first line in that chunk. You could >>> probably try leaving the format out and letting SeqIO guess it, >>> or passing the file into Bio::Tools::GuessSeqFormat directly, but >>> it's probably better to go through the files and add a file >>> extension that corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format >>>> to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being >>>> treated as >>>> such (=remove first line) - regardless of whether they really >>>> are fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> Uwe Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences >>>>> are >>>>> being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's >>>> forced to >>>> guessing). In any case, it's always advisable to explicitly >>>> indicate >>>> the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa) >>>> $/ i; >>>> ... 
>>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless of >>>>> whether the files were indeed fasta files or files that only >>>>> contained >>>>> sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a >>>>> fix for >>>>> this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From eigenrosen at gmail.com Thu Aug 6 03:12:24 2009 From: eigenrosen at gmail.com (Michael Rosen) Date: Thu, 6 Aug 2009 00:12:24 
-0700
Subject: [Bioperl-l] Trouble with Clustalw
Message-ID: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com>

I'm a complete bioperl novice, trying to do Clustalw on some fasta
files, and am running into trouble:

~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
STACK: TestClust:22
-----------------------------------------------------------

Here's my code:

#!/usr/bin/perl -w
use Bio::Perl;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::SimpleAlign;
use Bio::Seq;
use strict;
use warnings;

my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
my @seq_array = read_all_sequences($ARGV[0],'fasta');
for (my $i = 0; $i < @seq_array; $i++){
    (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
    $seq_array[$i]->seq($seq);
}
write_sequence(">test",'fasta',@seq_array);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);
my @align_array = $aln->each_seq();
write_sequence(">testfile",'fasta',@align_array);

The loop is just there to take out some gaps that were placed
in a blast previous to this. The write_sequence call confirms that
@seq_array is a valid array of Bio::Seq objects at the time align calls
it. Here's some output in "test":

>A0220B0939one.1 FV584Q101DEWY9
TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC
CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT
TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT
TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG
CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG
CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA
CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA
CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT
AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG
>A0220B0939one.2 FV584Q101A4DG7
TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG
ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC
AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG
TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG
GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA
GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT
CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT
CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT
ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG
...

Thanks,
Mike

From florian.mittag at uni-tuebingen.de Thu Aug 6 05:38:38 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 11:38:38 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907151500.21947.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200907061808.18651.florian.mittag@uni-tuebingen.de> <200907151500.21947.florian.mittag@uni-tuebingen.de>
Message-ID: <200908061138.38809.florian.mittag@uni-tuebingen.de>

Hi!

I just noticed that we didn't solve this problem completely.

On Wednesday, 15.
July 2009 15:00, Florian Mittag wrote:
> > Well, it is like this with version 9.5 of DB2 Express-C:
> >
> > SELECT NULL FROM bioentry;
> >
> > yields:
> > SQL0206N "NULL" is not valid in the context where it is used.
> > SQLSTATE=42703 SQLCODE=-206
> >
> > But if I do:
> >
> > SELECT cast(NULL AS VARCHAR(255)) FROM bioentry;
> >
> > [...]
> >
> > It ran fine without the NULL column, but that isn't necessarily a sign of
> > correctness. My problem was that (as stated above) the old version of DB2
> > requires you to cast the NULL value to a data type, which I wasn't able
> > to determine from the code. With the new version, it should work, so I'll
> > have to rerun my tests again and see if the problem is still there.
>
> You convinced me that the NULL column is supposed to be there, so I found
> another workaround around line 1273 in BaseDriver.pm:
>
>     if((! $attr) || (! $entitymap->{$tbl}) ||
>        $dont_select_attrs->{$tbl .".". $attr}) {
>         #push(@attrs, "NULL");
>         push(@attrs, "cast(NULL as VARCHAR(255))");
>     } else {
>
> Since I don't know how to determine the datatype of the column that is set
> to NULL, I simply chose VARCHAR and tested it. And it worked! (BTW: The
> column set to NULL is named "rank" in the case below.)

Although this solution works, it is not the best, because it breaks
compatibility with all other database types, e.g., MySQL.

Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" only when the driver is DB2?
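One possible shape for such a switch, offered only as a minimal sketch and not as anything taken from the BioPerl-db code: keep a small lookup from driver name to the SQL text used for the placeholder column, and fall back to plain NULL for everything else. The names %NULL_EXPR_FOR and null_expr_for() are hypothetical, and the assumption that the driver reports itself as "DB2" (for example via DBI's $dbh->{Driver}{Name}) would need checking against the installed DBD::DB2.

```perl
use strict;
use warnings;

# Hypothetical lookup, not part of BioPerl-db: SQL text for a
# placeholder NULL column, keyed by upper-cased driver name.
my %NULL_EXPR_FOR = (
    DB2 => 'cast(NULL as VARCHAR(255))',
);

# Return the driver-specific NULL expression, or plain NULL when the
# driver needs no special treatment.
sub null_expr_for {
    my ($driver_name) = @_;
    my $key = uc(defined $driver_name ? $driver_name : '');
    return exists $NULL_EXPR_FOR{$key} ? $NULL_EXPR_FOR{$key} : 'NULL';
}

# With a live DBI handle the name would presumably come from
# $dbh->{Driver}{Name}; fixed strings are used here so the sketch
# runs without a database.
for my $driver (qw(DB2 mysql Pg)) {
    printf "%-5s => %s\n", $driver, null_expr_for($driver);
}
```

In the workaround quoted above, the push would then become something like push(@attrs, null_expr_for($driver_name)), with $driver_name obtained however the adaptor already knows its driver.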

- Florian

From hlapp at gmx.net Thu Aug 6 09:36:08 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Aug 2009 09:36:08 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>
Message-ID:

Why is specifying fasta format when your input is not in fasta format
not a user error?

I agree that not removing newlines in raw format is a bug.

-hilmar

On Aug 6, 2009, at 1:12 AM, Chris Fields wrote:

> Just to confirm: the following is using bioperl-live on my macbook
> pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug
> or a user issue (if it's the former, we can easily add an exception
> indicating lack of a header). Note that 'raw' also fails for the
> raw example below (doesn't appear to remove newlines).
>
> -c
>
> cjfields4:fasta cjfields$ cat raw_v_fasta.pl
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use IO::String;
> use Bio::SeqIO;
> use Test::More qw(no_plan);
>
> my %seq;
>
> $seq{raw} = <<RAW;
> MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
> HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
> TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
> QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
> VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
> RAW
>
> $seq{fasta} = <<FASTA;
> >CATH_RAT
> MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
> HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
> TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
> QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
> VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
> FASTA
>
> my %newdata;
> for my $input (sort keys %seq) {
>     my $fh = IO::String->new($seq{$input});
>     my $seq = Bio::SeqIO->new(-format => 'fasta',
>                               -fh     => $fh)->next_seq;
>     $newdata{$input} = $seq->seq;
> }
> is($newdata{raw}, $newdata{fasta}, 'format');
>
> cjfields4:fasta cjfields$ perl raw_v_fasta.pl
> not ok 1 - format
> #   Failed test 'format'
> #   at raw_v_fasta.pl line 36.
> #          got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
> #     expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
> 1..1
> # Looks like you failed 1 test of 1.
=========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Aug 6 09:42:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 09:42:06 -0400 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <200908061138.38809.florian.mittag@uni-tuebingen.de> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200907061808.18651.florian.mittag@uni-tuebingen.de> <200907151500.21947.florian.mittag@uni-tuebingen.de> <200908061138.38809.florian.mittag@uni-tuebingen.de> Message-ID: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote: > Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" > only when the driver is DB2? Not yet, but that's the solution I had in mind, i.e., introducing a method in the Bio::DB::DBI::* (driver-specific) classes that returns whatever NULL as a SELECT field should be represented as. What will be very hard or nearly impossible to do is to cast to the actual type of the column, so if simply using VARCHAR(255) does the trick for DB2 that'd be great. BTW you did check that simply aliasing the column does not fix the problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will throw an error, right? 
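A minimal sketch of the driver-specific hook described above, assuming a base driver that returns a bare NULL and a DB2 driver that overrides it. The package names (Sketch::Driver::*), the method name null_select_expr and the helper select_list are hypothetical and are not the existing Bio::DB::DBI::* or BaseDriver.pm API; only the VARCHAR(255) cast is taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the driver-specific classes; these are NOT
# the real Bio::DB::DBI::* modules.
package Sketch::Driver::Base;
sub new { my ($class) = @_; return bless {}, $class; }
# Default: most databases accept a bare NULL in the SELECT list.
sub null_select_expr { return "NULL"; }

package Sketch::Driver::DB2;
our @ISA = ('Sketch::Driver::Base');
# DB2 rejects an untyped NULL, so cast it to a concrete type.
sub null_select_expr { return "cast(NULL as VARCHAR(255))"; }

package main;

# Build the SELECT list. Attributes that are not mapped (undef here)
# become the driver's NULL expression instead of a hard-coded "NULL".
sub select_list {
    my ($driver, @attrs) = @_;
    return join(", ",
        map { defined $_ ? $_ : $driver->null_select_expr } @attrs);
}

my @attrs = ("term.term_id", undef, "term.ontology_id");
print select_list(Sketch::Driver::Base->new, @attrs), "\n";
print select_list(Sketch::Driver::DB2->new,  @attrs), "\n";
```

With this shape the code that assembles the query never needs to know which database it is talking to, so MySQL and the other drivers would keep emitting plain NULL.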
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From florian.mittag at uni-tuebingen.de Thu Aug 6 10:12:21 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 6 Aug 2009 16:12:21 +0200 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200908061138.38809.florian.mittag@uni-tuebingen.de> <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> Message-ID: <200908061612.21852.florian.mittag@uni-tuebingen.de> On Thursday, 6. August 2009 15:42, Hilmar Lapp wrote: > On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote: > > Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" > > only when the driver is DB2? > > Not yet, but that's the solution I had in mind, i.e., introducing a > method in the Bio::DB::DBI::* (driver-specific) classes that returns > whatever NULL as a SELECT field should be represented as. Sounds like a good idea! > What will be > very hard or nearly impossible to do is to cast to the actual type of > the column, so if simply using VARCHAR(255) does the trick for DB2 > that'd be great. Surprisingly, it does. At least, I haven't noticed any problems if the target data type is for example an integer. With all the trouble I have with DB2, I didn't expect this. > BTW you did check that simply aliasing the column does not fix the > problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will > throw an error, right? Yepp: SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, NULL AS col1, term.ontology_id FROM term WHERE identifier = ? [IBM][CLI Driver][DB2/LINUX] SQL0418N A statement contains a use of an untyped parameter marker or a null value that is not valid. 
- Florian From hilgert at cshl.edu Thu Aug 6 11:01:05 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 11:01:05 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID:

I'm not sure what version we have. Cornel may have installed it a while ago from CVS:

Module id = Bio::Root::Build
    CPAN_USERID  CJFIELDS (Christopher Fields)
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm
    INST_VERSION 1.006900
cpan> m Bio::Root::Version
Module id = Bio::Root::Version
    CPAN_USERID  CJFIELDS (Christopher Fields)
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm
    INST_VERSION 1.006900
cpan> m Bio::SeqIO
Module id = Bio::SeqIO
    CPAN_USERID  CJFIELDS (Christopher Fields)
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm
    INST_VERSION undef

Cornel still has the checked-out "bioperl-live" directory, and the last changes are from March this year.

As for why he used 'Fasta' instead of 'fasta' as the format parameter in Bio::SeqIO: that is what the module's manual says. He has now tried 'fasta' instead and sees no change in behavior. When the format parameter is omitted altogether, fasta-formatted sequence continues to be treated correctly, with the first line removed. Raw sequence, however, is treated differently: the first line is no longer removed. Instead, the program returns only the first line. In the example I am going to forward in my next message, that returns 60 amino acids out of a raw sequence of 300 aa. Can't win with raw sequence...

The files may have been created on different platforms; we didn't notice any difference between files created on Windows and files created on Linux.
Thanks Uwe From hilgert at cshl.edu Thu Aug 6 11:03:53 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 11:03:53 -0400 Subject: [Bioperl-l] FW: Bio::SeqIO issue Message-ID: If you don't specify any format only the first line gets returned: not ok 1 - format # Failed test 'format' # at test/test_fasta.pl line 35. # got: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN' # expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNH TFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFS TTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFN PEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYG EQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV' 1..1 # Looks like you failed 1 test of 1. From hlapp at gmx.net Thu Aug 6 11:18:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 11:18:06 -0400 Subject:
[Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible? -hilmar On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > I'm not sure what version we have. Cornel may have installed it a > while > ago from CVS: > > Module id = Bio::Root::Build > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > INST_VERSION 1.006900 > cpan> m Bio::Root::Version > Module id = Bio::Root::Version > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > INST_VERSION 1.006900 > cpan> m Bio::SeqIO > Module id = Bio::SeqIO > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > INST_VERSION undef > > Cornel still has the checked-out "bioperl-live" directory and the last > changes are from March this year. > > As per why he used "Fasta" instead of 'fasta" as the format > parameter in > Bio::SeqIO, it's because that what it says in the modules manual. He > now > tried 'fasta' instead and see no changes in behavior. Omitting the > format parameter altogether, fasta-formatted sequence continues to be > treated correctly, the first line being removed. However, raw sequence > is being treated differently in that the first line is not being > removed > any more. Instead, the program returns the first line only. Which, in > the example I am going to forward in my next message, will return 60 > amino acids out of raw sequence of 300 aa. Can't win with raw > sequence... 
> > > The files may be created on different platforms, we didn't notice any > difference between using files created on Windows or Linux. > > Thanks > Uwe -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bosborne11 at verizon.net Thu Aug 6 11:20:49 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 11:20:49 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <2F73C3DC-D943-4EC3-834A-EA2984FDDB5D@verizon.net> Uwe et al, Yes, this argument works irrespective of case: The format name
is case-insensitive: 'FASTA', 'Fasta' and 'fasta' are all valid.

From Bio::SeqIO.

Brian O.

On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote:

> As per why he used "Fasta" instead of 'fasta' as the format
> parameter in Bio::SeqIO, it's because that's what it says in the
> module's manual. He now tried 'fasta' instead and sees no changes
> in behavior. Omitting the

From cjfields at illinois.edu Thu Aug 6 12:30:01 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Aug 2009 11:30:01 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
 <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
 <8FAB8756AD944534B49F2C4356CB6D92@NewLife>
 <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>
Message-ID: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu>

On Aug 6, 2009, at 8:36 AM, Hilmar Lapp wrote:

> Why is specifying fasta format when your input is not in fasta format
> not a user error?

Agreed. My point is: should we worry about adding an exception (which may be a little more user-friendly)? Right now the bad stuff happens silently.

> I agree with the not removing newlines in raw format being a bug.
>
> -hilmar

Acc. to the SeqIO::raw docs, this is a little trickier. The documented behavior explicitly indicates that each line (sans non-whitespace) is assumed to be a separate sequence, so changing that behavior breaks API.
I suppose we can have $/ set locally to a cached $/ default value or undef:

    # assumes entire file is read in
    my $io = Bio::SeqIO->new(-format => 'raw', -gulp => 1);

chris

From hlapp at gmx.net Thu Aug 6 12:42:00 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Aug 2009 12:42:00 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
 <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
 <8FAB8756AD944534B49F2C4356CB6D92@NewLife>
 <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>
 <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu>
Message-ID: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net>

On Aug 6, 2009, at 12:30 PM, Chris Fields wrote:

> Agreed. My point is should we worry about adding an exception
> (which may be a little more user-friendly). Right now the bad stuff
> happens silently.

Great point. We don't want silent failures, do we.

>> I agree with the not removing newlines in raw format being a bug.
>>
>> -hilmar
>
> Acc. to the SeqIO::raw docs, this is a little trickier. The
> documented behavior explicitly indicates that each line (sans
> non-whitespace) is assumed to be a separate sequence, so changing
> that behavior breaks API.

Ah - true indeed. I like the optional argument feature - that way it's easy for the user to choose.
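
For illustration only: the two readings of 'raw' under discussion (one sequence per line, as documented, versus the whole input as a single sequence) could be sketched in plain Perl roughly as below. This is not BioPerl code. The `read_raw` helper and its `$gulp` flag are made up for the sketch, and `-gulp` itself is only a proposed option in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: split raw text into sequences.
#   $gulp false -> one sequence per non-empty line (documented 'raw' behavior)
#   $gulp true  -> the whole input is one sequence, line breaks removed
sub read_raw {
    my ($text, $gulp) = @_;

    if ($gulp) {
        (my $seq = $text) =~ s/\s+//g;    # drop all whitespace, incl. newlines
        return length($seq) ? ($seq) : ();
    }

    my @seqs;
    for my $line (split /\n/, $text) {
        $line =~ s/\s+//g;                # also strips a trailing \r
        push @seqs, $line if length($line);
    }
    return @seqs;
}

my $raw = "ACGTACGT\nTTGGCCAA\nGATTACA\n";

my @per_line = read_raw($raw, 0);
my @whole    = read_raw($raw, 1);

printf "per-line: %d sequences\n", scalar @per_line;
printf "gulp:     %d sequence of length %d\n", scalar @whole, length $whole[0];
# prints:
#   per-line: 3 sequences
#   gulp:     1 sequence of length 23
```

Keeping the per-line mode as the default and making the single-sequence mode opt-in mirrors the concern above about not breaking the documented API.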
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at illinois.edu Thu Aug 6 12:49:53 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Aug 2009 11:49:53 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
 <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
 <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net>
 <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>
Message-ID:

Cornel,

I'm failing to see how adding '>' would solve the problem.

This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)?

I think, in this circumstance, the former applies. It is easy to add, and the use of an exception in this case is violently user-friendly, i.e. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing.

chris

On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote:

> Hi,
>
> It doesn't matter what sequence we use. As Chris Fields showed in
> his test, not having
> ">" as the 1st character on the first line is the problem.
> We always assumed the sequence is in FASTA format and this seems to
> be wrong.
>
> I think, the solution to our problem is to check whether the ">"
> symbol is present or not.
> If not present then it will be added.
> > Thank you, > Cornel Ghiban > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Thursday, August 06, 2009 11:18 AM > To: Hilgert, Uwe > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe - could you send an actual data file (as an attachment) that > reproduces the problem, or is that not possible? > > -hilmar > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > >> I'm not sure what version we have. Cornel may have installed it a >> while ago from CVS: >> >> Module id = Bio::Root::Build >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >> INST_VERSION 1.006900 >> cpan> m Bio::Root::Version >> Module id = Bio::Root::Version >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >> INST_VERSION 1.006900 >> cpan> m Bio::SeqIO >> Module id = Bio::SeqIO >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >> INST_VERSION undef >> >> Cornel still has the checked-out "bioperl-live" directory and the >> last >> changes are from March this year. >> >> As per why he used "Fasta" instead of 'fasta" as the format parameter >> in Bio::SeqIO, it's because that what it says in the modules manual. >> He now tried 'fasta' instead and see no changes in behavior. Omitting >> the format parameter altogether, fasta-formatted sequence continues >> to >> be treated correctly, the first line being removed. However, raw >> sequence is being treated differently in that the first line is not >> being removed any more. Instead, the program returns the first line >> only. Which, in the example I am going to forward in my next message, >> will return 60 amino acids out of raw sequence of 300 aa. Can't win >> with raw sequence... 
>> >> >> The files may be created on different platforms, we didn't notice any >> difference between using files created on Windows or Linux. >> >> Thanks >> Uwe >> >> >> >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, August 05, 2009 6:54 PM >> To: Chris Fields >> Cc: Hilgert, Uwe; BioPerl List >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? >> What does your data look like?" I'd add to that, can you show us your >> full script, or a smaller code snippet that reproduces the problem. >> >> I suspect that either something in your script is swallowing the >> line, >> or that the line endings in your data file are from a different OS >> than the one you're running the script on. (Or that you are running a >> very old version of BioPerl, which is entirely possible if you >> installed through CPAN.) >> >> -hilmar >> >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >> >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls out >>> the full sequence in chunks (based on local $/ = "\n>") and splits >>> the header off as the first line in that chunk. You could probably >>> try leaving the format out and letting SeqIO guess it, or passing >>> the >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>> better to go through the files and add a file extension that >>> corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being treated >>>> as such (=remove first line) - regardless of whether they really >>>> are >>>> fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>>> Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences are being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>> to guessing). In any case, it's always advisable to explicitly >>>> indicate the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>> i; >>>> ... >>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless >>>>> of whether the files were indeed fasta files or files that only >>>>> contained sequence. 
>>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a fix >>>>> for this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From biopython at maubp.freeserve.co.uk Thu Aug 6 12:51:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 6 Aug 2009 17:51:34 +0100 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> 
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
 <8FAB8756AD944534B49F2C4356CB6D92@NewLife>
 <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>
 <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu>
 <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net>
Message-ID: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com>

On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote:
>
>>> I agree with the not removing newlines in raw format being a bug.
>>>
>>>        -hilmar
>>
>> Acc. to the SeqIO::raw docs, this is a little trickier. The documented
>> behavior explicitly indicates that each line (sans non-whitespace) is
>> assumed to be a separate sequence, so changing that behavior breaks API.
>
> Ah - true indeed. I like the optional argument feature - that way it's easy
> for the user to choose.
>

For reference, "raw" as a format in EMBOSS seems to give just one sequence regardless of any line breaks.

Adding an optional argument might be clearest, but have you considered using the new BioPerl SeqIO variant argument to have two forms of raw (the original variant giving one sequence per line, and a new variant where you just get one sequence regardless of any line breaks)?
Peter

From cjfields at illinois.edu Thu Aug 6 12:58:07 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Aug 2009 11:58:07 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
 <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
 <8FAB8756AD944534B49F2C4356CB6D92@NewLife>
 <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu>
 <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu>
 <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net>
 <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com>
Message-ID:

On Aug 6, 2009, at 11:51 AM, Peter wrote:

> On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote:
>>
>>>> I agree with the not removing newlines in raw format being a bug.
>>>>
>>>> -hilmar
>>>
>>> Acc. to the SeqIO::raw docs, this is a little trickier. The
>>> documented
>>> behavior explicitly indicates that each line (sans non-whitespace)
>>> is
>>> assumed to be a separate sequence, so changing that behavior
>>> breaks API.
>>
>> Ah - true indeed. I like the optional argument feature - that way
>> it's easy
>> for the user to choose.
>>
>
> For reference, "raw" as a format in EMBOSS seems to give just one
> sequence regardless of any line breaks.

Yes, and that's the behavior I would expect, actually.

> Adding an optional argument might be clearest, but have you considered
> using the new BioPerl SeqIO variant argument to have two forms of raw
> (the original variant giving one sequence per line, and a new variant
> where you just get one sequence regardless of any line breaks)?
>
> Peter

That's a good point.
We'd have to keep 'raw' as the prior behavior, but 'raw-complete' could be used for such a circumstance ('raw-gulp' sounds just wrong ;)

chris

From rmb32 at cornell.edu Thu Aug 6 13:14:12 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 06 Aug 2009 10:14:12 -0700
Subject: [Bioperl-l] tigrxml parsing
Message-ID: <4A7B0F64.9070205@cornell.edu>

Hi all,

Recently in #bioperl somebody came by trying to use Bio::SeqIO::tigrxml.pm to parse the medicago genome annotations at
http://www.medicago.org/genome/downloads/Mt2/MT2.0_medicago_chrX_20080103_NoOverlap.xml.tar.gz

svn HEAD tigrxml.pm was not at all happy with these files, eventually dying with

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: start is undefined
STACK: Error::throw
STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368
STACK: Bio::RangeI::contains Bio/RangeI.pm:255
STACK: Bio::SeqFeature::Generic::add_SeqFeature Bio/SeqFeature/Generic.pm:783
STACK: Bio::SeqIO::tigrxml::start_element Bio/SeqIO/tigrxml.pm:206
STACK: try{} block /usr/share/perl5/XML/SAX/Base.pm:292
STACK: XML::SAX::Base::start_element /usr/share/perl5/XML/SAX/Base.pm:266
STACK: XML::SAX::Expat::_handle_start /usr/share/perl5/XML/SAX/Expat.pm:225
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::SAX::Expat::_parse_bytestream /usr/share/perl5/XML/SAX/Expat.pm:45
STACK: XML::SAX::Base::parse /usr/share/perl5/XML/SAX/Base.pm:2602
STACK: XML::SAX::Base::parse_file /usr/share/perl5/XML/SAX/Base.pm:2631
STACK: Bio::SeqIO::tigrxml::next_seq Bio/SeqIO/tigrxml.pm:116
STACK: /crypt/rob/test2.pl:10
-----------------------------------------------------------

Looking at the medicago XML and comparing it to the bioperl-live/t/data/test.tigrxml, the two look VERY different in structure. Lots of things that are attrs in test.tigrxml seem to be elements in the medicago XML, for example.
So I guess the question is: is the medicago TIGR XML malformed? Can tigrxml.pm be expected to parse it? What, if anything, should be done about this?

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From hilgert at cshl.edu Thu Aug 6 15:36:36 2009
From: hilgert at cshl.edu (Hilgert, Uwe)
Date: Thu, 6 Aug 2009 15:36:36 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
 <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
 <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net>
 <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>
Message-ID:

Hmmm, I fail to see how supplying raw sequence could be called "bad" input or a "problem". In our case, for example, not every user is a bioinformatics expert, and Cornel was suggesting to account for that instead of trying to "train" the user to adhere to requirements that have not much to do with what s/he tries to accomplish. I don't really see data being modified, rather that the data format is being adapted to the needs of the software; which I would argue should be something the software is able to take care of.

Uwe

-----Original Message-----
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: Thursday, August 06, 2009 12:50 PM
To: Ghiban, Cornel
Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List
Subject: Re: [Bioperl-l] Bio::SeqIO issue

Cornel,

I'm failing to see how adding '>' would solve the problem.

This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)?

I think, in this circumstance, the former applies.
It is easy to add, and the use of an exception in this case is violently user-friendly, e.g. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing. chris On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > Hi, > > It doesn't matter what sequence we use. As Chris Fields's showed in > his test, not having > ">" as the 1st character on the first line is the problem. > We always assumed the sequence is in FASTA format and this seems to > be wrong. > > I think, the solution to our problem is to check whether the ">" > symbol is present or not. > If not present then it will be added. > > Thank you, > Cornel Ghiban > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Thursday, August 06, 2009 11:18 AM > To: Hilgert, Uwe > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe - could you send an actual data file (as an attachment) that > reproduces the problem, or is that not possible? > > -hilmar > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > >> I'm not sure what version we have. Cornel may have installed it a >> while ago from CVS: >> >> Module id = Bio::Root::Build >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >> INST_VERSION 1.006900 >> cpan> m Bio::Root::Version >> Module id = Bio::Root::Version >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >> INST_VERSION 1.006900 >> cpan> m Bio::SeqIO >> Module id = Bio::SeqIO >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >> INST_VERSION undef >> >> Cornel still has the checked-out "bioperl-live" directory and the >> last >> changes are from March this year. 
>> >> As per why he used "Fasta" instead of 'fasta" as the format parameter >> in Bio::SeqIO, it's because that what it says in the modules manual. >> He now tried 'fasta' instead and see no changes in behavior. Omitting >> the format parameter altogether, fasta-formatted sequence continues >> to >> be treated correctly, the first line being removed. However, raw >> sequence is being treated differently in that the first line is not >> being removed any more. Instead, the program returns the first line >> only. Which, in the example I am going to forward in my next message, >> will return 60 amino acids out of raw sequence of 300 aa. Can't win >> with raw sequence... >> >> >> The files may be created on different platforms, we didn't notice any >> difference between using files created on Windows or Linux. >> >> Thanks >> Uwe >> >> >> >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, August 05, 2009 6:54 PM >> To: Chris Fields >> Cc: Hilgert, Uwe; BioPerl List >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? >> What does your data look like?" I'd add to that, can you show us your >> full script, or a smaller code snippet that reproduces the problem. >> >> I suspect that either something in your script is swallowing the >> line, >> or that the line endings in your data file are from a different OS >> than the one you're running the script on. (Or that you are running a >> very old version of BioPerl, which is entirely possible if you >> installed through CPAN.) >> >> -hilmar >> >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >> >>> Uwe, >>> >>> Please keep replies on the list. 
>>> >>> It's very possible that's the issue; IIRC the fasta parser pulls out >>> the full sequence in chunks (based on local $/ = "\n>") and splits >>> the header off as the first line in that chunk. You could probably >>> try leaving the format out and letting SeqIO guess it, or passing >>> the >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>> better to go through the files and add a file extension that >>> corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. The files have no extension, but we indicate what >>>> format to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being treated >>>> as such (=remove first line) - regardless of whether they really >>>> are >>>> fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>>> Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences are being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. 
It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>> to guessing). In any case, it's always advisable to explicitly >>>> indicate the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>> i; >>>> ... >>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless >>>>> of whether the files were indeed fasta files or files that only >>>>> contained sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a fix >>>>> for this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. 
>>>>>
>>>>> Dolan DNA Learning Center
>>>>>
>>>>> Cold Spring Harbor Laboratory
>>>>>
>>>>> V: (516) 367-5185
>>>>>
>>>>> E: hilgert at cshl.edu
>>>>>
>>>>> F: (516) 367-5182
>>>>>
>>>>> W: http://www.dnalc.org
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>>
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>

From cjfields at illinois.edu Thu Aug 6 16:09:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Aug 2009 15:09:22 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
 <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
 <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
 <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net>
 <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>
Message-ID: <6729F9CC-ACF9-4BC4-9905-7EA24C1DCA61@illinois.edu>

If one supplies raw sequence (no descriptor) to a FASTA parser (requires a descriptor), then it is bad input. One can't reasonably expect the parser to work correctly under those circumstances. Garbage in, garbage out.

The simplest and (IMHO) best solution under such circumstances is for the parser to die meaningfully ("Sequence is not FASTA format; '>' descriptor line is missing" or similar).
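
As a rough illustration of such a check, here is a plain-Perl sketch. It is not the actual Bio::SeqIO::fasta code; the `assert_fasta` name is made up for this example, and inside BioPerl one would presumably go through the Root `throw` method rather than a bare `die`.

```perl
use strict;
use warnings;

# Hypothetical check, not the actual Bio::SeqIO::fasta code: refuse input
# whose first non-blank line does not start with the '>' descriptor marker.
sub assert_fasta {
    my ($text) = @_;

    # First line that contains anything other than whitespace.
    my ($first) = grep { /\S/ } split /\n/, $text;

    die "Sequence is not FASTA format; '>' descriptor line is missing\n"
        unless defined $first && $first =~ /^>/;

    return 1;
}

for my $input (">seq1 test\nACGTACGT\n", "ACGTACGT\nTTGGCCAA\n") {
    my $ok = eval { assert_fasta($input) };
    print $ok ? "accepted\n" : "rejected: $@";
}
# prints:
#   accepted
#   rejected: Sequence is not FASTA format; '>' descriptor line is missing
```

The point of failing early is that the caller sees the format mismatch at the moment of parsing, instead of silently losing the first line of sequence.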
Tacking a '>' onto bad data doesn't make it magically work; it's just bad data with a '>' prepended. To take this one step further, what if this were genbank data? Or XML?

A well-formed exception, though initially inconvenient to the user, will indicate the problem right away. Silently trying to fix the problem by prepending '>' to bad input data wouldn't work, and the resulting failure downstream (likely from validate_seq) would obscure the real problem, namely that the user is using the wrong format parser.

chris

On Aug 6, 2009, at 2:36 PM, Hilgert, Uwe wrote:

> Hmmm, I fail to see how supplying raw sequence could be a called "bad"
> input or a "problem". In our case, for example, not every user is a
> bioinformatics expert and Cornel was suggesting to account for that
> instead of trying to "train" the user to adhere to requirements that
> have not much to do with what s/he tries to accomplish. I don't really
> see data being modified, rather that the data format is being
> adopted to
> the needs of the software; which I would argue should be something the
> software is being able to take care of.
>
> Uwe
>
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thursday, August 06, 2009 12:50 PM
> To: Ghiban, Cornel
> Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List
> Subject: Re: [Bioperl-l] Bio::SeqIO issue
>
> Cornel,
>
> I'm failing to see how adding '>' would solve the problem.
>
> This is a simple validation issue: should we throw an exception on bad
> input (no '>'), or just argue GIGO based on user error (the assumption
> that the SeqIO parser will read raw sequence correctly when set to
> 'fasta' is wrong)?
>
> I think, in this circumstance, the former applies. It is easy to add,
> and the use of an exception in this case is violently user-friendly,
> e.g. it will stop cold and immediately point out the problem.
> Otherwise data is (silently) being modified, which is always a bad
> thing.
> > chris > > On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > >> Hi, >> >> It doesn't matter what sequence we use. As Chris Fields's showed in >> his test, not having >> ">" as the 1st character on the first line is the problem. >> We always assumed the sequence is in FASTA format and this seems to >> be wrong. >> >> I think, the solution to our problem is to check whether the ">" >> symbol is present or not. >> If not present then it will be added. >> >> Thank you, >> Cornel Ghiban >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Thursday, August 06, 2009 11:18 AM >> To: Hilgert, Uwe >> Cc: Chris Fields; BioPerl List; Ghiban, Cornel >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> Uwe - could you send an actual data file (as an attachment) that >> reproduces the problem, or is that not possible? >> >> -hilmar >> >> On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: >> >>> I'm not sure what version we have. Cornel may have installed it a >>> while ago from CVS: >>> >>> Module id = Bio::Root::Build >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >>> INST_VERSION 1.006900 >>> cpan> m Bio::Root::Version >>> Module id = Bio::Root::Version >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >>> INST_VERSION 1.006900 >>> cpan> m Bio::SeqIO >>> Module id = Bio::SeqIO >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >>> INST_VERSION undef >>> >>> Cornel still has the checked-out "bioperl-live" directory and the >>> last >>> changes are from March this year. >>> >>> As per why he used "Fasta" instead of 'fasta" as the format >>> parameter >>> in Bio::SeqIO, it's because that what it says in the modules manual. >>> He now tried 'fasta' instead and see no changes in behavior. 
>>> Omitting >>> the format parameter altogether, fasta-formatted sequence continues >>> to >>> be treated correctly, the first line being removed. However, raw >>> sequence is being treated differently in that the first line is not >>> being removed any more. Instead, the program returns the first line >>> only. Which, in the example I am going to forward in my next >>> message, >>> will return 60 amino acids out of raw sequence of 300 aa. Can't win >>> with raw sequence... >>> >>> >>> The files may be created on different platforms, we didn't notice >>> any >>> difference between using files created on Windows or Linux. >>> >>> Thanks >>> Uwe >>> >>> >>> >>> >>> -----Original Message----- >>> From: Hilmar Lapp [mailto:hlapp at gmx.net] >>> Sent: Wednesday, August 05, 2009 6:54 PM >>> To: Chris Fields >>> Cc: Hilgert, Uwe; BioPerl List >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> I don't think that can be the problem. If anything, providing the >>> format ought to be better in terms of result than not providing it? >>> >>> Uwe - I'd like you to go back to Chris' initial questions that you >>> haven't answered yet: "What version of bioperl are you using, OS, >>> etc? >>> What does your data look like?" I'd add to that, can you show us >>> your >>> full script, or a smaller code snippet that reproduces the problem. >>> >>> I suspect that either something in your script is swallowing the >>> line, >>> or that the line endings in your data file are from a different OS >>> than the one you're running the script on. (Or that you are >>> running a >>> very old version of BioPerl, which is entirely possible if you >>> installed through CPAN.) >>> >>> -hilmar >>> >>> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> >>>> Uwe, >>>> >>>> Please keep replies on the list. 
>>>> >>>> It's very possible that's the issue; IIRC the fasta parser pulls >>>> out >>>> the full sequence in chunks (based on local $/ = "\n>") and splits >>>> the header off as the first line in that chunk. You could probably >>>> try leaving the format out and letting SeqIO guess it, or passing >>>> the >>>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>>> better to go through the files and add a file extension that >>>> corresponds to the format. >>>> >>>> chris >>>> >>>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>>> >>>>> Thanks, Chris. The files have no extension, but we indicate what >>>>> format to use, like in the manual: >>>>> >>>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>>> >>>>> I wonder now whether this could exactly cause the problem: as we >>>>> are >>>>> telling that input files are in fasta format they are being >>>>> treated >>>>> as such (=remove first line) - regardless of whether they really >>>>> are >>>>> fasta? >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> Uwe >>>>> Hilgert, Ph.D. >>>>> Dolan DNA Learning Center >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> C: (516) 857-1693 >>>>> V: (516) 367-5185 >>>>> E: hilgert at cshl.edu >>>>> F: (516) 367-5182 >>>>> W: http://www.dnalc.org >>>>> >>>>> -----Original Message----- >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>>> To: Hilgert, Uwe >>>>> Cc: bioperl-l at lists.open-bio.org >>>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>>> >>>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>>> >>>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>>> sequences are being submitted in FASTA format? >>>>> >>>>> No. See: >>>>> >>>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>>> >>>>> SeqIO tries to guess at the format using the file extension, and >>>>> if >>>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. 
It's >>>>> possible that the extension is causing the problem, or that >>>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>>> to guessing). In any case, it's always advisable to explicitly >>>>> indicate the format when possible. >>>>> >>>>> Relevant lines: >>>>> >>>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>>> i; >>>>> ... >>>>> return 'raw' if /\.(txt)$/i; >>>>> >>>>>> In our experience, implementing >>>>>> Bio::SeqIO led to the first line of files being cut off, >>>>>> regardless >>>>>> of whether the files were indeed fasta files or files that only >>>>>> contained sequence. >>>>> >>>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>>> 'fasta'. >>>>> >>>>>> Which, in the latter, led to sequence submissions that had the >>>>>> first line of nucleotides removed. Has anyone tried to write a >>>>>> fix >>>>>> for this? >>>>> >>>>> This sounds like a bug, but we have very little to go on beyond >>>>> your >>>>> description. What version of bioperl are you using, OS, etc? >>>>> What >>>>> does your data look like? File extension? >>>>> >>>>> chris >>>>> >>>>>> Thanks, >>>>>> >>>>>> Uwe >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>>> >>>>>> Uwe Hilgert, Ph.D. 
>>>>>> >>>>>> Dolan DNA Learning Center >>>>>> >>>>>> Cold Spring Harbor Laboratory >>>>>> >>>>>> >>>>>> >>>>>> V: (516) 367-5185 >>>>>> >>>>>> E: hilgert at cshl.edu >>>>>> >>>>>> F: (516) 367-5182 >>>>>> >>>>>> W: http://www.dnalc.org >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 16:25:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:25:45 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> Message-ID: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Michael, Are you using ClustalW 2? I'm not sure but I don't think the wrapper has been updated for the latest version (I think parsing still works, though). 
chris

On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote:

> I'm a complete bioperl novice, trying to do Clustalw on some fasta
> files, and am running into trouble:
>
> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
> Use of uninitialized value in concatenation (.) or string at
> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm
> line 550.
> Use of uninitialized value in concatenation (.) or string at
> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm
> line 551.
> Can't exec "align": No such file or directory at
> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm
> line 555.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Clustalw call (  align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg
> -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::Alignment::Clustalw::_run
> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
> STACK: Bio::Tools::Run::Alignment::Clustalw::align
> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
> STACK: TestClust:22
> -----------------------------------------------------------
>
> Here's my code:
>
> #!/usr/bin/perl -w
>
> use Bio::Perl;
> use Bio::AlignIO;
> use Bio::Tools::Run::Alignment::Clustalw;
> use Bio::SimpleAlign;
> use Bio::Seq;
> use strict;
> use warnings;
>
> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>
> for (my $i = 0; $i < @seq_array; $i++){
>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>     $seq_array[$i]->seq($seq);
> }
>
> write_sequence(">test",'fasta',@seq_array);
>
> my $seq_array_ref = \@seq_array;
> my $aln = $factory->align($seq_array_ref);
>
> my @align_array = $aln->each_seq();
> write_sequence(">testfile",'fasta',@align_array);
>
>
> The
loop is just there to take out some gaps that were placed in a > blast previous to this. The write_sequence call confirms that > @seq_array is a valid array of Bio:Seq objects at the time align > calls it. Here's some output in "test": > > >A0220B0939one.1 FV584Q101DEWY9 > TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC > CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT > TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT > TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG > CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG > CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA > CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA > CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT > AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG > >A0220B0939one.2 FV584Q101A4DG7 > TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG > ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC > AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG > TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG > GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA > GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT > CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT > CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT > ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG > ... 
> > Thanks, > Mike > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 16:30:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:30:30 -0500 Subject: [Bioperl-l] tigrxml parsing In-Reply-To: <4A7B0F64.9070205@cornell.edu> References: <4A7B0F64.9070205@cornell.edu> Message-ID: Robert, This popped up recently (may be related): http://thread.gmane.org/gmane.comp.lang.perl.bio.general/19782 http://bugzilla.open-bio.org/show_bug.cgi?id=2868 It might be possible to map this into bioperl, but someone needs to take it up. chris On Aug 6, 2009, at 12:14 PM, Robert Buels wrote: > Hi all, > > Recently in #bioperl somebody came by trying to use > Bio::SeqIO::tigrxml.pm to parse the medicago genome annotations at http://www.medicago.org/genome/downloads/Mt2/MT2.0_medicago_chrX_20080103_NoOverlap.xml.tar.gz > > svn HEAD tigrxml.pm was not at all happy with these files, > eventually dieing with > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: start is undefined > STACK: Error::throw > STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368 > STACK: Bio::RangeI::contains Bio/RangeI.pm:255 > STACK: Bio::SeqFeature::Generic::add_SeqFeature Bio/SeqFeature/ > Generic.pm:783 > STACK: Bio::SeqIO::tigrxml::start_element Bio/SeqIO/tigrxml.pm:206 > STACK: try{} block /usr/share/perl5/XML/SAX/Base.pm:292 > STACK: XML::SAX::Base::start_element /usr/share/perl5/XML/SAX/ > Base.pm:266 > STACK: XML::SAX::Expat::_handle_start /usr/share/perl5/XML/SAX/ > Expat.pm:225 > STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm: > 469 > STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187 > STACK: XML::SAX::Expat::_parse_bytestream /usr/share/perl5/XML/SAX/ > Expat.pm:45 > STACK: XML::SAX::Base::parse /usr/share/perl5/XML/SAX/Base.pm:2602 > STACK: XML::SAX::Base::parse_file 
/usr/share/perl5/XML/SAX/Base.pm: > 2631 > STACK: Bio::SeqIO::tigrxml::next_seq Bio/SeqIO/tigrxml.pm:116 > STACK: /crypt/rob/test2.pl:10 > ----------------------------------------------------------- > > Looking at the medicago XML and comparing it to the bioperl-live/t/ > data/test.tigrxml, the two look VERY different in structure. Lots > of things that are attrs in test.tigrxml seem to be elements in the > medicago XML, for example. > > So I guess the question is: is the medicago TIGR XML malformed? > Can tigrxml.pm be expected to parse it? What, if anything, should > be done about this? > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From eigenrosen at gmail.com Thu Aug 6 16:39:09 2009 From: eigenrosen at gmail.com (Michael Rosen) Date: Thu, 6 Aug 2009 13:39:09 -0700 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Hi Chris, I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the top of the module being called. Mike On Aug 6, 2009, at 1:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the > wrapper has been updated for the latest version (I think parsing > still works, though). 
From lsbrath at gmail.com Thu Aug 6 16:49:56 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 6 Aug 2009 16:49:56 -0400 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <69367b8f0908061349i48f4d2b1tcbccb00d5a3de5ca@mail.gmail.com> Hi Micheal, Have you considered calling clustalw from perl's "system" command and passing in the files for alignment? Mgavi On Thu, Aug 6, 2009 at 4:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the wrapper has > been updated for the latest version (I think parsing still works, though). > > chris
From cjfields at illinois.edu Thu Aug 6 17:00:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 16:00:37 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Message-ID: <2C8DF4CB-40B0-41DB-882A-AAF346A008B2@illinois.edu> Michael, No, I meant was what version of clustalw (the actual executable) you are using. This is the bioperl wrapper svn version. What happens if you enter 'clustalw' on the command line? Do you get: ************************************************************** ******** CLUSTAL 2.0.11 Multiple Sequence Alignments ******** ************************************************************** I think the above version has problems with bioperl, though I can't recall exactly what the problems were. chris On Aug 6, 2009, at 3:39 PM, Michael Rosen wrote: > Hi Chris, > I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at > the top of the module being called. > > Mike > On Aug 6, 2009, at 1:25 PM, Chris Fields wrote: > >> Michael, >> >> Are you using ClustalW 2? I'm not sure but I don't think the >> wrapper has been updated for the latest version (I think parsing >> still works, though).
From bosborne11 at verizon.net Thu Aug 6 16:01:00 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 16:01:00 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Chris, Yes, I think so. By the way, this is related to an old bug: http://bugzilla.bioperl.org/show_bug.cgi?id=1508 Brian O. > This is a simple validation issue: should we throw an exception on > bad input (no '>')
From bix at sendu.me.uk Thu Aug 6 17:18:02 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 06 Aug 2009 22:18:02 +0100 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Message-ID: <4A7B488A.2060600@sendu.me.uk> Michael Rosen wrote: > Hi Chris, > I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the > top of the module being called. I'm guessing your error is caused simply by not having clustalw installed. BioPerl run modules provide perl wrappers to external executables. They don't replace the need for those executables.
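
Sendu's guess (no clustalw executable installed) is easy to test before calling align(). What follows is only a minimal sketch using core Perl modules; find_on_path is a hypothetical helper written for this note, not a BioPerl function, and it assumes a Unix-like system where the program is installed under the plain name 'clustalw' (no .exe suffix handling). If the wrapper's POD is recalled correctly, it can also be pointed at a non-PATH install through a CLUSTALDIR environment variable, which this sketch does not check.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): return the full path of the
# first executable file called $name in a PATH directory, or undef if no
# PATH directory contains one.
sub find_on_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        # -f: plain file; -x _: executable (reuses the previous stat)
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Assumption: the executable is named 'clustalw' on this system.
my $exe = find_on_path('clustalw');
if (defined $exe) {
    print "clustalw found at $exe\n";
}
else {
    print "clustalw not found on PATH; install it before calling align()\n";
}
```

If the second message is printed, the "Can't exec" failure in the trace above is what one would expect, since the wrapper has no program to run.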
From cjfields at illinois.edu Thu Aug 6 20:47:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 19:47:47 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: I added the exception and tests to svn (r15895), so I closed that bug out. Almost forgot about that one, thanks for pointing it out! chris On Aug 6, 2009, at 3:01 PM, Brian Osborne wrote: > Chris, > > Yes, I think so. > > By the way, this is related to an old bug: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1508 > > > Brian O. > > >> This is a simple validation issue: should we throw an exception on >> bad input (no '>') > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 22:30:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 21:30:09 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A765A44.7030902@gmail.com> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> Message-ID: Jonathan, Just to make sure you aren't accidentally 'warnocked' by the core devs: Your code sounds quite nice! However, we will begin the process of massively restructuring bioperl pretty soon, so I don't think it's a good idea to gear your code towards fitting directly into core. The best alternative should be fairly obvious, which is to release it to CPAN listing BioPerl 1.6.0 as a dependency if it is required. 
Your modules may or may not need the Bio* namespace (that's up to you, actually); there are several non-bioperl modules that also share the Bio* namespace, and I believe there are modules that aren't Bio* that use BioPerl (Gbrowse comes to mind). If you're focusing on interaction with robotics, Robotics::Bio::X might be a better namespace for instance (b/c you could expand later into other possibly non-bio robotics interfaces). The cpan-discuss list is probably a good place to ask, or (after you register on PAUSE) you can register the module namespace and see if there are any objections to the request. chris On Aug 2, 2009, at 10:32 PM, Jonathan Cline wrote: > Smithies, Russell wrote: >> I "acquired" an old Biomek 1000 that I'm thinking of modernising. >> It was originally controlled by a monstrously large but slow pc >> (IBM Value Point Model 466DX2 computer with Microsoft Windows* >> Version 3.1) >> My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) >> and use software like mach3 www.machsupport.com along with G-code >> to control it. >> I come from an engineering background so it seemed like the easy >> way to me :-) >> >> Now I just need a bit of free time to get it working... >> >> --Russell >> >> >> > I agree, that's probably the best way to go. It's hard to know what > amount of s/w processing was done on the host PC vs. the embedded > controller. If you were able to connect directly to the robot > hardware > with serial port(s) or whatever it's using, it would be tough to find > out the comm protocol unless someone has already reverse engineered it > (which is doubtful). Also from what I have seen online, attempting > to > run the old software under virtual machine is unpredictable due to > timing differences in the serial port communication. So removal of > the > old electronics is probably the best bet. If it has one arm, then > it's > much easier. 
> > As for robots with working workstation software, it seems the > annoyance > factor is that while the scripting languages are powerful (for GUI > scripting that is), they are still relatively low level. Bio types > with > a bit of CS seem to immediately turn to visual basic, labview, or even > excel spreadsheets and macros, in order to provide a higher level > abstraction for the workstation software. To me, it seems natural > that > there should be a "protocol compiler" which takes biology protocols as > input, and gives robot instructions as output (google "protolexer"). > The huge bottleneck of course is that everyone's robotics work tables > and equipment are somewhat unique to their needs. > > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > > >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >>> Sent: Thursday, 30 July 2009 2:07 p.m. >>> To: bioperl-l at lists.open-bio.org >>> Cc: Jonathan Cline >>> Subject: [Bioperl-l] Bio::Robotics namespace discussion >>> >>> I am writing a module for communication with biology robotics, as >>> discussed recently on #bioperl, and I invite your comments. >>> >>> Currently this mode talks to a Tecan genesis workstation robot ( >>> http://images.google.com/images?q=tecan genesis ). Other vendors >>> are >>> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the >>> 'net with the exception of some visual basic and labview scripts >>> which I >>> have found. There are some computational biologists who program for >>> robots via high level s/w, but these scripts are not distributed >>> as OSS. >>> >>> With Tecan, there is a datapipe interface for hardware >>> communication, as >>> an added $$ option from the vendor. I haven't checked other >>> vendors to >>> see if they likewise have an open communication path for third party >>> software. 
By allowing third-party communication, then naturally the >>> next step is to create a socket client-server; especially as the >>> robot >>> vendor only support MS Win and using the local machine has typical >>> Microsoft issues (like losing real time communication with the >>> hardware >>> due to GUI animation, bad operating system stability, no unix except >>> cygwin, etc). >>> >>> >>> On Namespace: >>> >>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are >>> many >>> s/w modules already called 'robots' (web spider robots, chat bots, >>> www >>> automate, etc) so I chose the longer name "robotics" to >>> differentiate >>> this module as manipulating real hardware. Bio::Robotics is the >>> abstraction for generic robotics and Bio::Robotics::(vendor) is the >>> manufacturer-specific implementation. Robot control is made more >>> complex due to the very configurable nature of the work table >>> (placement >>> of equipment, type of equipment, type of attached arm, etc). The >>> abstraction has to be careful not to generalize or assume too >>> much. In >>> some cases, the Bio::Robotics modules may expand to arbitrary >>> equipment >>> such as thermocyclers, tray holders, imagers, etc - that could be a >>> future roadmap plan. >>> >>> Here is some theoretical example usage below, subject to change. At >>> this time I am deciding how much state to keep within the Perl >>> module. >>> By keeping state, some robot programming might be simplified >>> (avoiding >>> deadlock or tracking tip state). In general I am aiming for a more >>> "protocol friendly" method implementation. >>> >>> >>> To use this software with locally-connected robotics hardware: >>> >>> use Bio::Robotics; >>> >>> my $tecan = Bio::Robotics->new("Tecan") || die; >>> $tecan->attach() || die; >>> $tecan->home(); >>> $tecan->pipette(tips => "1", from => "rack1"); >>> $tecan->pipette(aspirate => "1", dispense => "1", from => >>> "sampleTray", to >>> => "DNATray"); >>> ... 
>>> >>> To use this software with remote robotics hardware over the network: >>> >>> # On the local machine, run: >>> use Bio::Robotics; >>> >>> my @connected_hardware = Bio::Robotics->query(); >>> my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in >>> @connected_hardware\n"; >>> $tecan->attach() || die; >>> $tecan->configure("my work table configuration file") || die; >>> # Run the server and process commands >>> while (1) { >>> $error = $tecan->server(passwordplaintext => "0xd290"); >>> if ($tecan->lastClientCommand() =~ /^shutdown/) { >>> last; >>> } >>> } >>> $tecan->detach(); >>> exit(0); >>> >>> # On the remote machine (the client), run: >>> use Bio::Robotics; >>> >>> my $server = "heavybio.dyndns.org:8080"; >>> my $password = "0xd290"; >>> my $tecan = Bio::Robotics->new("Tecan"); >>> $tecan->connect($server, $password) || die; >>> $tecan->home(); >>> $tecan->pipette(tips => "1", from => "rack200"); >>> $tecan->pipette(aspirate => "1", dispense => "1", >>> from => "sampleTray A1", to => "DNATray A2", >>> volume => "45", liquid => "Buffer"); >>> $tecan->pipette(drop => "1"); >>> ... >>> $tecan->disconnect(); >>> exit(0); >>> >>> >>> >>> -- >>> >>> ## Jonathan Cline >>> ## jcline at ieee.org >>> ## Mobile: +1-805-617-0223 >>> ######################## >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material.
Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. >> = >> = >> ===================================================================== >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Fri Aug 7 05:19:14 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 Aug 2009 10:19:14 +0100 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the wrapper has > been updated for the latest version (I think parsing still works, though). > > chris That shouldn't matter; according to Des Higgins, ClustalW 2 is intended to be completely compatible with ClustalW 1.83, including the command line options. They will be adding new stuff in ClustalW 3. The only thing to worry about with ClustalW 2 is parsing the output, as the header line of the alignments has changed very slightly. I can tell you from personal experience that the Biopython command line wrappers for ClustalW work fine on both 1.83 and 2.0.10 for example, and would expect the same to be true for BioPerl.
Peter From paola.bisignano at gmail.com Fri Aug 7 08:11:58 2009 From: paola.bisignano at gmail.com (Paola Bisignano via Scour) Date: Fri, 7 Aug 2009 05:11:58 -0700 Subject: [Bioperl-l] Scour Friend Invite Message-ID: <4a7c1a0e5b82d@gmail.com> Hey, Check out: http://scour.com/invite/paola82/ I'm using a new search engine called Scour.com. It shows Google/Yahoo/MSN results and user comments all on one page. Best of all we get rewarded for using it by collecting points with every search, comment and vote. The points are redeemable for Visa gift cards. Join through my invite link so we can be friends and search socially! I know you'll like it, - Paola Bisignano This message was sent to you as a friend referral to join scour.com, please feel free to review our http://scour.com/privacy page and our http://scour.com/communityguidelines/antispam page. If you prefer not to receive invitations from ANY scour members, please click here - http://www.scour.com/unsub/e/YmlvcGVybC1sQGxpc3RzLm9wZW4tYmlvLm9yZw== Write to us at: Scour, Inc., 15303 Ventura Blvd. Suite 220, Sherman Oaks, CA 91403, USA. campaignid: scour200908070001 Scour.com From hlapp at gmx.net Fri Aug 7 09:21:51 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 7 Aug 2009 09:21:51 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <4a7c1a0e5b82d@gmail.com> References: <4a7c1a0e5b82d@gmail.com> Message-ID: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> Just FYI, I am addressing this offline. Note to everyone: we don't tolerate this and it will get you removed from the list immediately (and banned for the second offense). This is a large list. You better spend the time and be very careful who you send this kind of stuff to before you waste everyone else's. 
-hilmar From stefan.kirov at bms.com Fri Aug 7 10:25:52 2009 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 07 Aug 2009 10:25:52 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> References: <4a7c1a0e5b82d@gmail.com> <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> Message-ID: <4A7C3970.10501@bms.com> Hilmar Lapp wrote: > Just FYI, I am addressing this offline. Note to everyone: we don't > tolerate this and it will get you removed from the list immediately > (and banned for the second offense). This is a large list. You better > spend the time and be very careful who you send this kind of stuff to > before you waste everyone else's. > > -hilmar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > It is quite possible this guy has no idea scour is spamming people on his behalf. It seems to me there should be a spam filter trained to take care of these guys. As a reference: http://forums.digitalpoint.com/showthread.php?t=955786 http://markmail.org/message/fzlutwd3mkforbsu From jdalzell03 at qub.ac.uk Mon Aug 3 19:18:24 2009 From: jdalzell03 at qub.ac.uk (Johnathan Dalzell) Date: Tue, 4 Aug 2009 00:18:24 +0100 Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10 Message-ID: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> Hi, I've been trying to install BioPerl 1.6.0 onto Strawberry Perl 5.10 and the ActivePerl equivalent. I'm working on Vista, and over multiple attempts this is the furthest I can get through installation.... Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively?
[a] a - will install all scripts Do you want to run tests that require connection to servers across the internet (likely to cause some failures)? y/n [n] y - will run internet-requiring tests Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/lib/Data/Dumper.pm lin e 190, line 9. Creating new 'Build' script for 'BioPerl' version '1.006000' ---- Unsatisfied dependencies detected during ---- ---- CJFIELDS/BioPerl-1.6.0.tar.gz ---- SOAP::Lite [requires] GraphViz [requires] Convert::Binary::C [requires] Algorithm::Munkres [requires] XML::Twig [requires] DB_File [requires] Set::Scalar [requires] XML::Parser::PerlSAX [requires] XML::Writer [requires] XML::SAX::Writer [requires] Clone [requires] XML::DOM::XPath [requires] PostScript::TextBlock [requires] Running Build test Delayed until after prerequisites Running Build install Delayed until after prerequisites Running install for module 'SOAP::Lite' Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP-Lite-0.710.08.tar.gz ok CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz We are about to install SOAP::Lite and for your convenience will provide you with list of modules and prerequisites, so you'll be able to choose only modules you need for your configuration. XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by default. Installed transports can be used for both SOAP::Lite and XMLRPC::Lite. Press to see the detailed list. Feature Prerequisites Install? 
----------------------------- ---------------------------- -------- Core Package [*] Scalar::Util always [*] Test::More [*] URI [*] MIME::Base64 [*] version [*] XML::Parser (v2.23) Client HTTP support [*] LWP::UserAgent always Client HTTPS support [ ] Crypt::SSLeay [ no ] Client SMTP/sendmail support [ ] MIME::Lite [ no ] Client FTP support [*] IO::File [ yes ] [*] Net::FTP Standalone HTTP server [*] HTTP::Daemon [ yes ] Apache/mod_perl server [ ] Apache [ no ] FastCGI server [ ] FCGI [ no ] POP3 server [ ] MIME::Parser [ no ] [*] Net::POP3 IO server [*] IO::File [ yes ] MQ transport support [ ] MQSeries [ no ] JABBER transport support [ ] Net::Jabber [ no ] MIME messages [ ] MIME::Parser [ no ] DIME messages [*] IO::Scalar (v2.105) [ no ] [ ] DIME::Tools (v0.03) [ ] Data::UUID (v0.11) SSL Support for TCP Transport [ ] IO::Socket::SSL [ no ] Compression support for HTTP [*] Compress::Zlib [ yes ] MIME interoperability w/ Axis [ ] MIME::Parser (v6.106) [ no ] --- An asterix '[*]' indicates if the module is currently installed. Do you want to proceed with this configuration? [yes] yes Checking if your kit is complete... 
Looks good Writing Makefile for SOAP::Lite cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport\TCP.pm cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport\POP3.pm cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema19 99.pm cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema20 
01.pm cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport\MQ.pm cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport\FTP.pm cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP\Transport\JABBER.pm cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite\Deserializer\XMLSchem aSOAP1_2.pm cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport\IO.pm cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite\Deserializer\XMLSchem aSOAP1_1.pm cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP\Transport\LOCAL.pm cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP\Transport\MAILTO.pm cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/SOAPsh.pl blib\script\S OAPsh.pl pl2bat.bat blib\script\SOAPsh.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/stubmaker.pl blib\scrip t\stubmaker.pl pl2bat.bat blib\script\stubmaker.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/XMLRPCsh.pl blib\script \XMLRPCsh.pl pl2bat.bat blib\script\XMLRPCsh.pl MKUTTER/SOAP-Lite-0.710.08.tar.gz C:\strawberry\c\bin\dmake.EXE -- OK Running make test C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib\lib' , 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/013-array-deserializati on.t 
t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03-server.t t/04-attach. t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08-schema.t t/096_characters.t t /097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t t/IO/SessionSet.t t/SO AP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/Deserializer/XMLSchema199 9.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t t /SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/SOAP/Transport/FTP.t t/S OAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t t/SOAP/Transport/MAILT O.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/HTTP/CGI.t t/XML/Parser /Lite.t t/XMLRPC/Lite.t t/01-core.t .................................. ok t/010-serializer.t ........................... ok t/012-cloneable.t ............................ ok t/013-array-deserialization.t ................ ok t/014_UNIVERSAL_use.t ........................ ok t/015_UNIVERSAL_can.t ........................ ok t/02-payload.t ............................... ok t/03-server.t ................................ ok t/04-attach.t ................................ skipped: Could not find MIME::Parser - is M IME::Tools installed? Aborting. t/05-customxml.t ............................. ok t/06-modules.t ............................... ok t/07-xmlrpc_payload.t ........................ ok t/08-schema.t ................................ ok t/096_characters.t ........................... skipped: (no reason given) t/097_kwalitee.t ............................. skipped: (no reason given) t/098_pod.t .................................. skipped: (no reason given) t/099_pod_coverage.t ......................... skipped: (no reason given) t/IO/SessionData.t ........................... ok t/IO/SessionSet.t ............................ ok t/SOAP/Data.t ................................ ok t/SOAP/Lite/Deserializer/XMLSchema1999.t ..... 
ok t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok t/SOAP/Lite/Packager.t ....................... ok t/SOAP/Schema/WSDL.t ......................... ok t/SOAP/Serializer.t .......................... 1/12 Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite .pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-L ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-L ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-L ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. t/SOAP/Serializer.t .......................... ok t/SOAP/Transport/FTP.t ....................... 1/7 Use of uninitialized value in split at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 55. substr outside of string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SO AP/Transport/FTP.pm line 56. Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/perl/lib/IO/Socket/INET. pm line 117. Use of uninitialized value $server in concatenation (.) or string at C:\strawberry\cpan\bu ild\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60. t/SOAP/Transport/FTP.t ....................... ok t/SOAP/Transport/HTTP.t ...................... ok t/SOAP/Transport/HTTP/CGI.t .................. everytime I get to the CGI.t at the end here the installation won't move! Any suggestions would be greatly appreciated, I've been trying to force it through, literally for 5 hours now.... 
cheers, jonny From ghiban at cshl.edu Thu Aug 6 12:04:38 2009 From: ghiban at cshl.edu (Ghiban, Cornel) Date: Thu, 6 Aug 2009 12:04:38 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> Message-ID: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Hi, It doesn't matter what sequence we use. As Chris Fields's showed in his test, not having ">" as the 1st character on the first line is the problem. We always assumed the sequence is in FASTA format and this seems to be wrong. I think, the solution to our problem is to check whether the ">" symbol is present or not. If not present then it will be added. Thank you, Cornel Ghiban -----Original Message----- From: Hilmar Lapp [mailto:hlapp at gmx.net] Sent: Thursday, August 06, 2009 11:18 AM To: Hilgert, Uwe Cc: Chris Fields; BioPerl List; Ghiban, Cornel Subject: Re: [Bioperl-l] Bio::SeqIO issue Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible? -hilmar On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > I'm not sure what version we have. 
Cornel may have installed it a > while ago from CVS: > > Module id = Bio::Root::Build > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > INST_VERSION 1.006900 > cpan> m Bio::Root::Version > Module id = Bio::Root::Version > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > INST_VERSION 1.006900 > cpan> m Bio::SeqIO > Module id = Bio::SeqIO > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > INST_VERSION undef > > Cornel still has the checked-out "bioperl-live" directory and the last > changes are from March this year. > > As per why he used "Fasta" instead of 'fasta" as the format parameter > in Bio::SeqIO, it's because that what it says in the modules manual. > He now tried 'fasta' instead and see no changes in behavior. Omitting > the format parameter altogether, fasta-formatted sequence continues to > be treated correctly, the first line being removed. However, raw > sequence is being treated differently in that the first line is not > being removed any more. Instead, the program returns the first line > only. Which, in the example I am going to forward in my next message, > will return 60 amino acids out of raw sequence of 300 aa. Can't win > with raw sequence... > > > The files may be created on different platforms, we didn't notice any > difference between using files created on Windows or Linux. > > Thanks > Uwe > > > > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, August 05, 2009 6:54 PM > To: Chris Fields > Cc: Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > I don't think that can be the problem. If anything, providing the > format ought to be better in terms of result than not providing it? 
> > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, etc? > What does your data look like?" I'd add to that, can you show us your > full script, or a smaller code snippet that reproduces the problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing the >> file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>> Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences are being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>> to guessing). In any case, it's always advisable to explicitly >>> indicate the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, regardless >>>> of whether the files were indeed fasta files or files that only >>>> contained sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 8 08:38:46 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 8 Aug 2009 08:38:46 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <4A7C3970.10501@bms.com> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> Message-ID: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> Thanks Stefan--this makes a lot more sense to me than supposing a priori that a previous legitimate user of this list is spamming bioperl-l intentionally. I would prefer to initially give the benefit of the doubt to the intelligence of the users, rather than scare people off who are likely to be already mortified that their emails have been commandeered like this. I would definitely support a spam filter that works.
MAJ ----- Original Message ----- From: "Stefan Kirov" To: "Hilmar Lapp" Cc: "BioPerl List" Sent: Friday, August 07, 2009 10:25 AM Subject: Re: [Bioperl-l] Scour Friend Invite > Hilmar Lapp wrote: >> Just FYI, I am addressing this offline. Note to everyone: we don't >> tolerate this and it will get you removed from the list immediately >> (and banned for the second offense). This is a large list. You better >> spend the time and be very careful who you send this kind of stuff to >> before you waste everyone else's. >> >> -hilmar >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > It is quite possible this guy has no idea scour is spamming people on > his behalf. It seems to me there should be spam-filter trained to take > care of these guys. > As a reference: > http://forums.digitalpoint.com/showthread.php?t=955786 > http://markmail.org/message/fzlutwd3mkforbsu > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Sat Aug 8 10:18:59 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 08 Aug 2009 10:18:59 -0400 Subject: [Bioperl-l] SeqIO documentation Message-ID: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> Chris, Since we've been discussing formats I just wanted to mention that I've changed this documentation from SeqIO.pm: If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then Fasta format is assumed. 
To:

If no format is specified and a filename is given then the module
will attempt to deduce the format from the filename suffix. If there
is no suffix that Bioperl understands then it will attempt to guess
the format based on file content. If this is unsuccessful then SeqIO
will throw a fatal error.

The code is clear: if SeqIO can't figure out what the format is then
it dies; "fasta" is not the default format.

Brian O.

From cjfields at illinois.edu Sat Aug 8 12:23:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 Aug 2009 11:23:44 -0500
Subject: [Bioperl-l] SeqIO documentation
In-Reply-To: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net>
References: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net>
Message-ID:

Brian,

That fits current behavior, so yes, that makes sense.

chris

On Aug 8, 2009, at 9:18 AM, Brian Osborne wrote:

> Chris,
>
> Since we've been discussing formats I just wanted to mention that
> I've changed this documentation from SeqIO.pm:
>
> If no format is specified and a filename is given then the module
> will attempt to deduce the format from the filename suffix. If there
> is no suffix that Bioperl understands then it will attempt to guess
> the format based on file content. If this is unsuccessful then Fasta
> format is assumed.
>
> To:
>
> If no format is specified and a filename is given then the module
> will attempt to deduce the format from the filename suffix. If there
> is no suffix that Bioperl understands then it will attempt to guess
> the format based on file content. If this is unsuccessful then SeqIO
> will throw a fatal error.
>
> The code is clear, if SeqIO can't figure out what the format is then
> it dies, "fasta" is not the default format.
>
>
> Brian O.
> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 8 12:24:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:24:48 -0500 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> Message-ID: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu> I believe there are spam filters in place (Jason and Chris D. could probably indicate more on this). chris On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote: > Thanks Stefan--this makes a lot more sense to me than supposing > a priori that a previous legitimate user of this list is spamming > bioperl-l > intentionally. I would prefer to initially give the benefit of the > doubt > to the intelligence of the users, rather than scare people off who are > likely to be already mortified that their emails have been > commandeered > like this. I would definitely support an spam filter that works. > MAJ > ----- Original Message ----- From: "Stefan Kirov" > > To: "Hilmar Lapp" > Cc: "BioPerl List" > Sent: Friday, August 07, 2009 10:25 AM > Subject: Re: [Bioperl-l] Scour Friend Invite > > >> Hilmar Lapp wrote: >>> Just FYI, I am addressing this offline. Note to everyone: we don't >>> tolerate this and it will get you removed from the list immediately >>> (and banned for the second offense). This is a large list. You >>> better >>> spend the time and be very careful who you send this kind of stuff >>> to >>> before you waste everyone else's. 
>>> >>> -hilmar >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> It is quite possible this guy has no idea scour is spamming people on >> his behalf. It seems to me there should be spam-filter trained to >> take >> care of these guys. >> As a reference: >> http://forums.digitalpoint.com/showthread.php?t=955786 >> http://markmail.org/message/fzlutwd3mkforbsu >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 8 12:26:55 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:26:55 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> Message-ID: <0A43205F-828F-4CC9-ADC3-EBCE92690765@illinois.edu> On Aug 7, 2009, at 4:19 AM, Peter wrote: > On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields > wrote: >> Michael, >> >> Are you using ClustalW 2? I'm not sure but I don't think the >> wrapper has >> been updated for the latest version (I think parsing still works, >> though). >> >> chris > > That shouldn't matter, according to Des Higgins ClustalW 2 is intended > to be completely compatible with ClustalW 1.83, including the command > line options. They will be adding new stuff in ClustalW 3. 
The only
> thing to worry about with ClustalW 2 is parsing the output, as the
> header line of the alignments has changed very slightly.
>
> I can tell you from personal experience that the Biopython command
> line wrappers for ClustalW work fine on both 1.83 and 2.0.10 for
> example, and would expect the same to be true for BioPerl.
>
> Peter

I would think so as well, but I encountered some issues on my OS
using ClustalW 2 with the last release:

http://bugzilla.open-bio.org/show_bug.cgi?id=2728

I think it's something small, like something hard-coded in (version
maybe), that's causing the problem; I just didn't have time to check.

chris

From cjfields at illinois.edu Sat Aug 8 12:26:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 8 Aug 2009 11:26:38 -0500
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
Message-ID: <0963ED84-359B-465B-9BA2-956A0AB23587@illinois.edu>

Have you tried installing SOAP::Lite directly? That seems to be the
hanging point. The funny thing is this is somehow assigning
everything as a requirement (SOAP::Lite is a 'recommends'). Worth
investigating, but I don't have access to a Windows box (either for
XP, Vista, or Win7).

Hopefully we'll get a PPM up soon; it's in the roadmap for 1.6.1. In
the meantime (as a strictly temporary measure), have you tried
setting PERL5LIB to point to a local copy of bioperl-1.6?

chris

On Aug 3, 2009, at 6:18 PM, Johnathan Dalzell wrote:

> Hi, I've been trying to install Bioperl 1.6.0 onto strawberry perl
> 5.10 and the activePerl equivalent. I'm working through vista, and
> over multiple times, this is the furthest I can get through
> installation....
>
>
> Install [a]ll Bioperl scripts, [n]one, or choose groups
> [i]nteractively?
[a] a > - will install all scripts > Do you want to run tests that require connection to servers across > the internet > (likely to cause some failures)? y/n [n] y > - will run internet-requiring tests > Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/ > lib/Data/Dumper.pm lin > e 190, line 9. > Creating new 'Build' script for 'BioPerl' version '1.006000' > ---- Unsatisfied dependencies detected during ---- > ---- CJFIELDS/BioPerl-1.6.0.tar.gz ---- > SOAP::Lite [requires] > GraphViz [requires] > Convert::Binary::C [requires] > Algorithm::Munkres [requires] > XML::Twig [requires] > DB_File [requires] > Set::Scalar [requires] > XML::Parser::PerlSAX [requires] > XML::Writer [requires] > XML::SAX::Writer [requires] > Clone [requires] > XML::DOM::XPath [requires] > PostScript::TextBlock [requires] > Running Build test > Delayed until after prerequisites > Running Build install > Delayed until after prerequisites > Running install for module 'SOAP::Lite' > Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP- > Lite-0.710.08.tar.gz > ok > CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > We are about to install SOAP::Lite and for your convenience will > provide > you with list of modules and prerequisites, so you'll be able to > choose > only modules you need for your configuration. > XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by > default. > Installed transports can be used for both SOAP::Lite and XMLRPC::Lite. > Press to see the detailed list. > Feature Prerequisites Install? 
> ----------------------------- ---------------------------- -------- > Core Package [*] Scalar::Util always > [*] Test::More > [*] URI > [*] MIME::Base64 > [*] version > [*] XML::Parser (v2.23) > Client HTTP support [*] LWP::UserAgent always > Client HTTPS support [ ] Crypt::SSLeay [ no ] > Client SMTP/sendmail support [ ] MIME::Lite [ no ] > Client FTP support [*] IO::File [ yes ] > [*] Net::FTP > Standalone HTTP server [*] HTTP::Daemon [ yes ] > Apache/mod_perl server [ ] Apache [ no ] > FastCGI server [ ] FCGI [ no ] > POP3 server [ ] MIME::Parser [ no ] > [*] Net::POP3 > IO server [*] IO::File [ yes ] > MQ transport support [ ] MQSeries [ no ] > JABBER transport support [ ] Net::Jabber [ no ] > MIME messages [ ] MIME::Parser [ no ] > DIME messages [*] IO::Scalar (v2.105) [ no ] > [ ] DIME::Tools (v0.03) > [ ] Data::UUID (v0.11) > SSL Support for TCP Transport [ ] IO::Socket::SSL [ no ] > Compression support for HTTP [*] Compress::Zlib [ yes ] > MIME interoperability w/ Axis [ ] MIME::Parser (v6.106) [ no ] > --- An asterix '[*]' indicates if the module is currently installed. > Do you want to proceed with this configuration? [yes] yes > Checking if your kit is complete... 
> Looks good > Writing Makefile for SOAP::Lite > cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod > cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm > cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm > cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm > cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm > cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm > cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm > cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport > \TCP.pm > cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm > cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport > \POP3.pm > cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm > cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod > cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm > cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm > cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm > cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm > cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm > cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod > cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod > cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod > cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm > cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm > cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod > cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm > cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm > cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod > cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm > cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema19 > 99.pm > cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm > cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm > cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod > cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport > \HTTP.pm > cp 
lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema20 > 01.pm > cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod > cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm > cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm > cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport > \MQ.pm > cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport > \FTP.pm > cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP > \Transport\JABBER.pm > cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm > cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod > cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm > cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_2.pm > cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport > \IO.pm > cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_1.pm > cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP > \Transport\LOCAL.pm > cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm > cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod > cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm > cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP > \Transport\MAILTO.pm > cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > SOAPsh.pl blib\script\S > OAPsh.pl > pl2bat.bat blib\script\SOAPsh.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > stubmaker.pl blib\scrip > t\stubmaker.pl > pl2bat.bat blib\script\stubmaker.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > XMLRPCsh.pl blib\script > \XMLRPCsh.pl > pl2bat.bat blib\script\XMLRPCsh.pl > MKUTTER/SOAP-Lite-0.710.08.tar.gz > C:\strawberry\c\bin\dmake.EXE -- OK > Running 
make test > C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib\lib' > , 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/ > 013-array-deserializati > on.t t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03- > server.t t/04-attach. > t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08- > schema.t t/096_characters.t t > /097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t > t/IO/SessionSet.t t/SO > AP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/ > Deserializer/XMLSchema199 > 9.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/ > Deserializer/XMLSchemaSOAP1_1.t t > /SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/ > SOAP/Transport/FTP.t t/S > OAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t > t/SOAP/Transport/MAILT > O.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/ > HTTP/CGI.t t/XML/Parser > /Lite.t t/XMLRPC/Lite.t > t/01-core.t .................................. ok > t/010-serializer.t ........................... ok > t/012-cloneable.t ............................ ok > t/013-array-deserialization.t ................ ok > t/014_UNIVERSAL_use.t ........................ ok > t/015_UNIVERSAL_can.t ........................ ok > t/02-payload.t ............................... ok > t/03-server.t ................................ ok > t/04-attach.t ................................ skipped: Could not > find MIME::Parser - is M > IME::Tools installed? Aborting. > t/05-customxml.t ............................. ok > t/06-modules.t ............................... ok > t/07-xmlrpc_payload.t ........................ ok > t/08-schema.t ................................ ok > t/096_characters.t ........................... skipped: (no reason > given) > t/097_kwalitee.t ............................. skipped: (no reason > given) > t/098_pod.t .................................. 
skipped: (no reason > given) > t/099_pod_coverage.t ......................... skipped: (no reason > given) > t/IO/SessionData.t ........................... ok > t/IO/SessionSet.t ............................ ok > t/SOAP/Data.t ................................ ok > t/SOAP/Lite/Deserializer/XMLSchema1999.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok > t/SOAP/Lite/Packager.t ....................... ok > t/SOAP/Schema/WSDL.t ......................... ok > t/SOAP/Serializer.t .......................... 1/12 Use of > uninitialized value $values[0] > in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08- > wfOzhM\blib\lib/SOAP/Lite > .pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > t/SOAP/Serializer.t .......................... ok > t/SOAP/Transport/FTP.t ....................... 1/7 Use of > uninitialized value in split at > C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/ > Transport/FTP.pm line 55. > substr outside of string at C:\strawberry\cpan\build\SOAP- > Lite-0.710.08-wfOzhM\blib\lib/SO > AP/Transport/FTP.pm line 56. > Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/ > perl/lib/IO/Socket/INET. > pm line 117. > Use of uninitialized value $server in concatenation (.) or string at > C:\strawberry\cpan\bu > ild\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60. > t/SOAP/Transport/FTP.t ....................... ok > t/SOAP/Transport/HTTP.t ...................... 
ok > t/SOAP/Transport/HTTP/CGI.t .................. > > everytime I get to the CGI.t at the end here the installation won't > move! Any suggestions would be greatly appreciated, I've been > trying to force it through, literally for 5 hours now.... > > cheers, > jonny > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Sat Aug 8 12:42:12 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 08 Aug 2009 12:42:12 -0400 Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10 In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> Message-ID: <979637B9-F2EC-47A0-9283-440AA2558481@verizon.net> Jonathan, It looks like you're not the only one having problems with SOAP::Lite on Windows. For a possible workaround: http://objectmix.com/perl/638075-how-install-soap-lite-windows.html Brian O. On Aug 3, 2009, at 7:18 PM, Johnathan Dalzell wrote: > SOAP/Transport/HTTP/CGI From stefan.kirov at bms.com Sat Aug 8 16:45:32 2009 From: stefan.kirov at bms.com (Kirov, Stefan) Date: Sat, 8 Aug 2009 16:45:32 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife>, <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu> Message-ID: There is indeed, actually my mail with the same header was held for a while. In any case I think these pay-to-search/invite-colleagues/et spam-whole-address-book sites should be banned if they are not formally not spam, since the user is at least partially aware of the effect. I am not sure if this is a good solution, I am just frustrated, because these companies are quite unethical. 
Maybe not as unethical as others (few come to my mind, but will not name them :-)), but still... On the other hand they have not been a real problem before. As long as this is not a frequent thing I guess the filter is doing a great job. Stefan ________________________________________ From: Chris Fields [cjfields at illinois.edu] Sent: Saturday, August 08, 2009 12:24 PM To: Mark A. Jensen Cc: Kirov, Stefan; Hilmar Lapp; BioPerl List Subject: Re: [Bioperl-l] Scour Friend Invite I believe there are spam filters in place (Jason and Chris D. could probably indicate more on this). chris On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote: > Thanks Stefan--this makes a lot more sense to me than supposing > a priori that a previous legitimate user of this list is spamming > bioperl-l > intentionally. I would prefer to initially give the benefit of the > doubt > to the intelligence of the users, rather than scare people off who are > likely to be already mortified that their emails have been > commandeered > like this. I would definitely support an spam filter that works. > MAJ > ----- Original Message ----- From: "Stefan Kirov" > > To: "Hilmar Lapp" > Cc: "BioPerl List" > Sent: Friday, August 07, 2009 10:25 AM > Subject: Re: [Bioperl-l] Scour Friend Invite This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited. 
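
For reference, the lookup order described in the SeqIO documentation
thread above (filename suffix first, then a guess from file content,
then a fatal error rather than a silent "fasta" default) can be
sketched in plain Perl. This is only an illustration of that order
and not SeqIO's actual code: deduce_format() and the file names are
hypothetical, the content check is deliberately crude, and only the
two suffix patterns are taken from the lines quoted on this list.

```perl
use strict;
use warnings;

# Hypothetical helper: mirrors the documented order only,
# it is NOT how Bio::SeqIO is implemented.
sub deduce_format {
    my ( $filename, $first_line ) = @_;

    # 1. Try the filename suffix (patterns as quoted on the list).
    return 'fasta'
        if $filename =~ /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    return 'raw' if $filename =~ /\.(txt)$/i;

    # 2. Fall back to a crude look at the content (assumption:
    #    a leading '>' is enough to call it fasta).
    return 'fasta' if defined $first_line && $first_line =~ /^>/;

    # 3. No default format: fail loudly instead of assuming fasta.
    die "Could not determine sequence format for '$filename'\n";
}

print deduce_format( 'NC_000913.fasta', '' ), "\n";
print deduce_format( 'upload_42', '>seq1 test' ), "\n";

# An unrecognised suffix plus headerless content is a fatal error,
# so callers that want to carry on have to trap it themselves.
my $ok = eval { deduce_format( 'upload_43', 'ACGTACGT' ); 1 };
print "fatal: $@" unless $ok;
```

Wrapping the call in eval, as in the last lines, is one way for a
calling script to report the problem to its user instead of dying.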
From j_martin at lbl.gov Sat Aug 8 22:41:53 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Sat, 8 Aug 2009 19:41:53 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>
Message-ID: <20090809024152.GA26943@eniac.jgi-psf.org>

Hello,

It sounds like you want a layer to figure out what they're giving
your program before you open it; you could use
Bio::Tools::GuessSeqFormat and spare your user the pain of knowledge.
It seems reasonable that coddling happens only when requested.

    use strict;
    use warnings;
    use IO::String;
    use Bio::SeqIO;
    use Bio::Tools::GuessSeqFormat;

    my @files = ( 'NC_000913.fasta', '.gb' );
    for my $file ( @files ) {
        # write raw sequence into an in-memory string instead of a file
        my ( $string, $strio, $out );
        $strio = IO::String->new( $string );
        $out   = Bio::SeqIO->new( -fh => $strio, -format => 'raw' );

        # guess the input format from the file, then parse with that format
        my $guesser = Bio::Tools::GuessSeqFormat->new( -file => $file );
        my $in      = Bio::SeqIO->new( -format => $guesser->guess,
                                       -file   => $file );

        while ( my $seq = $in->next_seq() ) {
            $out->write_seq( $seq );
            # show the first 30 characters of what has been written so far
            print substr( $string, 0, 30 ), "\n";
        }
    }

Joel

On Thu, Aug 06, 2009 at 03:36:36PM -0400, Hilgert, Uwe wrote:
> Hmmm, I fail to see how supplying raw sequence could be a called "bad"
> input or a "problem". In our case, for example, not every user is a
> bioinformatics expert and Cornel was suggesting to account for that
> instead of trying to "train" the user to adhere to requirements that
> have not much to do with what s/he tries to accomplish. I don't really
> see data being modified, rather that the data format is being adopted to
> the needs of the software; which I would argue should be something the
> software is being able to take care of.
> > Uwe > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, August 06, 2009 12:50 PM > To: Ghiban, Cornel > Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Cornel, > > I'm failing to see how adding '>' would solve the problem. > > This is a simple validation issue: should we throw an exception on bad > input (no '>'), or just argue GIGO based on user error (the assumption > that the SeqIO parser will read raw sequence correctly when set to > 'fasta' is wrong)? > > I think, in this circumstance, the former applies. It is easy to add, > and the use of an exception in this case is violently user-friendly, > e.g. it will stop cold and immediately point out the problem. > Otherwise data is (silently) being modified, which is always a bad > thing. > > chris > > On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > > > Hi, > > > > It doesn't matter what sequence we use. As Chris Fields's showed in > > his test, not having > > ">" as the 1st character on the first line is the problem. > > We always assumed the sequence is in FASTA format and this seems to > > be wrong. > > > > I think, the solution to our problem is to check whether the ">" > > symbol is present or not. > > If not present then it will be added. > > > > Thank you, > > Cornel Ghiban > > > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > Sent: Thursday, August 06, 2009 11:18 AM > > To: Hilgert, Uwe > > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > > > Uwe - could you send an actual data file (as an attachment) that > > reproduces the problem, or is that not possible? > > > > -hilmar > > > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > > > >> I'm not sure what version we have. 
Cornel may have installed it a > >> while ago from CVS: > >> > >> Module id = Bio::Root::Build > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::Root::Version > >> Module id = Bio::Root::Version > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::SeqIO > >> Module id = Bio::SeqIO > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > >> INST_VERSION undef > >> > >> Cornel still has the checked-out "bioperl-live" directory and the > >> last > >> changes are from March this year. > >> > >> As per why he used "Fasta" instead of 'fasta" as the format parameter > >> in Bio::SeqIO, it's because that what it says in the modules manual. > >> He now tried 'fasta' instead and see no changes in behavior. Omitting > >> the format parameter altogether, fasta-formatted sequence continues > >> to > >> be treated correctly, the first line being removed. However, raw > >> sequence is being treated differently in that the first line is not > >> being removed any more. Instead, the program returns the first line > >> only. Which, in the example I am going to forward in my next message, > >> will return 60 amino acids out of raw sequence of 300 aa. Can't win > >> with raw sequence... > >> > >> > >> The files may be created on different platforms, we didn't notice any > >> difference between using files created on Windows or Linux. 
> >> > >> Thanks > >> Uwe > >> > >> > >> > >> > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Wednesday, August 05, 2009 6:54 PM > >> To: Chris Fields > >> Cc: Hilgert, Uwe; BioPerl List > >> Subject: Re: [Bioperl-l] Bio::SeqIO issue > >> > >> I don't think that can be the problem. If anything, providing the > >> format ought to be better in terms of result than not providing it? > >> > >> Uwe - I'd like you to go back to Chris' initial questions that you > >> haven't answered yet: "What version of bioperl are you using, OS, > >> etc? > >> What does your data look like?" I'd add to that, can you show us your > >> full script, or a smaller code snippet that reproduces the problem. > >> > >> I suspect that either something in your script is swallowing the > >> line, > >> or that the line endings in your data file are from a different OS > >> than the one you're running the script on. (Or that you are running a > >> very old version of BioPerl, which is entirely possible if you > >> installed through CPAN.) > >> > >> -hilmar > >> > >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> > >>> Uwe, > >>> > >>> Please keep replies on the list. > >>> > >>> It's very possible that's the issue; IIRC the fasta parser pulls out > >>> the full sequence in chunks (based on local $/ = "\n>") and splits > >>> the header off as the first line in that chunk. You could probably > >>> try leaving the format out and letting SeqIO guess it, or passing > >>> the > >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably > >>> better to go through the files and add a file extension that > >>> corresponds to the format. > >>> > >>> chris > >>> > >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >>> > >>>> Thanks, Chris. 
The files have no extension, but we indicate what
> >>>> format to use, like in the manual:
> >>>>
> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta');
> >>>>
> >>>> I wonder now whether this could exactly cause the problem: as we are
> >>>> telling that input files are in fasta format they are being treated
> >>>> as such (=remove first line) - regardless of whether they really are
> >>>> fasta?
> >>>>
> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >>>> Uwe Hilgert, Ph.D.
> >>>> Dolan DNA Learning Center
> >>>> Cold Spring Harbor Laboratory
> >>>>
> >>>> C: (516) 857-1693
> >>>> V: (516) 367-5185
> >>>> E: hilgert at cshl.edu
> >>>> F: (516) 367-5182
> >>>> W: http://www.dnalc.org
> >>>>
> >>>> -----Original Message-----
> >>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> >>>> Sent: Wednesday, August 05, 2009 5:04 PM
> >>>> To: Hilgert, Uwe
> >>>> Cc: bioperl-l at lists.open-bio.org
> >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue
> >>>>
> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:
> >>>>
> >>>>> Is my impression correct that Bio::SeqIO just assumes that
> >>>>> sequences are being submitted in FASTA format?
> >>>>
> >>>> No. See:
> >>>>
> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
> >>>>
> >>>> SeqIO tries to guess at the format using the file extension, and if
> >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's
> >>>> possible that the extension is causing the problem, or that
> >>>> GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced
> >>>> to guess). In any case, it's always advisable to explicitly
> >>>> indicate the format when possible.
> >>>>
> >>>> Relevant lines:
> >>>>
> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
> >>>> ...
> >>>> return 'raw' if /\.(txt)$/i; > >>>> > >>>>> In our experience, implementing > >>>>> Bio::SeqIO led to the first line of files being cut off, > >>>>> regardless > >>>>> of whether the files were indeed fasta files or files that only > >>>>> contained sequence. > >>>> > >>>> Files that only contain sequence are 'raw'. Ones in FASTA are > >>>> 'fasta'. > >>>> > >>>>> Which, in the latter, led to sequence submissions that had the > >>>>> first line of nucleotides removed. Has anyone tried to write a fix > >>>>> for this? > >>>> > >>>> This sounds like a bug, but we have very little to go on beyond > >>>> your > >>>> description. What version of bioperl are you using, OS, etc? What > >>>> does your data look like? File extension? > >>>> > >>>> chris > >>>> > >>>>> Thanks, > >>>>> > >>>>> Uwe > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > >>>>> > >>>>> Uwe Hilgert, Ph.D. > >>>>> > >>>>> Dolan DNA Learning Center > >>>>> > >>>>> Cold Spring Harbor Laboratory > >>>>> > >>>>> > >>>>> > >>>>> V: (516) 367-5185 > >>>>> > >>>>> E: hilgert at cshl.edu > >>>>> > >>>>> F: (516) 367-5182 > >>>>> > >>>>> W: http://www.dnalc.org > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > 
=========================================================== > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Sun Aug 9 06:38:30 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 11:38:30 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EA726.60303@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > OK, I propose to look into these. Almost certainly I'll be doing "convert > run/db/network to Module::Build". I'll try to resolve the bugs you've > mentioned. 
> > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. Chris already started on "convert run/db/network to Module::Build" for some reason, but his attempt doesn't actually result in any modules getting installed (setting pm_files() like that isn't enough). The easiest, cleanest and most standard solution is to create a lib directory and svn move Bio into it. Does anyone have an objection to me doing this for the network, db and run packages? It will only affect developers currently working on code in those packages, and they just need to be aware that an svn update will be rather dramatic after my change. From cjfields at illinois.edu Sun Aug 9 09:05:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:05:17 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7EA726.60303@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> Message-ID: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> ... > > Chris already started on "convert run/db/network to Module::Build" > for some reason, but his attempt doesn't actually result in any > modules getting installed (setting pm_files() like that isn't enough). > > The easiest, cleanest and most standard solution is to create a lib > directory and svn move Bio into it. Does anyone have an objection to > me doing this for the network, db and run packages? 
It will only > affect developers currently working on code in those packages, and > they just need to be aware that an svn update will be rather > dramatic after my change. If it stimulates you into doing this then I'm all for it, but I've waited on getting this fixed long enough I decided to take it on myself to work on it, using the simplest ones. You had mentioned several times you would do this and I hadn't seen any progress. The point: I would really like to get another point release out before we work on splitting things up. Simple as that. From what I have seen (with my few tests) everything (modules, scripts) gets copied into blib just fine and the temp folder for script generation gets cleaned up; I haven't progressed beyond to the installation step, but there isn't anything to me that indicates it wouldn't work. I won't be available until Wed. at the earliest for additional comment (out of town, no internet connection). chris From bix at sendu.me.uk Sun Aug 9 09:15:07 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 14:15:07 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> Message-ID: <4A7ECBDB.9030505@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> The easiest, cleanest and most standard solution is to create a lib >> directory and svn move Bio into it. Does anyone have an objection to >> me doing this for the network, db and run packages? 
It will only >> affect developers currently working on code in those packages, and >> they just need to be aware that an svn update will be rather dramatic >> after my change. > > From what I have seen (with my few tests) everything (modules, scripts) > gets copied into blib just fine and the temp folder for script > generation gets cleaned up; I haven't progressed beyond to the > installation step, but there isn't anything to me that indicates it > wouldn't work. ./Build testinstall will show you it doesn't work as-is. If you're in a rush I'll just do the svn moves and we can revert later if anyone complains. From cjfields at illinois.edu Sun Aug 9 09:19:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:19:30 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <2790F9A5-43E8-47E5-B5AA-98239B95EF04@illinois.edu> On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. > > If you're in a rush I'll just do the svn moves and we can revert > later if anyone complains. Works for me. The sooner it gets done the better (next week, would be nice, but two is fine so we don't rush it too much). I'll be working on several other bits, including FASTQ, when I get back Wed, then I'll merge over and work on the next point release. chris From cjfields at illinois.edu Sun Aug 9 09:34:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:34:07 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. Sorry, I'll be leaving in the next hour, but for the above, did you mean './Build fakeinstall'? As long as you're moving everything into /lib (which I fully support), we should consider hard_coding scripts into bp_foo.PLS syntax seeing as we're going through additional trouble of converting them over. That is, unless there is a specific purpose to keeping them without the 'bp_'. chris From bix at sendu.me.uk Sun Aug 9 10:00:18 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 15:00:18 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <4A7ED672.20701@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>>> The easiest, cleanest and most standard solution is to create a lib >>>> directory and svn move Bio into it. Does anyone have an objection to >>>> me doing this for the network, db and run packages? 
It will only >>>> affect developers currently working on code in those packages, and >>>> they just need to be aware that an svn update will be rather >>>> dramatic after my change. >>> >>> From what I have seen (with my few tests) everything (modules, >>> scripts) gets copied into blib just fine and the temp folder for >>> script generation gets cleaned up; I haven't progressed beyond to the >>> installation step, but there isn't anything to me that indicates it >>> wouldn't work. >> >> ./Build testinstall will show you it doesn't work as-is. > > Sorry, I'll be leaving in the next hour, but for the above, did you mean > './Build fakeinstall'? Yes, sorry. > As long as you're moving everything into /lib (which I fully support), > we should consider hard_coding scripts into bp_foo.PLS syntax seeing as > we're going through additional trouble of converting them over. That > is, unless there is a specific purpose to keeping them without the 'bp_'. (The final suffix is supposed to be .pl - we convert from PLS to pl in core, no conversion needed in db) Yes, for only a handful of scripts, it actually makes sense to flatten them all into a new bin directory, which is the default script location for Module::Build. So for example I'd do: svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl etc. 
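
For what it's worth, the renaming could be scripted rather than typed by hand. A minimal sketch, assuming the scripts sit under a scripts/ tree as in the example above; the helper name flattened_target and the second sample path are made up for illustration, and it only prints the svn mv commands rather than running them:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: map a script path to its flattened bin/ location,
# adding the bp_ prefix if it is missing and turning a .PLS suffix into .pl.
sub flattened_target {
    my ($path) = @_;
    my $name = basename($path);
    $name =~ s/\.PLS\z/.pl/;                      # final suffix should be .pl
    $name = "bp_$name" unless $name =~ /\Abp_/;   # don't double the prefix
    return "bin/$name";
}

# Illustrative input only; the real list would come from the checkout.
my @scripts = qw(
    scripts/biosql/bioentry2flat.pl
    scripts/example/bp_some_script.PLS
);

# Print the commands so they can be reviewed before anything is moved.
print "svn mv $_ ", flattened_target($_), "\n" for @scripts;
```

Printing instead of executing keeps the sketch harmless; the output can be checked and then piped to a shell if it looks right.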
From bix at sendu.me.uk Sun Aug 9 12:13:03 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 17:13:03 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EF58F.9000909@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. These issues should now be resolved. I'll note that for future cases similar to 3), if a user chooses to install an optional dependency using CPAN/CPANPLUS and the installation of that external module causes an infinite loop, it's an issue of that module or CPAN/CPANPLUS, not BioPerl. 
The solution from our end is to tell the user to choose not to install that dependency or ask on the CPAN mailing list if they really need it. (I've often got stuck in infinite loops just trying to install Bundle::CPAN! CPAN itself will detect infinite loops after a while and kill itself.)

From jdalzell03 at qub.ac.uk Sun Aug 9 05:06:26 2009
From: jdalzell03 at qub.ac.uk (Jonny Dalzell)
Date: Sun, 9 Aug 2009 02:06:26 -0700 (PDT)
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
Message-ID: <24885345.post@talk.nabble.com>

Thanks for the replies. I emailed Chris and Brian individually, but I guess it would be helpful if I threw my solution to "the dogs".

In the end I found that after downloading Subversion (you need to sign up to CollabNet for a user account first) and following the instructions on the relevant Subversion page of the BioPerl site (http://www.bioperl.org/wiki/Using_Subversion), it downloaded fine first time. No need for CPAN or a PPM: just copy and paste 'svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live' into your command line, and it auto installs in under 30 seconds...definitely the way to go for anyone else out there trying to bust-a-move on a Win machine.

At time of writing, I have also installed BioPerl-db (same as above, copy and paste 'svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db' into the command line), and BioPerl-run (I typed in 'svn co svn://code.open-bio.org/bioperl/bioperl-run/trunk bio', I think), and it worked fine. The relevant installation instructions don't give an explicit command for BP-run installation, but I think that matches the branches and trunk in the Subversion repository (if not, sorry, but you can cross-reference its position in there easily by following the links).
Both have worked without problem on Strawberry Perl 5.10 through WinVista, so far. Jonny -- View this message in context: http://www.nabble.com/bioperl-1.6-installation-on-vista-with-perl-5.10-tp24875623p24885345.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From mwhagen85 at gmail.com Mon Aug 10 14:54:53 2009 From: mwhagen85 at gmail.com (OjoLoco) Date: Mon, 10 Aug 2009 11:54:53 -0700 (PDT) Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits Message-ID: <24905417.post@talk.nabble.com> Hello all, I have found matching sequences between two genomes and I would now like to create a graphic that contains a heat map-like track that will show areas of the genome that were found more often than others. For every nt I have the number of times it was found, so if it was found very often it would be a darker color than say a nt that wasn't found at all. Is there any way to achieve this using built in BioPerl graphics? Thank you for your time. -- View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Mon Aug 10 15:22:36 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Mon, 10 Aug 2009 15:22:36 -0400 Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits In-Reply-To: <24905417.post@talk.nabble.com> References: <24905417.post@talk.nabble.com> Message-ID: Hi, You should be able to do that with wiggle_density and wiggle_xyplot glyphs. See http://gmod.org/wiki/GBrowse/Uploading_Wiggle_Tracks for instructions on constructing wiggle plots. 
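
In case it helps, here is a rough sketch of turning per-nucleotide counts into wiggle text. It assumes the counts are already in a Perl array in sequence order and uses the fixedStep form of the format; the helper name counts_to_wiggle, the sequence name and the sample numbers are all invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build fixedStep wiggle text from a list of
# per-nucleotide hit counts. $start is the 1-based position of the
# first count; step=1 means one value per nucleotide.
sub counts_to_wiggle {
    my ($seqid, $start, $counts, $track_name) = @_;
    my @lines = (
        qq{track type=wiggle_0 name="$track_name"},
        "fixedStep chrom=$seqid start=$start step=1",
        @$counts,
    );
    return join("\n", @lines) . "\n";
}

# Made-up counts: how often each of seven consecutive nt was found.
my @hits = (0, 0, 3, 7, 7, 2, 0);
print counts_to_wiggle('chr1', 1, \@hits, 'hit density');
```

The resulting text is what would be saved as the .wig file; everything after that (conversion and drawing) is as described in the rest of this message.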
After you have a wiggle plot, you'll need the wiggle2gff3.pl script (which is part of GBrowse, but it should run fine on its own), which you can get from GMOD's CVS:

http://gmod.cvs.sourceforge.net/viewvc/*checkout*/gmod/Generic-Genome-Browser/bin/wiggle2gff3.pl

which will convert the wig file to a binary file. Then you can create Bio::SeqFeatureI objects that will work with Bio::Graphics to draw the density or xyplot. Note as well that Bio::Graphics is no longer part of the main BioPerl distribution, so you'll need to get the most recent version from CPAN.

Also, fair warning: I've never actually done this; I've only used wiggle plots in the context of GBrowse, but it should work pretty much as described.

Scott

On Aug 10, 2009, at 2:54 PM, OjoLoco wrote:

>
> Hello all,
> I have found matching sequences between two genomes and I would
> now like
> to create a graphic that contains a heat map-like track that will
> show areas
> of the genome that were found more often than others. For every nt
> I have
> the number of times it was found, so if it was found very often it
> would be
> a darker color than say a nt that wasn't found at all. Is there any
> way to
> achieve this using built in BioPerl graphics? Thank you for your time.
> --
> View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-----------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From jdalzell03 at qub.ac.uk Tue Aug 11 11:07:52 2009
From: jdalzell03 at qub.ac.uk (Jonny Dalzell)
Date: Tue, 11 Aug 2009 08:07:52 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
Message-ID: <24919498.post@talk.nabble.com>

Hi,

I'm trying to run the example given for Bio::Tools::HMM on the Bioperl site, and when I run it, I get this in the command line...

"The C-compiled engine for Hidden Markov Model (HMM) has not been installed.
Please read the install the bioperl-ext package

BEGIN failed--compilation aborted at C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140.
Compilation failed in require at HMM.txt line 4.
BEGIN failed--compilation aborted at HMM.txt line 4."

I have installed the entire bioperl-ext package through subversion, and it looks like all the relevant folders are in perl/site/lib/Bio/Tools, but it won't work. Am I missing something? I'm under the impression that the C-compiler comes with bioperl-ext (which installed with no reported problems)? I concede that I am extremely new to both Perl in general and Bioperl more specifically, but I have followed the instructions which I can find. I have the bioperl core installed in addition to bioperl-db and bioperl-run. I'm using Strawberry Perl on WinVista. I appreciate that most work through Linux systems...I am at times sorely tempted myself.

Any suggestions would be welcomed gratefully,
cheers,
Jonny

PS: this is the partial script I was trying to run...
#!/usr/bin/perl -w

use strict;
use Bio::Tools::HMM;
use Bio::SeqIO;
use Bio::Matrix::Scoring;

# Create an HMM object.
# ACGT are the bases; N and C mean non-coding and coding.
my $hmm = Bio::Tools::HMM->new('-symbols' => "ACGT", '-states' => "NC");

# Initialise some training observation sequences
my $seq1 = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
my $seq2 = Bio::SeqIO->new(-file => $ARGV[1], -format => 'fasta');
my @seqs = ($seq1, $seq2);

# Train the HMM with the observation sequences
$hmm->baum_welch_training(\@seqs);

# Get parameters
my $init    = $hmm->init_prob;        # Returns an array reference
my $matrix1 = $hmm->transition_prob;  # Returns Bio::Matrix::Scoring
my $matrix2 = $hmm->emission_prob;    # Returns Bio::Matrix::Scoring

I realise that this is incomplete.
--
View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919498.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From shameer at ncbs.res.in Tue Aug 11 13:07:20 2009
From: shameer at ncbs.res.in (K. Shameer)
Date: Tue, 11 Aug 2009 22:37:20 +0530 (IST)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24919498.post@talk.nabble.com>
References: <24919498.post@talk.nabble.com>
Message-ID: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in>

Hello Jonny,

Are you sure that you have a compiled version of HMMER installed on your machine?

--
K. Shameer

> Hi,
>
> trying to run the example given for Bio::Tools::HMM on the Bioperl site,
> and
> when I try to run it, I get this in the command line...
>
> "The C-compiled engine for Hidden Markov Model (HMM) has not been
> installed.
> Please read the install the bioperl-ext package
>
> BEGIN failed--compilation aborted at
> C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140.
> Compilation failed in require at HMM.txt line 4.
> BEGIN failed--compilation aborted at HMM.txt line 4."
> > I have installed the entire bioperl-ext package through subversion, and it > looks like all the relevant folders are in perl/site/lib/Bio/Tools, but it > won't work. Am I missing something? I'm under the impression that the > C-compiler comes with bioperl-ext (which installed with no reported > problems)? I concede that I am extrememly new to both Perl in general and > Bioperl more specifically, but I have followed the instructions which I > can > find. I have the bioperl core installed in addition to bioperl-db and > bioperl-run. I'm using Strawberry Perl on WinVista. I appreciate that > most > work through Linux systems...I am at times sorely tempted myself. > > Any suggestions would be welcomed gratefully, > cheers, > Jonny > > ps. this is the partial script I was trying to run... > > #!/usr/bin/perl -w > > usr strict; > use Bio::Tools::HMM; > use Bio::SeqIO; > use Bio::Matrix::Scoring; > > #Create a HMM object > #ACGT are the bases NC mean non-coding and coding > $hmm = new Bio::Tools::HMM ('-symbols' => "ACGT", '-states' => "NC"); > > #Initialise some training observation sequences > $Seq1 = new Bio::SeqIO(-file => $ARGV[0], -format => 'fasta'); > $seq2 = new Bio::SeqIO(-file => $ARGV[1], -format => 'fasta'); > @seqs = ($seq1, $seq2); > > #Train the HMM with the observation sequences > $hmm ->baum_welch_training(\@seqs); > > #Get parameters > $init = $hmm->init_prob; #Returns an array reference > $matrix1 = $hmm->transition_prob; #Returns Bio::Matrix::Scoring > $matrix2 = $hmm->emission_prob; #Returns Bio::Matrix::Scoring > > I realise that this is incomplete. > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919498.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jdalzell03 at qub.ac.uk Tue Aug 11 11:14:59 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:14:59 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24919603.post@talk.nabble.com> I should point out perhaps that CPAN is not an option on a Win setup...it has never worked for anything I have tried to install. Although I'm using Strawberry Perl now, I had no success getting bioperl or any of its components through the activestate PPM either (One of the reasons I ended up going to Strawberry). The only option I have for installation is the subversion server. Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919603.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 11:42:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:42:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24920117.post@talk.nabble.com> I realise that this looks like there is a problem with Bio::Tools::HMM when looking at the source code, but I've even tried replacing the HMM.pm file I had with the HMM.pm script at http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, and now I'm getting... "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: C:/strawberry/perl/lib C:/strawberry/perl/site/ lib .) at HMM.txt line 5. BEGIN failed--compilation aborted at HMM.txt line 5." ?? 
jonny
--
View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24920117.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jdalzell03 at qub.ac.uk Tue Aug 11 14:52:21 2009
From: jdalzell03 at qub.ac.uk (Jonny Dalzell)
Date: Tue, 11 Aug 2009 11:52:21 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in>
References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in>
Message-ID: <24923606.post@talk.nabble.com>

Hi,

I'm as sure as I can be. I look in the HMMER folder and it contains "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something to do with @INC, but I put
"use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at the top of my script, which definitely encompasses the directory it should be in, and I still get...

"Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib C:/strawberry/perl/site/lib/
Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt line 5.
BEGIN failed--compilation aborted at HMM.txt line 5."

I'm out of ideas.

Jonny
--
View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From rmb32 at cornell.edu Tue Aug 11 15:23:56 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 11 Aug 2009 12:23:56 -0700
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24920117.post@talk.nabble.com>
References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com>
Message-ID: <4A81C54C.5020905@cornell.edu>

Jonny,

For quicker help you might want to try #bioperl on freenode.
That said, the problem here is that when you get code from subversion, you are not really 'installing' it, you are just copying it to your machine. Part of the installation process is compiling these things, and for that you need a working C compiler. I don't know anything about using BioPerl on Windows, but as a general recommendation I would say go back to the CPAN and/or ppm directions and getting those working. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu Jonny Dalzell wrote: > I realise that this looks like there is a problem with Bio::Tools::HMM when > looking at the source code, but I've even tried replacing the HMM.pm file I > had with the HMM.pm script at > http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, > and now I'm getting... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: > C:/strawberry/perl/lib C:/strawberry/perl/site/ > lib .) at HMM.txt line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > ?? > > jonny From maj at fortinbras.us Tue Aug 11 15:22:42 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 15:22:42 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: <7C7654A8A64E49158F6761EE09C9F297@NewLife> Jonny, You need the HMMER application, which is not part of BioPerl. See http://hmmer.janelia.org/ for download options. MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 2:52 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > I'm as sure as I can be. 
I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > the top of my script, which definately encompasses the directory it should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Tue Aug 11 15:48:11 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 11 Aug 2009 12:48:11 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81C54C.5020905@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> Message-ID: <4A81CAFB.5050903@cornell.edu> Elaborating more, the 'C-compiled engine' error comes because Bio::Ext::HMM is not installed, because bioperl-ext is not installed (correctly), because Bio::Ext::HMM is an XS extension written in C. Which needs to be compiled. With a C compiler. As part of some kind of installation process, not just copying the files to a machine with subversion. Rob Robert Buels wrote: > Jonny, > > For quicker help you might want to try #bioperl on freenode. > > That said, the problem here is that when you get code from subversion, > you are not really 'installing' it, you are just copying it to your > machine. 
Part of the installation process is compiling these things, > and for that you need a working C compiler. > > I don't know anything about using BioPerl on Windows, but as a general > recommendation I would say go back to the CPAN and/or ppm directions and > getting those working. > > Rob > > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From bix at sendu.me.uk Tue Aug 11 16:11:43 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 11 Aug 2009 21:11:43 +0100 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: <4A81D07F.6000703@sendu.me.uk> Jonny Dalzell wrote: > Hi, > > I'm as sure as I can be. I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > the top of my script, which definately encompasses the directory it should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. lib (or at least one entry in your PERL5LIB) needs to point to the directory that contains the Bio directory. So: use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; Now it will be able to locate Bio::Tools::Hmm. You'll still get your original error because you don't have Hmmer installed. See Mark's reply. 
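
A side note on the "use lib" lines quoted in this thread: they put two paths inside one quoted string, so Perl treats the whole string (space included) as a single directory name that does not exist. "use lib" takes a list, and each entry should be the directory that contains the top-level Bio directory. The following is a minimal sketch, assuming Strawberry Perl's site/lib is at C:/strawberry/perl/site/lib as the error messages above suggest; find_module is a hypothetical helper written for this illustration, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# "use lib" takes a LIST of directories. Each entry must be the directory
# that contains the top-level "Bio" folder, not Bio/Tools itself.
# ASSUMPTION: this is the Strawberry Perl path from the thread; adjust it.
use lib 'C:/strawberry/perl/site/lib';

# Hypothetical helper: return every @INC directory holding the .pm file
# for the given module name.
sub find_module {
    my ($module) = @_;
    (my $rel = $module) =~ s{::}{/}g;    # Bio::Tools::HMM -> Bio/Tools/HMM
    $rel .= '.pm';
    # @INC may contain code references (hooks); skip those.
    return grep { !ref($_) && -f File::Spec->catfile($_, $rel) } @INC;
}

for my $module (qw(Bio::Tools::HMM Bio::Ext::HMM)) {
    my @found = find_module($module);
    print "$module: ",
        ( @found ? "found in @found" : "not found in \@INC" ), "\n";
}
```

Finding the .pm file only settles the "Can't locate ... in @INC" error. As Rob explains above, the "C-compiled engine" message is a separate problem: the XS part of bioperl-ext still has to be compiled and installed.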
From jdalzell03 at qub.ac.uk Tue Aug 11 16:29:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:29:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81D07F.6000703@sendu.me.uk> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> Message-ID: <24925178.post@talk.nabble.com> Hi, thanks. I did install HHMER from the site Mark suggested, and it is within the directories that perl recognizes when reading the script...still I get "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package" Is it possible that this module simply won't run through windows? jonny Sendu Bala-2 wrote: > > Jonny Dalzell wrote: >> Hi, >> >> I'm as sure as I can be. I look in the HHMER folder and it contains >> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >> something >> to do with @INC, but I put >> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >> the top of my script, which definately encompasses the directory it >> should >> be in, and I still get... >> >> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >> C:/strawberry/perl/site/lib/ >> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >> HMM.txt >> line 5. >> BEGIN failed--compilation aborted at HMM.txt line 5." >> >> I'm out of ideas. > > lib (or at least one entry in your PERL5LIB) needs to point to the > directory that contains the Bio directory. So: > > use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > > Now it will be able to locate Bio::Tools::Hmm. You'll still get your > original error because you don't have Hmmer installed. See Mark's reply. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 16:31:36 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:31:36 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81CAFB.5050903@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> Message-ID: <24925211.post@talk.nabble.com> OK, so is there any particular C-compiler which I should use? Thanks, jonny Robert Buels wrote: > > Elaborating more, the 'C-compiled engine' error comes because > Bio::Ext::HMM is not installed, because bioperl-ext is not installed > (correctly), because Bio::Ext::HMM is an XS extension written in C. > Which needs to be compiled. With a C compiler. As part of some kind of > installation process, not just copying the files to a machine with > subversion. > > Rob > > Robert Buels wrote: >> Jonny, >> >> For quicker help you might want to try #bioperl on freenode. >> >> That said, the problem here is that when you get code from subversion, >> you are not really 'installing' it, you are just copying it to your >> machine. Part of the installation process is compiling these things, >> and for that you need a working C compiler. >> >> I don't know anything about using BioPerl on Windows, but as a general >> recommendation I would say go back to the CPAN and/or ppm directions and >> getting those working. 
>> >> Rob >> >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Tue Aug 11 17:05:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 17:05:10 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925178.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: Jonny, It will run in Win/Vis but there are some caveats. The BioPerl package has some plain C components, as Rob pointed out. These need to be compiled, and the objects/libraries put in the right place. CPAN will cause this to happen when you have a compiler available; ActiveState .ppm will download the binaries directly from the repository (my understanding, anyway). CPAN is always available by doing > perl -MCPAN -e shell but you may not have a C compiler around. This is a little tricky. You can either explore Visual C/C++ options from MS here http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, and install Cygwin (www.cygwin.com), which creates a linux-like environment with GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful as the real thing, I grant. 
Which brings me to a third possibility, which I haven't tried: an Ubuntu box running in a VM under Windows, or as a dual-boot system (https://help.ubuntu.com/community/WindowsDualBoot). MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 4:29 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > thanks. I did install HHMER from the site Mark suggested, and it is within > the directories that perl recognizes when reading the script...still I get > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > Please read the install the bioperl-ext package" > > Is it possible that this module simply won't run through windows? > > jonny > > > > Sendu Bala-2 wrote: >> >> Jonny Dalzell wrote: >>> Hi, >>> >>> I'm as sure as I can be. I look in the HHMER folder and it contains >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >>> something >>> to do with @INC, but I put >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >>> the top of my script, which definately encompasses the directory it >>> should >>> be in, and I still get... >>> >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >>> C:/strawberry/perl/site/lib/ >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >>> HMM.txt >>> line 5. >>> BEGIN failed--compilation aborted at HMM.txt line 5." >>> >>> I'm out of ideas. >> >> lib (or at least one entry in your PERL5LIB) needs to point to the >> directory that contains the Bio directory. So: >> >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; >> >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your >> original error because you don't have Hmmer installed. See Mark's reply.
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Tue Aug 11 17:39:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 12 Aug 2009 09:39:30 +1200 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB6F93AA@exchsth.agresearch.co.nz> Dev-C++ http://www.bloodshed.net/devcpp.html is a good (i.e. free under GPL) Windows compiler I've used before. Might save having to install Cygwin. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Wednesday, 12 August 2009 9:05 a.m. > To: Jonny Dalzell; Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Jonny, > It will run in Win/Vis but there are some caveats. The BioPerl package has > some > plain C components, as Rob pointed out. These need to be compiled, and the > objects/libraries put in the right place. CPAN will cause this to happen when > you have a compiler available; ActiveState .ppm will download the binaries > directly from the repository (my understanding, anyway). 
CPAN is always > available by doing > > > perl -MCPAN -e shell > > but you may not have a C compiler around. This is a little tricky. You can > either explore Visual C/C++ options from MS here > http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, > and install Cygwin (www.cygwin.com), which creates a linux-like environment > with > GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful > as > the real thing, I grant. Which bring me to a third possibility, that I haven't > tried, which is an Ubuntu box running in a VM under Windows, or as a dual-boot > system (https://help.ubuntu.com/community/WindowsDualBoot). > MAJ > ----- Original Message ----- > From: "Jonny Dalzell" > To: > Sent: Tuesday, August 11, 2009 4:29 PM > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > > > > > Hi, > > > > thanks. I did install HHMER from the site Mark suggested, and it is within > > the directories that perl recognizes when reading the script...still I get > > > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > > Please read the install the bioperl-ext package" > > > > Is it possible that this module simply won't run through windows? > > > > jonny > > > > > > > > Sendu Bala-2 wrote: > >> > >> Jonny Dalzell wrote: > >>> Hi, > >>> > >>> I'm as sure as I can be. I look in the HHMER folder and it contains > >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > >>> something > >>> to do with @INC, but I put > >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > >>> the top of my script, which definately encompasses the directory it > >>> should > >>> be in, and I still get... > >>> > >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > >>> C:/strawberry/perl/site/lib/ > >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > >>> HMM.txt > >>> line 5. 
> >>> BEGIN failed--compilation aborted at HMM.txt line 5." > >>> > >>> I'm out of ideas. > >> > >> lib (or at least one entry in your PERL5LIB) needs to point to the > >> directory that contains the Bio directory. So: > >> > >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > >> > >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your > >> original error because you don't have Hmmer installed. See Mark's reply. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > -- > > View this message in context: > > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista-- > tp24919498p24925178.html > > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From cjfields at illinois.edu Tue Aug 11 19:44:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:44:23 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: Bio::Tools::HMM doesn't use HMMER; it uses a C-based extension in bioperl-ext that generates HMMs (XS-based bindings I think). I have managed to compile it successfully on Ubuntu and Mac OS X, but WinVista is a whole different bag-o-worms altogether (untested AFAIK). For the record, I do not recommend using it; I'm unsure about its maintenance status, so it may be released separately. It would be best to use something better supported, such as the HMMER wrapper in bioperl-run and the hmmer parsers in bioperl-core. We may also have wrappers for similar code available in biolib at some future point. chris On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote: > > Hi, > > I'm as sure as I can be. I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/ > Tools/";" at > the top of my script, which definately encompasses the directory it > should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/ > per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 11 19:48:08 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:48:08 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925211.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> <24925211.post@talk.nabble.com> Message-ID: <3A5CA958-3B03-4252-B78F-07BBFF1FA355@illinois.edu> Any C-based code should use the same compiler used from whatever perl version you are running. ActiveState supports both VC/C++ (as Mark indicates) or mingw/gcc. I think Strawberry supports mainly the latter. Though you can use CygWin, I think a native Win module is the best way to go if possible. It will likely be a tricky road, so keep us updated and we'll attempt to help out the best we can. chris On Aug 11, 2009, at 3:31 PM, Jonny Dalzell wrote: > > OK, > > so is there any particular C-compiler which I should use? > > Thanks, > jonny > > > > Robert Buels wrote: >> >> Elaborating more, the 'C-compiled engine' error comes because >> Bio::Ext::HMM is not installed, because bioperl-ext is not installed >> (correctly), because Bio::Ext::HMM is an XS extension written in C. >> Which needs to be compiled. With a C compiler. As part of some >> kind of >> installation process, not just copying the files to a machine with >> subversion. >> >> Rob >> >> Robert Buels wrote: >>> Jonny, >>> >>> For quicker help you might want to try #bioperl on freenode. >>> >>> That said, the problem here is that when you get code from >>> subversion, >>> you are not really 'installing' it, you are just copying it to your >>> machine. Part of the installation process is compiling these >>> things, >>> and for that you need a working C compiler. 
>>> >>> I don't know anything about using BioPerl on Windows, but as a >>> general >>> recommendation I would say go back to the CPAN and/or ppm >>> directions and >>> getting those working. >>> >>> Rob >>> >>> >> >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue Aug 11 20:09:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 20:09:01 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> Message-ID: <69BDE54FD5C943669BCD41A9A607634A@NewLife> [OOps. Sorry about that. The compiler ideas still apply however.] ----- Original Message ----- From: "Chris Fields" To: "Jonny Dalzell" Cc: Sent: Tuesday, August 11, 2009 7:44 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > Bio::Tools::Hmm doesn't use HMMER, it uses a C-based extension in bioperl-ext > that generates HMM's (XS-based bindings I think). I have managed to compile > it successfully on Ubuntu and Mac OS X, but WinVista is a whole different > bag-o-worms altogether (untested AFAIK). 
> > For the record, I do not recommend using it; I'm unsure about it's > maintenance status, so it may be released separately. It would be best to > use something better supported, such as the HMMER wrapper in bioperl-run and > the hmmer parsers in bioperl-core. We may also have wrappers for similar > code available in biolib at some future point. > > chris > > On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote: > >> >> Hi, >> >> I'm as sure as I can be. I look in the HHMER folder and it contains >> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something >> to do with @INC, but I put >> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/ Tools/";" at >> the top of my script, which definately encompasses the directory it should >> be in, and I still get... >> >> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/ per/lib >> C:/strawberry/perl/site/lib/ >> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt >> line 5. >> BEGIN failed--compilation aborted at HMM.txt line 5." >> >> I'm out of ideas. >> >> Jonny >> -- >> View this message in context: >> http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed Aug 12 12:44:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 Aug 2009 11:44:37 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ED672.20701@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> <4A7ED672.20701@sendu.me.uk> Message-ID: <1F099DCC-073E-470E-873A-608E674375C1@illinois.edu> On Aug 9, 2009, at 9:00 AM, Sendu Bala wrote: > Chris Fields wrote: > ... >> As long as you're moving everything into /lib (which I fully >> support), we should consider hard_coding scripts into bp_foo.PLS >> syntax seeing as we're going through additional trouble of >> converting them over. That is, unless there is a specific purpose >> to keeping them without the 'bp_'. > > (The final suffix is supposed to be .pl - we convert from PLS to pl > in core, no conversion needed in db) Yes, had that reversed in my commit. Thanks. > Yes, for only a handful of scripts, it actually makes sense to > flatten them all into a new bin directory, which is the default > script location for Module::Build. > > So for example I'd do: > svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl > etc. Yes, exactly. 
It seems we're going out of our way to keep things as they were previously when using ExtUtils::MakeMaker/Makefile.PL. I'm not quite sure why we've bent over backwards to work around these issues when it is much easier to stick to simple standards that 99% of CPAN uses: scripts in bin (or whatever dir is passed to script_files), modules in lib. I'm not complaining; I just haven't heard an explanation about that one way or the other. chris From rmb32 at cornell.edu Thu Aug 13 14:59:00 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 Aug 2009 11:59:00 -0700 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <4A79A52E.7000104@cornell.edu> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> <4A79A52E.7000104@cornell.edu> Message-ID: <4A846274.4000600@cornell.edu> OK, commit 15927 adds some more info about -db options for Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide, nuccore, nucgss, nucest, and unigene, and including a link to an (XML) page from NCBI that lists inputs that NCBI accepts. Could somebody who knows more about eUtils than me also review this patch and make corrections if necessary? Rob Robert Buels wrote: > I think you're looking for the -db => 'nucgss' option. > > I'll add a better listing of these (undocumented) options to the > Bio::DB::Query::GenBank docs. > > Rob > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From jdalzell03 at qub.ac.uk Thu Aug 13 15:27:14 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Thu, 13 Aug 2009 12:27:14 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24957222.post@talk.nabble.com> Fellows, thanks very much for the input.
However, today I saw fit to dual-boot with ubuntu. I've installed everything, but I still get the same

"The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package "

message! Is it ridiculous of me to expect ubuntu to take care of this for me? How do I go about compiling the HMM?

Thanks in advance,
Jonny
--
View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24957222.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jonathanmflowers at gmail.com Thu Aug 13 15:41:21 2009
From: jonathanmflowers at gmail.com (Jonathan Flowers)
Date: Thu, 13 Aug 2009 12:41:21 -0700
Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO
Message-ID:

Hi,

I am trying to parse BLAST reports written in XML using Bio::SearchIO. When running the following code on a set of reports (multiple query results in a single file), I only get one ResultI object. I tried running the same code on a file in 'blast' format and obtained the expected results (ie one ResultI object for each query), suggesting that the issue is with blastxml. I found an old thread on this listserv where someone had had a similar problem, but could not find how it was resolved.

I am using Bioperl 1.5.2 and the XML reports were generated using blastall with the -m7 option.

    my $in = new Bio::SearchIO(-format => 'blastxml',
                               -file   => 'blastreport.xml' );
    while( my $result = $in->next_result ) {
        print $result->query_name,"\n";
        while( my $hit = $result->next_hit ) {
            while( my $hsp = $hit->next_hsp ) {
                #do something with hsp
            }
        }
    }

Thanks

Jonathan

From rmb32 at cornell.edu Thu Aug 13 17:37:21 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 13 Aug 2009 14:37:21 -0700
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24957222.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com> Message-ID: <4A848791.4010402@cornell.edu> Jonny Dalzell wrote: > Is it ridiculous of me to expect ubuntu to take care of this for me? How do > I go about compiling the HMM? Yes. This is a very specialized thing that you're doing, and Ubuntu does not have the resources to package every single thing. Unfortunately, it looks like the bioperl-ext package is not installable under Ubuntu 9.04 anyway, which is what I'm running. For others on this list, if somebody is interested in maintaining it, I'd be happy to help out by testing on Debian-based Linux platforms. We need to clarify this package's maintenance status: if there is nobody interested in maintaining it, I would recommend that bioperl-ext be removed from distribution. It's not in anybody's interest to have unmaintained software out there causing confusion. So Jonny, in short, I would say "do not use bioperl-ext". Step back. What are you trying to accomplish? Chris already recommended some alternative methods in his email of 8/11 on this subject. Perhaps we can guide you to some software that is actively maintained and will meet your needs. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Thu Aug 13 18:06:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:06:29 -0500 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <4A846274.4000600@cornell.edu> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> <4A79A52E.7000104@cornell.edu> <4A846274.4000600@cornell.edu> Message-ID: <916D0E26-EBB5-4E28-99AD-F689639BB93A@illinois.edu> It looks fine.
As for the databases, you can always get the latest databases using a script from bioperl-live, which uses Bio::DB::EUtilities to access them directly (scripts/DB_EUtilities/einfo.PLS, which should install as bp_einfo.pl).

(looking at the below, what is blastdbinfo?)

cjfields4:DB_EUtilities cjfields$ perl einfo.PLS
pubmed
protein
nucleotide
nuccore
nucgss
nucest
structure
genome
biosystems
blastdbinfo
books
cancerchromosomes
cdd
gap
domains
gene
genomeprj
gensat
geo
gds
homologene
journals
mesh
ncbisearch
nlmcatalog
omia
omim
pepdome
pmc
popset
probe
proteinclusters
pcassay
pccompound
pcsubstance
snp
sra
taxonomy
toolkit
unigene

chris

On Aug 13, 2009, at 1:59 PM, Robert Buels wrote:

> OK, commit 15927 adds some more info about -db options for
> Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide,
> nuccore, nucgss, nucest, and unigene, and including a link to an
> (XML) page from NCBI that lists inputs that NCBI accepts.
>
> Could somebody who knows more about eUtils than me also review this
> patch and make corrections if necessary?
>
> Rob
>
> Robert Buels wrote:
>> I think you're looking for the -db => 'nucgss' option.
>> I'll add a better listing of this (undocumented) options to the
>> Bio::DB::Query::GenBank docs.
>> Rob > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 13 18:08:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:08:37 -0500 Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO In-Reply-To: References: Message-ID: <65CC2787-7F0A-43C1-A840-554A2E4FD76A@illinois.edu> You should update to bioperl 1.6; I believe I fixed this issue after the 1.5.2 release. chris On Aug 13, 2009, at 2:41 PM, Jonathan Flowers wrote: > Hi, > > I am trying to parse BLAST reports written in XML using > Bio::SearchIO. When > running the following code on a set of reports (multiple query > results in a > single file), I only get one ResultI object. I tried running the > same code > on a file in 'blast' format and obtained the expected results (ie one > ResultI object for each query), suggesting that the issue is with > blastxml. > I found an old thread on this listserv where someone had had a similar > problem, but could not find how it was resolved. > > I am using Bioperl 1.5.2 and the XML reports were generated using > blastall > with the -m7 option. 
> > my $in = new Bio::SearchIO(-format => 'blastxml', -file => > 'blastreport.xml' ); > while( my $result = $in->next_result ) { > print $result->query_name,"\n"; > while( my $hit = $result->next_hit ) { > while( my $hsp = $hit->next_hsp ) { > #do something with hsp > } > } > } > > Thanks > > Jonathan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 13 18:18:57 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:18:57 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A848791.4010402@cornell.edu> References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com> <4A848791.4010402@cornell.edu> Message-ID: On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > Jonny Dalzell wrote: >> Is it ridiculous of me to expect ubuntu to take care of this for >> me? How do >> I go about compiling the HMM? > Yes. This is a very specialized thing that you're doing, and Ubuntu > does not have the resources to package every single thing. > > Unfortunately, it looks like bioperl-ext package is not installable > under Ubuntu 9.04 anyway, which is what I'm running. For others on > this list, if somebody is interested in doing maintaining it, I'd be > happy to help out by testing on Debian-based Linux platforms. We > need to clarify this package's maintenance status: if there is > nobody interested in maintaining it, I would recommend that bioperl- > ext be removed from distribution. It's not in anybody's interest to > have unmaintained software out there causing confusion. I have cc'd Yee Man Chan for this. If there isn't a response or the message bounces, we do one of two things: 1) consider it deprecated (probably safest). 2) spin it out into a separate module. 
Just tried to compile it myself and am getting errors (using 64-bit perl 5.10), so I think, unless someone wants to take this on, option #1 is best. > So Jonny, in short, I would say "do not use bioperl-ext". In general, that's a safe bet. We're moving most of our C/C++ bindings to BioLib. > Step back. What are you trying to accomplish? Chris already > recommended some alternative methods in his email of 8/11 on this > subject. Perhaps we can guide you to some software that is actively > maintained and will meet your needs. > > Rob Exactly. Lots of other (better supported!) options out there. HMMER, SeqAn, and others. chris From cjfields at illinois.edu Thu Aug 13 20:31:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 19:31:49 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <650586.94518.qm@web30407.mail.mud.yahoo.com> References: <650586.94518.qm@web30407.mail.mud.yahoo.com> Message-ID: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu> (just to point out to everyone, Yee Man's contact information was in the POD) Yee Man, I have the output at the link below: http://gist.github.com/167542 There are similar problems popping up on 32- and 64-bit perl 5.10.0, Mac OS X 10.5. Haven't had time to debug it unfortunately. I think we should seriously consider spinning this code off into its own distribution for CPAN. It's unfortunately bit-rotting away in bioperl-ext. If you want to continue supporting it I can help set that up. chris On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > Hi > > So is this an HMM only problem? Or does it apply to other bioperl- > ext modules? > > What exactly are the compilation errors for HMM? I believe my > implementation is just a simple one based on Rabiner's paper.
> > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F > ~murphyk%2FBayes > %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner > +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > > I don't think I did anything fancy that makes it machine > dependent or non-ANSI C. > > Yee Man > > --- On Thu, 8/13/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "Jonny Dalzell" , "BioPerl List" > >, "Yee Man Chan" >> Date: Thursday, August 13, 2009, 3:18 PM >> >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: >> >>> Jonny Dalzell wrote: >>>> Is it ridiculous of me to expect ubuntu to take >> care of this for me? How do >>>> I go about compiling the HMM? >>> Yes. This is a very specialized thing that >> you're doing, and Ubuntu does not have the resources to >> package every single thing. >>> >>> Unfortunately, it looks like bioperl-ext package is >> not installable under Ubuntu 9.04 anyway, which is what I'm >> running. For others on this list, if somebody is >> interested in doing maintaining it, I'd be happy to help out >> by testing on Debian-based Linux platforms. We need to >> clarify this package's maintenance status: if there is >> nobody interested in maintaining it, I would recommend that >> bioperl-ext be removed from distribution. It's not in >> anybody's interest to have unmaintained software out there >> causing confusion. >> >> I have cc'd Yee Man Chan for this. If there isn't a >> response or the message bounces, we do one of two things: >> >> 1) consider it deprecated (probably safest). >> 2) spin it out into a separate module. >> >> Just tried to comile it myself and am getting errors (using >> 64bit perl 5.10), so I think, unless someone wants to take >> this on, option #1 is best. >> >>> So Jonny, in short, I would say "do not use >> bioperl-ext". >> >> In general, that's a safe bet. 
We're moving most of >> our C/C++ bindings to BioLib. >> >>> Step back. What are you trying to >> accomplish? Chris already recommended some alternative >> methods in his email of 8/11 on this subject. Perhaps >> we can guide you to some software that is actively >> maintained and will meet your needs. >>> >>> Rob >> >> Exactly. Lots of other (better supported!) options >> out there. HMMER, SeqAn, and others. >> >> chris >> > > > From ymc at yahoo.com Thu Aug 13 19:58:28 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Thu, 13 Aug 2009 16:58:28 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <650586.94518.qm@web30407.mail.mud.yahoo.com> Hi So is this an HMM only problem? Or does it apply to other bioperl-ext modules? What exactly are the compilation errors for HMM? I believe my implementation is just a simple one based on Rabiner's paper. http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg I don't think I did anything fancy that makes it machine dependent or non-ANSI C. Yee Man --- On Thu, 8/13/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "Jonny Dalzell" , "BioPerl List" , "Yee Man Chan" > Date: Thursday, August 13, 2009, 3:18 PM > > On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > > > Jonny Dalzell wrote: > >> Is it ridiculous of me to expect ubuntu to take > care of this for me? How do > >> I go about compiling the HMM? > > Yes. This is a very specialized thing that > you're doing, and Ubuntu does not have the resources to > package every single thing. > > > > Unfortunately, it looks like bioperl-ext package is > not installable under Ubuntu 9.04 anyway, which is what I'm > running.
For others on this list, if somebody is > interested in doing maintaining it, I'd be happy to help out > by testing on Debian-based Linux platforms. We need to > clarify this package's maintenance status: if there is > nobody interested in maintaining it, I would recommend that > bioperl-ext be removed from distribution. It's not in > anybody's interest to have unmaintained software out there > causing confusion. > > I have cc'd Yee Man Chan for this. If there isn't a > response or the message bounces, we do one of two things: > > 1) consider it deprecated (probably safest). > 2) spin it out into a separate module. > > Just tried to comile it myself and am getting errors (using > 64bit perl 5.10), so I think, unless someone wants to take > this on, option #1 is best. > > > So Jonny, in short, I would say "do not use > bioperl-ext". > > In general, that's a safe bet. We're moving most of > our C/C++ bindings to BioLib. > > > Step back. What are you trying to > accomplish? Chris already recommended some alternative > methods in his email of 8/11 on this subject. Perhaps > we can guide you to some software that is actively > maintained and will meet your needs. > > > > Rob > > Exactly. Lots of other (better supported!) options > out there. HMMER, SeqAn, and others. > > chris > From agulyaskov at mail.rockefeller.edu Thu Aug 13 20:40:22 2009 From: agulyaskov at mail.rockefeller.edu (Attila Gulyas-Kovacs) Date: Thu, 13 Aug 2009 20:40:22 -0400 Subject: [Bioperl-l] bus error when indexing large file Message-ID: <4A84B276.2040706@mail.rockefeller.edu> Dear all, I can index the SwissProt database without problems, but I get a bus error when I try to index the much larger TrEMBL database. Indexing failed with both the swissprot and fasta formats (using Bio::Index::Swissprot or Bio::Index::Fasta, respectively). I broke up TrEMBL into multiple files ('chunks'), each about the size of the SwissProt database. Then I could create separate indices for each chunk.
But I got a bus error when I passed all chunks simultaneously to my script (below) to create a single index. Perl v5.10.0; Bioperl 1.6.0; Mac OS X 10.5.8; MacPro 10 GB RAM.

What do you suggest?

Attila

#! /usr/bin/perl
use warnings;
use strict;
use Bio::Index::Swissprot;
my $index_file_name = shift;
my $inx = Bio::Index::Swissprot->new(
    -filename   => $index_file_name,
    -write_flag => 1);
$inx->make_index(@ARGV);

--
Attila Gulyas-Kovacs
Postdoctoral Associate

Rockefeller University
Gadsby Lab (Cardiac/Membrane Physiology)
D.W. Bronk Building, Room 307 1230 York Avenue
New York, NY, 10065
Tel: (212)327-8617
Fax: (212)327-7589

From ymc at yahoo.com Fri Aug 14 00:15:41 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Thu, 13 Aug 2009 21:15:41 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu>
Message-ID: <528790.13637.qm@web30404.mail.mud.yahoo.com>

Hi all

Based on my understanding of the warning messages, the problem seems to come from the "typemap" file, where I cast the return value of SvIV from an integer to a pointer. I suppose this might cause problems on 64-bit machines. But when I look at perlguts and perlxs, it does seem to me that the way I did it in typemap is the suggested way, because the IV type is "guaranteed to be big enough to hold a pointer". Nevertheless, I modified my typemap file to look exactly like what's in perlxs. (See PS)

Does anyone know how to deal with this problem? Or can any of you give me access to a 64-bit machine to sort this out?

Thank you!
Yee Man

PS This is a typemap file using exactly the same lines suggested by perlxs. It works on my 32-bit machine. Can someone try it on a 64-bit machine?
Thanks

================================================
TYPEMAP
HMM *    T_HMM

INPUT
T_HMM
    if (sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG))
        $var = ($type)SvIV((SV*)SvRV( $arg ));
    else{
        warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
        XSRETURN_UNDEF;
    }

OUTPUT
T_HMM
    sv_setref_pv($arg, "Bio::Ext::HMM::HMM", (void*) $var);
========================================================

--- On Thu, 8/13/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" > Date: Thursday, August 13, 2009, 5:31 PM > (just to point out to everyone, Yee > Man's contact information was in the POD) > > Yee Man, > > I have the output in the below link: > > http://gist.github.com/167542 > > There are similar problems popping up on 32- and 64-bit > perl 5.10.0, Mac OS X 10.5. Haven't had time to debug > it unfortunately. > > I think we should seriously consider spinning this code off > into it's own distribution for CPAN. It's > unfortunately bit-rotting away in bioperl-ext. If you > want to continue supporting it I can help set that up. > > chris > > On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > > > Hi > > > > So is this an HMM only problem? Or does > it apply to other bioperl-ext modules? > > > > What exactly are the compilation errors > for HMM? I believe my implementation is just a simple one > based on Rabiner's paper. > > > > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > > > > I don't think I did anything fancy that > makes it machine dependent or non-ANSI C. > > > > Yee Man > > > > --- On Thu, 8/13/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista?
> >> To: "Robert Buels" > >> Cc: "Jonny Dalzell" , > "BioPerl List" , > "Yee Man Chan" > >> Date: Thursday, August 13, 2009, 3:18 PM > >> > >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > >> > >>> Jonny Dalzell wrote: > >>>> Is it ridiculous of me to expect ubuntu to > take > >> care of this for me? How do > >>>> I go about compiling the HMM? > >>> Yes. This is a very specialized thing > that > >> you're doing, and Ubuntu does not have the > resources to > >> package every single thing. > >>> > >>> Unfortunately, it looks like bioperl-ext > package is > >> not installable under Ubuntu 9.04 anyway, which is > what I'm > >> running. For others on this list, if > somebody is > >> interested in doing maintaining it, I'd be happy > to help out > >> by testing on Debian-based Linux platforms. > We need to > >> clarify this package's maintenance status: if > there is > >> nobody interested in maintaining it, I would > recommend that > >> bioperl-ext be removed from distribution. > It's not in > >> anybody's interest to have unmaintained software > out there > >> causing confusion. > >> > >> I have cc'd Yee Man Chan for this. If there > isn't a > >> response or the message bounces, we do one of two > things: > >> > >> 1) consider it deprecated (probably safest). > >> 2) spin it out into a separate module. > >> > >> Just tried to comile it myself and am getting > errors (using > >> 64bit perl 5.10), so I think, unless someone wants > to take > >> this on, option #1 is best. > >> > >>> So Jonny, in short, I would say "do not use > >> bioperl-ext". > >> > >> In general, that's a safe bet. We're moving > most of > >> our C/C++ bindings to BioLib. > >> > >>> Step back. What are you trying to > >> accomplish? Chris already recommended some > alternative > >> methods in his email of 8/11 on this > subject. Perhaps > >> we can guide you to some software that is > actively > >> maintained and will meet your needs. > >>> > >>> Rob > >> > >> Exactly.
Lots of other (better supported!) > options > >> out there. HMMER, SeqAn, and others. > >> > >> chris > >> > > > > > > > > From ymc at yahoo.com Fri Aug 14 04:27:11 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Fri, 14 Aug 2009 01:27:11 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? Message-ID: <168012.97676.qm@web30405.mail.mud.yahoo.com>

Ah.. I find that the typemap can become as simple as this

=====================
TYPEMAP
HMM *    T_PTROBJ
=====================

Then the generated HMM.c will use the INT2PTR macro to do the pointer conversion. I believe this should solve the warnings.

Attached are the updated HMM.xs and typemap. Can someone with a 64-bit machine give it a try?

Thank you
Yee Man

--- On Thu, 8/13/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" > Date: Thursday, August 13, 2009, 5:31 PM > (just to point out to everyone, Yee > Man's contact information was in the POD) > > Yee Man, > > I have the output in the below link: > > http://gist.github.com/167542 > > There are similar problems popping up on 32- and 64-bit > perl 5.10.0, Mac OS X 10.5. Haven't had time to debug > it unfortunately. > > I think we should seriously consider spinning this code off > into it's own distribution for CPAN. It's > unfortunately bit-rotting away in bioperl-ext. If you > want to continue supporting it I can help set that up. > > chris > > On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > > > Hi > > > > So is this an HMM only problem? Or does > it apply to other bioperl-ext modules? > > > > What exactly are the compilation errors > for HMM? I believe my implementation is just a simple one > based on Rabiner's paper.
> > > > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > > > > I don't think I did anything fancy that > makes it machine dependent or non-ANSI C. > > > > Yee Man > > > > --- On Thu, 8/13/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista? > >> To: "Robert Buels" > >> Cc: "Jonny Dalzell" , > "BioPerl List" , > "Yee Man Chan" > >> Date: Thursday, August 13, 2009, 3:18 PM > >> > >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > >> > >>> Jonny Dalzell wrote: > >>>> Is it ridiculous of me to expect ubuntu to > take > >> care of this for me? How do > >>>> I go about compiling the HMM? > >>> Yes. This is a very specialized thing > that > >> you're doing, and Ubuntu does not have the > resources to > >> package every single thing. > >>> > >>> Unfortunately, it looks like bioperl-ext > package is > >> not installable under Ubuntu 9.04 anyway, which is > what I'm > >> running. For others on this list, if > somebody is > >> interested in doing maintaining it, I'd be happy > to help out > >> by testing on Debian-based Linux platforms. > We need to > >> clarify this package's maintenance status: if > there is > >> nobody interested in maintaining it, I would > recommend that > >> bioperl-ext be removed from distribution. > It's not in > >> anybody's interest to have unmaintained software > out there > >> causing confusion. > >> > >> I have cc'd Yee Man Chan for this. If there > isn't a > >> response or the message bounces, we do one of two > things: > >> > >> 1) consider it deprecated (probably safest). > >> 2) spin it out into a separate module. > >> > >> Just tried to comile it myself and am getting > errors (using > >> 64bit perl 5.10), so I think, unless someone wants > to take > >> this on, option #1 is best.
> >> > >>> So Jonny, in short, I would say "do not use > >> bioperl-ext". > >> > >> In general, that's a safe bet. We're moving > most of > >> our C/C++ bindings to BioLib. > >> > >>> Step back. What are you trying to > >> accomplish? Chris already recommended some > alternative > >> methods in his email of 8/11 on this > subject. Perhaps > >> we can guide you to some software that is > actively > >> maintained and will meet your needs. > >>> > >>> Rob > >> > >> Exactly. Lots of other (better supported!) > options > >> out there. HMMER, SeqAn, and others. > >> > >> chris > >> > > > > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: HMM.xs Type: application/octet-stream Size: 5588 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: typemap Type: application/octet-stream Size: 26 bytes Desc: not available URL: From cjfields at illinois.edu Fri Aug 14 10:20:21 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 09:20:21 -0500 Subject: [Bioperl-l] bus error when indexing large file In-Reply-To: <4A84B276.2040706@mail.rockefeller.edu> References: <4A84B276.2040706@mail.rockefeller.edu> Message-ID: I can attempt to reproduce this (I have very similar specs). I'm wondering if it has something to do with large file support. Have you tried the perl packaged with Mac OS X? I think it's perl 5.8.8. chris On Aug 13, 2009, at 7:40 PM, Attila Gulyas-Kovacs wrote: > Dear all, > > I can index the SwissProt database without problem but I get bus > error when I try to index the much larger TrEMBL database. Indexing > failed with both the swissprot and fasta format (using > Bio::Index::Swissprot or Bio::Index::Fasta, respectively).
I broke > up TrEMBL into multiple files ('chunks'), about the size of the > SwissProt database. Then I could could create separate indeces for > each chunk. But I got bus error when I passed all chunks > simultaneously to my script (below) to create a single index. > Perl v5.10.0; Bioperl 1.6.0; Mac OS X 10.5.8; MacPro 10 GB RAM. > > What do you suggest? > > Attila > > > #! /usr/bin/perl > use warnings; > use strict; > use Bio::Index::Swissprot; > my $index_file_name = shift; > my $inx = Bio::Index::Swissprot->new( > -filename => $index_file_name, > -write_flag => 1); > $inx->make_index(@ARGV); > > -- > Attila Gulyas-Kovacs > Postdoctoral Associate > > Rockefeller University > Gadsby Lab (Cardiac/Membrane Physiology) > D.W. Bronk Building, Room 307 1230 York Avenue > New York, NY, 10065 > Tel: (212)327-8617 > Fax: (212)327-7589 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri Aug 14 10:10:33 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 14 Aug 2009 16:10:33 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence Message-ID: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Hi everyone, I'm using Bio::AlignIO to read in a series of multiple alignments. Occasionally, an alignment will have a sequence which consists entirely of gaps (these are actually trimmed sub-alignments; that's why). Each time I read in such an alignment, an error will be raised when the Bio::LocatableSeq object is created for the all-gap sequence (actually, the error comes from the superclass Bio::PrimarySeq). To my way of thinking, an alignment is not invalid if it contains such all-gap sequences, so there shouldn't be an error. This could be done by having Bio::AlignIO::* passing the -nowarnonempty flag when creating the sequence objects. Any thoughts on this? 
Is there a better way to suppress the warning than changing the behavior of all the AlignIO modules? Dave From cjfields at illinois.edu Fri Aug 14 10:42:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 09:42:51 -0500 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Dave, Is this using bioperl-live? I recall this being a problem but I thought it was addressed in svn (and soon in the next point release). chris On Aug 14, 2009, at 9:10 AM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists > entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when > the > Bio::LocatableSeq object is created for the all-gap sequence > (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be > done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating > the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning > than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Fri Aug 14 10:44:42 2009 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 14 Aug 2009 16:44:42 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <716af09c0908140744i4447dffg205ec07daeaaa571@mail.gmail.com> Hi Dave, I have observed the same (with bioperl 1.52) for the same reason. It would be nice not to have these errors as also in my view an all-gaps sequence is a sequence. I also found that sometimes parsing such alignments fails when the all-gaps sequence is the last in the alignment (bug 2744, in Bio::LocatableSeq). Regards, Bernd On Fri, Aug 14, 2009 at 4:10 PM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when the > Bio::LocatableSeq object is created for the all-gap sequence (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Aug 14 11:12:35 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 14 Aug 2009 17:12:35 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Message-ID: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> > > Is this using bioperl-live? Sorry, should've said before. Yes, it's bioperl-live (r15927). I recall this being a problem but I thought it was addressed in svn (and > soon in the next point release). Hmm, the only recent somewhat related change I see (in Bio::AlignIO::*, anyway) is: ------------------------------------------------------------------------ r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 lines deprecate no_sequences/no_residues in main trunk (we can switch the version to 1.7 if deemed necessary) ------------------------------------------------------------------------ Perhaps this is what you were thinking of? Dave From cjfields at illinois.edu Fri Aug 14 11:31:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 10:31:49 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <168012.97676.qm@web30405.mail.mud.yahoo.com> References: <168012.97676.qm@web30405.mail.mud.yahoo.com> Message-ID: Yee Man, I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 64-bit) and on dev.open-bio.org (which is perl 5.8.8, appears to be 32-bit). The patch results in cleaning up warnings for 5.10.0 but results in similar warnings for 5.8.8 (linux or OS X). 
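Since the typemap question comes down to whether an IV is wide enough to hold a pointer on each perl being tested, a minimal sketch for checking that on a given build follows. It assumes only the core Config module; the variable names are illustrative and nothing here comes from HMM.xs itself.

```perl
use strict;
use warnings;
use Config;

# Sizes in bytes, as recorded when this perl was built.
my $ivsize  = $Config{ivsize};
my $ptrsize = $Config{ptrsize};

printf "archname: %s\n", $Config{archname};
printf "ivsize:   %d bytes\n", $ivsize;
printf "ptrsize:  %d bytes\n", $ptrsize;

# perlguts states that an IV is large enough to hold a pointer;
# this just confirms it for the perl running the script.
print $ivsize >= $ptrsize
    ? "IV can hold a pointer\n"
    : "IV is narrower than a pointer\n";
```

Running this with each perl executable under test (for example the system 5.8.8 and a locally built 5.10.0) shows whether two builds differ in pointer width, which may help when comparing reports from different machines.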
On OS X perl 5.8.8, this sometimes passes (note the first attempt fails, the second succeeds), so it's not entirely a 32-bit issue: http://gist.github.com/167860 OS X and perl 5.10.0, this always fails as the previous gist shows, but demonstrates similar behavior (multiple attempts to test get different responses): http://gist.github.com/167542 On linux, everything passes with or w/o the patched files (patched files have warnings as indicated above): Specs for all three perl executables (they vary a bit): http://gist.github.com/167883 chris On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: > Ah.. I find that the typemap can become as simple as this > ===================== > TYPEMAP > HMM * T_PTROBJ > ===================== > > Then the generated HMM.c will have a function called INT2PTR to do > the pointer conversion. I believe this should solve the warnings. > > Attached are the updated HMM.xs and typemap. Can someone with a 64- > bit machine give it a try? > > Thank you > Yee Man > --- On Thu, 8/13/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" > >, "BioPerl List" >> Date: Thursday, August 13, 2009, 5:31 PM >> (just to point out to everyone, Yee >> Man's contact information was in the POD) >> >> Yee Man, >> >> I have the output in the below link: >> >> http://gist.github.com/167542 >> >> There are similar problems popping up on 32- and 64-bit >> perl 5.10.0, Mac OS X 10.5. Haven't had time to debug >> it unfortunately. >> >> I think we should seriously consider spinning this code off >> into it's own distribution for CPAN. It's >> unfortunately bit-rotting away in bioperl-ext. If you >> want to continue supporting it I can help set that up. >> >> chris >> >> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >> >>> Hi >>> >>> So is this an HMM only problem? Or does >> it apply to other bioperl-ext modules? 
>>> >>> What exactly are the compilation errors >> for HMM? I believe my implementation is just a simple one >> based on Rabiner's paper. >>> >>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>> ~murphyk%2FBayes >>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>> >>> I don't think I did anything fancy that >> makes it machine dependent or non-ANSI C. >>> >>> Yee Man >>> >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Robert Buels" >>>> Cc: "Jonny Dalzell" , >> "BioPerl List" , >> "Yee Man Chan" >>>> Date: Thursday, August 13, 2009, 3:18 PM >>>> >>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: >>>> >>>>> Jonny Dalzell wrote: >>>>>> Is it ridiculous of me to expect ubuntu to >> take >>>> care of this for me? How do >>>>>> I go about compiling the HMM? >>>>> Yes. This is a very specialized thing >> that >>>> you're doing, and Ubuntu does not have the >> resources to >>>> package every single thing. >>>>> >>>>> Unfortunately, it looks like bioperl-ext >> package is >>>> not installable under Ubuntu 9.04 anyway, which is >> what I'm >>>> running. For others on this list, if >> somebody is >>>> interested in doing maintaining it, I'd be happy >> to help out >>>> by testing on Debian-based Linux platforms. >> We need to >>>> clarify this package's maintenance status: if >> there is >>>> nobody interested in maintaining it, I would >> recommend that >>>> bioperl-ext be removed from distribution. >> It's not in >>>> anybody's interest to have unmaintained software >> out there >>>> causing confusion. >>>> >>>> I have cc'd Yee Man Chan for this. If there >> isn't a >>>> response or the message bounces, we do one of two >> things: >>>> >>>> 1) consider it deprecated (probably safest). >>>> 2) spin it out into a separate module. 
>>>> >>>> Just tried to comile it myself and am getting >> errors (using >>>> 64bit perl 5.10), so I think, unless someone wants >> to take >>>> this on, option #1 is best. >>>> >>>>> So Jonny, in short, I would say "do not use >>>> bioperl-ext". >>>> >>>> In general, that's a safe bet. We're moving >> most of >>>> our C/C++ bindings to BioLib. >>>> >>>>> Step back. What are you trying to >>>> accomplish? Chris already recommended some >> alternative >>>> methods in his email of 8/11 on this >> subject. Perhaps >>>> we can guide you to some software that is >> actively >>>> maintained and will meet your needs. >>>>> >>>>> Rob >>>> >>>> Exactly. Lots of other (better supported!) >> options >>>> out there. HMMER, SeqAn, and others. >>>> >>>> chris >>>> >>> >>> >>> >> >> > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Aug 14 11:53:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 10:53:51 -0500 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> Message-ID: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> On Aug 14, 2009, at 10:12 AM, Dave Messina wrote: > Is this using bioperl-live? > > Sorry, should've said before. Yes, it's bioperl-live (r15927). > > > I recall this being a problem but I thought it was addressed in svn > (and soon in the next point release). 
>
> Hmm, the only recent somewhat related change I see (in
> Bio::AlignIO::*, anyway) is:
>
> ------------------------------------------------------------------------
> r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 lines
>
> deprecate no_sequences/no_residues in main trunk (we can switch the
> version to 1.7 if deemed necessary)
> ------------------------------------------------------------------------
>
>
> Perhaps this is what you were thinking of?
>
> Dave

Maybe not, then (for some reason I thought this was fixed within LocatableSeq). I know that it is possible to have an all-gap LocatableSeq; this works, but the default start/end/length aren't correct, which is part of Bernd's bug:

    use Modern::Perl;
    use Bio::LocatableSeq;

    my $seq = Bio::LocatableSeq->new(
        -seq      => '-------------',
        -alphabet => 'dna',
    );
    say $seq->start;   # 1
    say $seq->end;     # undef (?)
    say $seq->length;  # 13, counts the gaps

The problem is that fixing all of this relies on a whole slew of refactors for LocatableSeq and SimpleAlign. Some of this touches root components as well, so it'll need to be tried on a branch and will very likely result in some API changes (and thus may not be included in 1.6).

I'll start a branch to get the process started.

chris

From jncline at gmail.com Fri Aug 14 15:41:21 2009
From: jncline at gmail.com (Jonathan Cline)
Date: Fri, 14 Aug 2009 14:41:21 -0500
Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl
In-Reply-To: <99E27D08408340B9B0611751A17DF266@NewLife>
References: <99E27D08408340B9B0611751A17DF266@NewLife>
Message-ID: <4A85BDE1.5020002@gmail.com>

Mark A. Jensen wrote:
> Sorry, I cut off the last script. The entire thing follows:
>

This is exactly what I was looking for - thanks. A method to modify Makefile.PL, install in Activestate, etc. is great. Perhaps your method could also be made more portable by using `cygpath` to get rid of the hardcoded "/cygdrive/x/", although few cygwin installs change this prefix from the default.
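A rough sketch of the `cygpath` idea, for illustration only: it is not part of Mark's script, the helper name `to_cyg_path` is made up here, and the fallback branch simply assumes the default /cygdrive prefix when `cygpath` is not on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: convert a Windows path such as C:\Perl\bin\perl
# to its cygwin form without hardcoding the mount prefix in the caller.
to_cyg_path() {
    if command -v cygpath >/dev/null 2>&1; then
        # cygpath -u prints the Unix-style (cygwin) form of a Windows path
        # and honours a non-default cygdrive prefix.
        cygpath -u "$1"
    else
        # Fallback (assumption): default /cygdrive prefix, lowercase drive letter.
        drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
        rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
        printf '/cygdrive/%s%s\n' "$drive" "$rest"
    fi
}

# With the default prefix this prints /cygdrive/c/Perl/bin/perl
to_cyg_path 'C:\Perl\bin\perl'
```

The result could then be substituted for the literal /cygdrive/c strings in the sed rules, at the cost of generating the sed script rather than keeping it static.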
I will definitely save your code for later.

I've implemented another workaround, which is to use Win32::Pipe and other Win32:: methods. This has problems of its own (support is not 100%), and an error-free implementation is not as easy as simply requiring ActiveState Perl; however, it should work with both ActiveState and cygwin-perl (and Unix).

## Jonathan Cline
## jcline at ieee.org
## Mobile: +1-805-617-0223
########################

> ----- Original Message ----- From: "Jonathan Cline"
> To:
> Cc:
> Sent: Friday, July 31, 2009 11:24 PM
> Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl
>
>
>> I recently mentioned working on Bio::Robotics for Tecan. Vendors
>> being MS-Win specific, the vendor software allows third-party software
>> communication through a named pipe (the literal filename is
>> "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific
>> and this pseudo-pipe is opened with sysopen() ). This is broken under
>> cygwin-perl due to cygwin's method of handling paths -- the sysopen
>> fails. However it works under ActiveState Perl and communication
>> through the named pipe (to the robot hardware) is OK. The standard
>> workaround is usually to use cygwin bash, and force the PATH to use
>> ActiveState perl. (Typical MS Windows incompatibility problem.) The
>> issue is: Perl module libraries for CPAN work under cygwin-perl
>> (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN
>> module use, or "make test", result in a bad list of incompatibility
>> problems. Yet ActiveState Perl is required for communicating to the
>> vendor application (unless there is some workaround to raw filesystem
>> access in cygwin-perl that I haven't found in 2 days of working this).
>> The stand-alone scripts I have work fine to access the named pipe
>> (using ActiveState Perl) since the standalone scripts have no module
>> INC dependencies, no CPAN module test harness, etc etc.
>>
>> This isn't specifically a Bio:: issue, though if anyone has
>> suggestions please email. I could try msys and see if it handles the
>> named-pipe-special-file better, if msys has an msys-perl distribution.
>>
>> --
>> ## Jonathan Cline
>> ## jcline at ieee.org
>> ## Mobile: +1-805-617-0223
>> ########################
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>

From cjfields at illinois.edu Fri Aug 14 19:29:43 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 14 Aug 2009 18:29:43 -0500
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
Message-ID: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>

As we have pretty much everything in place for another point release (which I will start merging over this weekend into the 1.6 branch), I have gone ahead and made two branches for refactoring some of the more important pieces of bioperl code. Both refactors may require API changes; if so these will be part of a 1.7 release.

1) GFF - entails refactoring bioperl code to better handle GFF2/3.

This is a large section of code, so small incremental changes may be merged to trunk over time (and thus may involve several branches). Included is refactoring of feature typing to be more consistent and lightweight, and will initially involve Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be deprecated in the process). See the following for additional details:

http://www.bioperl.org/wiki/GFF_Refactor

2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI (SimpleAlign) and LocatableSeq. This is primarily to address significant bugs but will also entail cleaning up SimpleAlign methods (factoring out more utility-like methods into Bio::Align::AlignUtils or similar). This also may involve several branches.
See the following for additional details: http://www.bioperl.org/wiki/Align_Refactor Any help/suggestions for the above two would be greatly appreciated! Robert Buels may be heading up the initial FeatureIO work; I will likely start on LocatableSeq/Align (Mark, wanna help?). chris From maj at fortinbras.us Fri Aug 14 19:45:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 19:45:01 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Hey Chris et al, I'm there on LocatableSeq, definitely. I do have one project to finish this weekend before I move to that: I'm planning to move Chase Miller's excellent NeXML read/write implementation into the trunk, complete with tests. If we can get it to pass the test suite, is there room in the point release for it? MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Friday, August 14, 2009 7:29 PM Subject: [Bioperl-l] GFF and LocatableSeq refactoring > As we have pretty much everything in place for another point release > (which I will start merging over this weekend into the 1.6 branch), I > have gone ahead and made two branches for refactoring some of the more > important pieces of bioperl code. Both refactors may require API > changes; if so these will be part of a 1.7 release. > > 1) GFF - entail refactoring bioperl code to better handle GFF2/3. > > This is a large section of code, so small incremental changes may be > merged to trunk over time (and thus may involve several branches). > Included is refactoring of feature typing to be more consistent and > lightweight, and will initially involve Bio::FeatureIO and > Bio::SeqFeature::Annotated (which may be deprecated in the process). 
> See the following for additional details: > > http://www.bioperl.org/wiki/GFF_Refactor > > 2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI > (SimpleAlign) and LocatableSeq. This is primarily to address > significant bugs but will also entail cleaning up SimpleAlign methods > (factoring out more utility-like methods into Bio::Align::AlignUtils > or similar). This also may involve several branches. See the > following for additional details: > > http://www.bioperl.org/wiki/Align_Refactor > > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will > likely start on LocatableSeq/Align (Mark, wanna help?). > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Fri Aug 14 19:50:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 Aug 2009 16:50:18 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: <4A85F83A.30800@cornell.edu> Chris Fields wrote: > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will likely > start on LocatableSeq/Align (Mark, wanna help?). Sure, I'll head up the gff_refactor branch work. If you're interested in what changes are being planned for Bio::SeqFeature::*, Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the implementation plan Chris and I developed just now on IRC, which is at http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan Now soliciting suggestions, comments, and assistance. 
Rob From cjfields at illinois.edu Fri Aug 14 21:03:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 20:03:41 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Mark, re: NeXML, yes, of course. There'll be an alpha release or two prior to core 1.6.1 (I need to test the Build.PL/Bio::Root::Build changes Sendu added in). chris On Aug 14, 2009, at 6:45 PM, Mark A. Jensen wrote: > Hey Chris et al, I'm there on LocatableSeq, definitely. I do have > one project to finish this weekend before I move to that: I'm > planning to move Chase Miller's > excellent NeXML read/write implementation into the trunk, complete > with tests. If we can get it to pass the test suite, is there room > in the point release for it? > MAJ > ----- Original Message ----- From: "Chris Fields" > > To: "BioPerl List" > Sent: Friday, August 14, 2009 7:29 PM > Subject: [Bioperl-l] GFF and LocatableSeq refactoring > > >> As we have pretty much everything in place for another point >> release (which I will start merging over this weekend into the 1.6 >> branch), I have gone ahead and made two branches for refactoring >> some of the more important pieces of bioperl code. Both refactors >> may require API changes; if so these will be part of a 1.7 release. >> 1) GFF - entail refactoring bioperl code to better handle GFF2/3. >> This is a large section of code, so small incremental changes may >> be merged to trunk over time (and thus may involve several >> branches). Included is refactoring of feature typing to be more >> consistent and lightweight, and will initially involve >> Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be >> deprecated in the process). See the following for additional >> details: >> http://www.bioperl.org/wiki/GFF_Refactor >> 2) Align/LocatableSeq - dealing with inconsistencies in >> Bio::AlignI (SimpleAlign) and LocatableSeq. 
This is primarily to >> address significant bugs but will also entail cleaning up >> SimpleAlign methods (factoring out more utility-like methods into >> Bio::Align::AlignUtils or similar). This also may involve several >> branches. See the following for additional details: >> http://www.bioperl.org/wiki/Align_Refactor >> Any help/suggestions for the above two would be greatly >> appreciated! Robert Buels may be heading up the initial FeatureIO >> work; I will likely start on LocatableSeq/Align (Mark, wanna help?). >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From maj at fortinbras.us Fri Aug 14 22:32:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 22:32:01 -0400 Subject: [Bioperl-l] on BP documentation Message-ID: <1F899AA92F94415186CB0B25306F1114@NewLife> Hi All -- Off-list, an old colleague of mine had this insightful, if damning, comment: >I guess that from my perspective, after doing this stuff for >about 10 years, I personally would prefer to see a "summer of >documentation" for the bio* languages (or at least bioperl, as that is >the only one I ever look at). From my own experiences, and from those >of many colleagues, the documentation for bioperl has gone from >mediocre to quite poor in the last few years. I largely think the >wikification of the docs are to blame for this. Even SeqIO is hard >to figure out now--it took me an hour the other day to figure out that >"desc" returns the full Fasta header, and I had to get that from the >module code + trial-and-error, instead of the online docs. There is >far too much inside baseball going on in the documentation scheme. >So I worry more about the constant adding of features at the expense >of documenting what is already there. This is just my 2 cents, and it >is disappointing to see a downward trend for bioperl in this regard. 
I would be really interested in all responses from the list users. I must agree that BP docs are rather a rat's nest and of varying quality, but taken in toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount of useful and sophisticated information available. I think there are approaches we can take to reorganize and standardize the accession of it to make it more useful and inviting. I disagree with my pal about the wikification, but I wager that the power of the wiki could be leveraged to greater advantage (right, Dan?). I think that what we all as developers love is to code, and detest is to document. Since BP is all-volunteer, and volunteers tend to do what they like -- the beauty of open source, btw -- documentation reorg and cleanup probably must devolve to the Core. I am willing to lead such an effort, which will take some time, and more time the fewer volunteers there are. First let's hear some thoughts, and 'let it all hang out', as they said in my mom's era. cheers Mark From cjfields at illinois.edu Fri Aug 14 23:41:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 22:41:10 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> On Aug 14, 2009, at 9:32 PM, Mark A. Jensen wrote: > Hi All -- > > Off-list, an old colleague of mine had this insightful, if damning, > comment: > >> I guess that from my perspective, after doing this stuff for >> about 10 years, I personally would prefer to see a "summer of >> documentation" for the bio* languages (or at least bioperl, as that >> is >> the only one I ever look at). From my own experiences, and from >> those >> of many colleagues, the documentation for bioperl has gone from >> mediocre to quite poor in the last few years. I largely think the >> wikification of the docs are to blame for this. 
Even SeqIO is hard >> to figure out now--it took me an hour the other day to figure out >> that >> "desc" returns the full Fasta header, and I had to get that from the >> module code + trial-and-error, instead of the online docs. There is >> far too much inside baseball going on in the documentation scheme. > >> So I worry more about the constant adding of features at the expense >> of documenting what is already there. This is just my 2 cents, and >> it >> is disappointing to see a downward trend for bioperl in this regard. > > I would be really interested in all responses from the list users. I > must agree > that BP docs are rather a rat's nest and of varying quality, but > taken in > toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount > of useful and sophisticated information available. I think there are > approaches we can take to reorganize and standardize the accession > of it to make it more useful and inviting. I disagree with my pal > about the > wikification, but I wager that the power of the wiki could be > leveraged > to greater advantage (right, Dan?). To me good documentation should be a combination of both wiki docs (HOWTOs, scraps, cookbook-y code) and inline POD. We can't forsake one for the other. If I had a preference, I would take more up-to- date POD over wiki (maybe adding a Status: for the methods), but a good HOWTO goes a long way in helping. It's just too hard to cover every use case. It's unfortunate that documentation is very poor for many modules, but at the same time it's also exceptionally hard to write documentation for modules one has had no part in developing. I think this is the main reason the docs are in the state they are in (not to point the finger of blame at anyone, I'm just as much to blame). > I think that what we all as developers love is to code, and detest > is to > document. 
Since BP is all-volunteer, and volunteers tend to do what
> they like -- the beauty of open source, btw -- documentation reorg
> and cleanup probably must devolve to the Core. I am willing to lead
> such an effort, which will take some time, and more time the fewer
> volunteers there are. First let's hear some thoughts, and 'let it
> all hang out',
> as they said in my mom's era.
>
> cheers
> Mark

Two things:

1) Take advantage of the proposed restructuring effort (as well as some of the refactoring we are doing) to add decent documentation where possible. This means updating method docs and updating the HOWTOs as needed, or adding new HOWTOs (Jason has indicated this in the past).

2) Pinpoint areas where docs are desperately needed first.

Other wiki docs could also use updating. As an example, the above author's question on FASTA and desc() is actually answered in the FAQ, but the wording of the FAQ entry doesn't make it easy to find:

http://www.bioperl.org/wiki/FAQ#I_would_like_to_make_my_own_custom_fasta_header_-_how_do_I_do_this.3F

chris

From David.Messina at sbc.su.se Sat Aug 15 03:49:59 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 15 Aug 2009 09:49:59 +0200
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
Message-ID: <628aabb70908150049h64f83b8ewb30d916f0534e40d@mail.gmail.com>

>
> To me good documentation should be a combination of both wiki docs (HOWTOs,
> scraps, cookbook-y code) and inline POD. We can't forsake one for the
> other.
>

I think this notion is already kinda there de facto (inside baseball? :)), but perhaps we should make clear the idea that:

- POD is the reference manual, with each method's capabilities described comprehensively and in detail.
- The wiki is tutorials (bptutorial, Jason's slides), use cases (HOWTOs and Scrapbook), and FAQ

And actually all the POD is accessible online from the wiki at doc.bioperl.org, too (although maybe a little hard to find -- it's under Developer--API Docs).

> If I had a preference, I would take more up-to-date POD over wiki (maybe
> adding a Status: for the methods), but a good HOWTO goes a long way in
> helping. It's just too hard to cover every use case.
>

I'd agree with this, too, partly because I think the HOWTOs are in pretty good shape, covering the most common stuff pretty well, and partly because I think the reference manual has to be complete, both for a user coming to find out how to use it and for authors ensuring that their internal model of how the code works actually hangs together.

Mark, one attack point for a documentation improvement effort would be to take a survey of the PODs and see how well they are fulfilling the role of a reference manual.

But part of a good reference manual is knowing how to find what you're looking for, and indeed I think that's maybe the main overall problem with trying to document anything as big and complicated as BioPerl.

So for me, the organization of our copious docs might benefit from some attention. The goal of providing a way to find information is better handled by the wiki, which does searching and cross-referencing much better than POD.

To take your friend's FASTA header example, I might expect to be able to search for 'FASTA' or 'FASTA header' on the wiki and find something which guides me to the answer. A search for 'FASTA' gives a list of pointers, including the 'FASTA sequence format' page. That page almost gives the right answer (see the Note section), but perhaps it might be a nice place to say that in BioPerl, a FASTA sequence is a Bio::Seq, and that the header is $seq->desc and the seq is $seq->seq.
And there could be an equivalent page for the other common formats, breaking down how the format maps to an object. [...] it's also exceptionally hard to write documentation for modules one > has had no part in developing. I think this is the main reason the docs are > in the state they are in (not to point the finger of blame at anyone, I'm > just as much to blame). Absolutely, and maybe a first step would be to contact the authors of a module with out-of-date docs and ask for them to fix it, in the same way one would go to the author with a bug in their code. Core+volunteers will certainly be needed for organizing the effort and assessing the state of BioPerl documentation as a whole, but give authors the opportunity to take care of their code, too. Two things: > > 1) Take advantage of the proposed restructuring effort (as well as some of > the refactoring are doing) to add decent documentation where possible. This > means updating method docs and updating the HOWTO's as needed, or adding new > HOWTO's (Jason has indicated this in the past). > This is a great idea. > 2) Pinpoint areas where docs are desperately needed first. > > Other wiki docs could also use updating. As an example, the above author's > question on FASTA and desc() is actually answered in the FAQ, Absolutely. Maybe some of the FAQs could actually be added back to the relevant PODs? 
Dave

From David.Messina at sbc.su.se Sat Aug 15 04:00:50 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 15 Aug 2009 10:00:50 +0200
Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence
In-Reply-To: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu>
References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu>
Message-ID: <628aabb70908150100ka8c21aahe2bf7d636fa94112@mail.gmail.com>

>
> I know that it is possible to have an all-gap LocatableSeq

You can, but to avoid the "can't guess alphabet" error I'm getting, you have to set the alphabet manually (which AlignIO does not do).

> I'll start a branch to get the process started.

Terrific! In the meantime, then, I'll just use the -nowarnonempty workaround in my local copy of AlignIO.

Dave

From bernd.web at gmail.com Sat Aug 15 07:17:44 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Sat, 15 Aug 2009 13:17:44 +0200
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife>
References: <1F899AA92F94415186CB0B25306F1114@NewLife>
Message-ID: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com>

Hi

>> Even SeqIO is hard
>>to figure out now--it took me an hour the other day to figure out that
>>"desc" returns the full Fasta header, and I had to get that from the
>>module code + trial-and-error, instead of the online docs.

I was a bit surprised about $seq->desc retrieving the entire FASTA header line. Actually, in Bioperl 1.52 at least, $seq->desc returns the description only, so without the ID. Thus, to get the entire FASTA header line, $seq->id . " " . $seq->desc would be needed.

For the modules I use (mainly related to sequences, such as SeqIO, SimpleAlign), I'd be happy to contribute on docs, checking docs, or examples.

Regards,
Bernd

From sanjaysingh765 at gmail.com Sat Aug 15 09:38:18 2009
From: sanjaysingh765 at gmail.com (sanjay singh)
Date: Sat, 15 Aug 2009 19:08:18 +0530
Subject: [Bioperl-l] BLINK PARSER
Message-ID:

Hi,

I want to submit a query to NCBI's BLINK and parse the result for the best hit. Does anyone have a script to do so? I would be very grateful if someone would share it with me.

regards
sanjay

--
Happy moments , praise God.
Difficult moments, seek God.
Quiet moments, worship God.
Painful moments, trust God.
Every moment, thank God

Sanjay Kumar Singh
Bose Institute
93\1,A.P.C.Road
Kolkata-700 009
West Bengal
India

From jimhu at tamu.edu Sat Aug 15 11:01:15 2009
From: jimhu at tamu.edu (Jim Hu)
Date: Sat, 15 Aug 2009 10:01:15 -0500
Subject: [Bioperl-l] genbank2gff3 for prokaryotes?
Message-ID:

Over on the Gbrowse list, Don Gilbert explained to me why genbank2gff3.pl is having problems with prokaryotic genomes. Has anyone written an alternative?

Jim Hu
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054

From cjfields at illinois.edu Sat Aug 15 11:27:01 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 15 Aug 2009 10:27:01 -0500
Subject: [Bioperl-l] genbank2gff3 for prokaryotes?
In-Reply-To:
References:
Message-ID: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu>

We (bioperl devs and users) would be very interested to have something like this included. I ran into a similar problem with genbank2gff3 a year ago with some of our work here on Archaea. I managed to get enough data out to get gbrowse up-and-running, but it required quite a bit of hand-editing.

In fact, seeing as we're refactoring GFF and other aspects of Features in bioperl, this may be the best time to add something in.
chris On Aug 15, 2009, at 10:01 AM, Jim Hu wrote: > Over on the Gbrowse list, Don Gilbert explained to me why > genbank2gff3.pl is having problems with prokaryotic genomes. Has > anyone written an alternative? > > Jim Hu > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 15 11:55:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 10:55:44 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> Message-ID: On Aug 15, 2009, at 6:17 AM, Bernd Web wrote: > Hi > >>> Even SeqIO is hard >>> to figure out now--it took me an hour the other day to figure out >>> that >>> "desc" returns the full Fasta header, and I had to get that from the >>> module code + trial-and-error, instead of the online docs. > I was a bit surprised about $seq->desc retrieving the entire FASTA > header line > Actually, in Bioperl 1.52 at least $seq->desc returns the description > only, so without the ID. Thus, to get the entire FASTA header line > $seq->id . " " $seq->desc would be needed. Odd, not seeing where a change was made that would cause this behavior. Can you post an example? > For the modules I use (mainly related to sequences, such as SeqIO, > SimpleAlign), I'd be happy to contribute on docs, checking docs, or > examples. > > Regards, > Bernd Would be nice to have an Align/SimpleAlign HOWTO, but seeing as we want to refactor large chunks of that code, it might be slightly premature. 
That is, unless we want to document what behavior we expect to see as a sort of ROADMAP (maybe as part of the refactoring page). That could then be converted over to a HOWTO. Feel free to chip in on this in any way possible. The more documentation the better. chris From rmb32 at cornell.edu Sat Aug 15 12:44:03 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 09:44:03 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <85143.35343.qm@web30404.mail.mud.yahoo.com> References: <85143.35343.qm@web30404.mail.mud.yahoo.com> Message-ID: <4A86E5D3.3030906@cornell.edu> The usual procedure for developing code is to exchange code via commits to a version control system. Yee, do you know how to use Subversion? Does Yee need a commit bit? Rob Yee Man Chan wrote: > Hi Chris > > I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..) > > Please let me know if it works for you. > > Sorry for the bug... > Yee Man > > --- On Fri, 8/14/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" >> Date: Friday, August 14, 2009, 8:31 AM >> Yee Man, >> >> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >> appears to be 32-bit). The patch results in cleaning >> up warnings for 5.10.0 but results in similar warnings for >> 5.8.8 (linux or OS X). 
>> >> On OS X perl 5.8.8, this sometimes passes (note the first >> attempt fails, the second succeeds), so it's not entirely a >> 32-bit issue: >> >> http://gist.github.com/167860 >> >> OS X and perl 5.10.0, this always fails as the previous >> gist shows, but demonstrates similar behavior (multiple >> attempts to test get different responses): >> >> http://gist.github.com/167542 >> >> On linux, everything passes with or w/o the patched files >> (patched files have warnings as indicated above): >> >> Specs for all three perl executables (they vary a bit): >> >> http://gist.github.com/167883 >> >> chris >> >> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >> >>> Ah.. I find that the typemap can become as simple as >> this >>> ===================== >>> TYPEMAP >>> HMM * T_PTROBJ >>> ===================== >>> >>> Then the generated HMM.c will have a function called >> INT2PTR to do the pointer conversion. I believe this should >> solve the warnings. >>> Attached are the updated HMM.xs and typemap. Can >> someone with a 64-bit machine give it a try? >>> Thank you >>> Yee Man >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Yee Man Chan" >>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>> Date: Thursday, August 13, 2009, 5:31 PM >>>> (just to point out to everyone, Yee >>>> Man's contact information was in the POD) >>>> >>>> Yee Man, >>>> >>>> I have the output in the below link: >>>> >>>> http://gist.github.com/167542 >>>> >>>> There are similar problems popping up on 32- and >> 64-bit >>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >> to debug >>>> it unfortunately. >>>> >>>> I think we should seriously consider spinning this >> code off >>>> into it's own distribution for CPAN. It's >>>> unfortunately bit-rotting away in >> bioperl-ext. If you >>>> want to continue supporting it I can help set that >> up. 
>>>> chris >>>> >>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>> >>>>> Hi >>>>> >>>>> So is this an HMM only >> problem? Or does >>>> it apply to other bioperl-ext modules? >>>>> What exactly are the >> compilation errors >>>> for HMM? I believe my implementation is just a >> simple one >>>> based on Rabiner's paper. >>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>> >>>>> I don't think I did >> anything fancy that >>>> makes it machine dependent or non-ANSI C. >>>>> Yee Man >>>>> >>>>> --- On Thu, 8/13/09, Chris Fields >>>> wrote: >>>>>> From: Chris Fields >>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>> package on WinVista? >>>>>> To: "Robert Buels" >>>>>> Cc: "Jonny Dalzell" , >>>> "BioPerl List" , >>>> "Yee Man Chan" >>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>> >>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >> wrote: >>>>>>> Jonny Dalzell wrote: >>>>>>>> Is it ridiculous of me to expect >> ubuntu to >>>> take >>>>>> care of this for me? How do >>>>>>>> I go about compiling the HMM? >>>>>>> Yes. This is a very specialized >> thing >>>> that >>>>>> you're doing, and Ubuntu does not have >> the >>>> resources to >>>>>> package every single thing. >>>>>>> Unfortunately, it looks like >> bioperl-ext >>>> package is >>>>>> not installable under Ubuntu 9.04 anyway, >> which is >>>> what I'm >>>>>> running. For others on this list, >> if >>>> somebody is >>>>>> interested in doing maintaining it, I'd be >> happy >>>> to help out >>>>>> by testing on Debian-based Linux >> platforms. >>>> We need to >>>>>> clarify this package's maintenance status: >> if >>>> there is >>>>>> nobody interested in maintaining it, I >> would >>>> recommend that >>>>>> bioperl-ext be removed from distribution. 
>>>> It's not in >>>>>> anybody's interest to have unmaintained >> software >>>> out there >>>>>> causing confusion. >>>>>> >>>>>> I have cc'd Yee Man Chan for this. >> If there >>>> isn't a >>>>>> response or the message bounces, we do one >> of two >>>> things: >>>>>> 1) consider it deprecated (probably >> safest). >>>>>> 2) spin it out into a separate module. >>>>>> >>>>>> Just tried to comile it myself and am >> getting >>>> errors (using >>>>>> 64bit perl 5.10), so I think, unless >> someone wants >>>> to take >>>>>> this on, option #1 is best. >>>>>> >>>>>>> So Jonny, in short, I would say "do >> not use >>>>>> bioperl-ext". >>>>>> >>>>>> In general, that's a safe bet. We're >> moving >>>> most of >>>>>> our C/C++ bindings to BioLib. >>>>>> >>>>>>> Step back. What are you trying >> to >>>>>> accomplish? Chris already >> recommended some >>>> alternative >>>>>> methods in his email of 8/11 on this >>>> subject. Perhaps >>>>>> we can guide you to some software that is >>>> actively >>>>>> maintained and will meet your needs. >>>>>>> Rob >>>>>> Exactly. Lots of other (better >> supported!) >>>> options >>>>>> out there. HMMER, SeqAn, and >> others. >>>>>> chris >>>>>> >>>>> >>>>> >>>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From maj at fortinbras.us Sat Aug 15 13:40:26 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Sat, 15 Aug 2009 13:40:26 -0400
Subject: [Bioperl-l] BLINK PARSER
In-Reply-To:
References:
Message-ID: <34DBCBEA5E2D49A892E5077AA780BA4E@NewLife>

Hi Sanjay-

I'm not sure BioPerl has an interface specifically for BLINK (I will be
corrected if I'm wrong, so stay tuned). If you can obtain the "raw" blast
output for the protein you're interested in (doing [BLINK] then
[Other Views: BLAST] then [Format:Show: Alignment as Plain text]), that text
can be parsed using the Bio::SearchIO tools, and you can use
Bio::Search::Tiling to obtain the 'best' hsps.

This may not be too helpful, I'm afraid, but it is where I would start.

Mark

----- Original Message -----
From: "sanjay singh"
To:
Sent: Saturday, August 15, 2009 9:38 AM
Subject: [Bioperl-l] BLINK PARSER

> Hi,
> I want to submit query to NCBI'S BLINK and parsed the result for the best
> hit. is there anyone have script to do so.i would be very grateful if
> someone would like to share it with me.
> regards
> sanjay
>
> --
> Happy moments , praise God.
> Difficult moments, seek God.
> Quiet moments, worship God.
> Painful moments, trust God.
> Every moment, thank God
>
> Sanjay Kumar Singh
> Bose Institute
> 93\1,A.P.C.Road
> Kolkata-700 009
> West Bengal
> India
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Sat Aug 15 15:11:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 15 Aug 2009 14:11:48 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A86E5D3.3030906@cornell.edu>
References: <85143.35343.qm@web30404.mail.mud.yahoo.com> <4A86E5D3.3030906@cornell.edu>
Message-ID: <8B7B3664-A0E2-4E66-82D6-982096F4C75E@illinois.edu>

I'm not sure, but it makes more sense to commit these changes directly.
Yee, need us to set you up with a commit bit?
If so, fill out the information on this page: http://www.bioperl.org/wiki/SVN_Account_Request and forward it to support at open-bio.org. I'll sponsor you. chris On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: > The usual procedure for developing code is to exchange code via > commits to a version control system. Yee, do you know how to use > Subversion? Does Yee need a commit bit? > > Rob > > Yee Man Chan wrote: >> Hi Chris >> I find that there is a memory access bug in my code. Attached is >> the fixed HMM.xs. This file together with the simpler typemap >> should fix all problems. (I hope..) >> Please let me know if it works for you. >> Sorry for the bug... >> Yee Man >> --- On Fri, 8/14/09, Chris Fields wrote: >>> From: Chris Fields >>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >>> WinVista? >>> To: "Yee Man Chan" >>> Cc: "Robert Buels" , "Jonny Dalzell" >> >, "BioPerl List" >>> Date: Friday, August 14, 2009, 8:31 AM >>> Yee Man, >>> >>> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >>> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >>> appears to be 32-bit). The patch results in cleaning >>> up warnings for 5.10.0 but results in similar warnings for >>> 5.8.8 (linux or OS X). >>> >>> On OS X perl 5.8.8, this sometimes passes (note the first >>> attempt fails, the second succeeds), so it's not entirely a >>> 32-bit issue: >>> >>> http://gist.github.com/167860 >>> >>> OS X and perl 5.10.0, this always fails as the previous >>> gist shows, but demonstrates similar behavior (multiple >>> attempts to test get different responses): >>> >>> http://gist.github.com/167542 >>> >>> On linux, everything passes with or w/o the patched files >>> (patched files have warnings as indicated above): >>> >>> Specs for all three perl executables (they vary a bit): >>> >>> http://gist.github.com/167883 >>> >>> chris >>> >>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >>> >>>> Ah.. 
I find that the typemap can become as simple as >>> this >>>> ===================== >>>> TYPEMAP >>>> HMM * T_PTROBJ >>>> ===================== >>>> >>>> Then the generated HMM.c will have a function called >>> INT2PTR to do the pointer conversion. I believe this should >>> solve the warnings. >>>> Attached are the updated HMM.xs and typemap. Can >>> someone with a 64-bit machine give it a try? >>>> Thank you >>>> Yee Man >>>> --- On Thu, 8/13/09, Chris Fields >>> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >>> package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >>> "Jonny Dalzell" , >>> "BioPerl List" >>>>> Date: Thursday, August 13, 2009, 5:31 PM >>>>> (just to point out to everyone, Yee >>>>> Man's contact information was in the POD) >>>>> >>>>> Yee Man, >>>>> >>>>> I have the output in the below link: >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> There are similar problems popping up on 32- and >>> 64-bit >>>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >>> to debug >>>>> it unfortunately. >>>>> >>>>> I think we should seriously consider spinning this >>> code off >>>>> into it's own distribution for CPAN. It's >>>>> unfortunately bit-rotting away in >>> bioperl-ext. If you >>>>> want to continue supporting it I can help set that >>> up. >>>>> chris >>>>> >>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>>> >>>>>> Hi >>>>>> >>>>>> So is this an HMM only >>> problem? Or does >>>>> it apply to other bioperl-ext modules? >>>>>> What exactly are the >>> compilation errors >>>>> for HMM? I believe my implementation is just a >>> simple one >>>>> based on Rabiner's paper. 
>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>> ~murphyk%2FBayes >>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>> >>>>>> I don't think I did >>> anything fancy that >>>>> makes it machine dependent or non-ANSI C. >>>>>> Yee Man >>>>>> >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >>> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Robert Buels" >>>>>>> Cc: "Jonny Dalzell" , >>>>> "BioPerl List" , >>>>> "Yee Man Chan" >>>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>>> >>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >>> wrote: >>>>>>>> Jonny Dalzell wrote: >>>>>>>>> Is it ridiculous of me to expect >>> ubuntu to >>>>> take >>>>>>> care of this for me? How do >>>>>>>>> I go about compiling the HMM? >>>>>>>> Yes. This is a very specialized >>> thing >>>>> that >>>>>>> you're doing, and Ubuntu does not have >>> the >>>>> resources to >>>>>>> package every single thing. >>>>>>>> Unfortunately, it looks like >>> bioperl-ext >>>>> package is >>>>>>> not installable under Ubuntu 9.04 anyway, >>> which is >>>>> what I'm >>>>>>> running. For others on this list, >>> if >>>>> somebody is >>>>>>> interested in doing maintaining it, I'd be >>> happy >>>>> to help out >>>>>>> by testing on Debian-based Linux >>> platforms. >>>>> We need to >>>>>>> clarify this package's maintenance status: >>> if >>>>> there is >>>>>>> nobody interested in maintaining it, I >>> would >>>>> recommend that >>>>>>> bioperl-ext be removed from distribution. >>>>> It's not in >>>>>>> anybody's interest to have unmaintained >>> software >>>>> out there >>>>>>> causing confusion. >>>>>>> >>>>>>> I have cc'd Yee Man Chan for this. >>> If there >>>>> isn't a >>>>>>> response or the message bounces, we do one >>> of two >>>>> things: >>>>>>> 1) consider it deprecated (probably >>> safest). 
>>>>>>> 2) spin it out into a separate module. >>>>>>> >>>>>>> Just tried to comile it myself and am >>> getting >>>>> errors (using >>>>>>> 64bit perl 5.10), so I think, unless >>> someone wants >>>>> to take >>>>>>> this on, option #1 is best. >>>>>>> >>>>>>>> So Jonny, in short, I would say "do >>> not use >>>>>>> bioperl-ext". >>>>>>> >>>>>>> In general, that's a safe bet. We're >>> moving >>>>> most of >>>>>>> our C/C++ bindings to BioLib. >>>>>>> >>>>>>>> Step back. What are you trying >>> to >>>>>>> accomplish? Chris already >>> recommended some >>>>> alternative >>>>>>> methods in his email of 8/11 on this >>>>> subject. Perhaps >>>>>>> we can guide you to some software that is >>>>> actively >>>>>>> maintained and will meet your needs. >>>>>>>> Rob >>>>>>> Exactly. Lots of other (better >>> supported!) >>>>> options >>>>>>> out there. HMMER, SeqAn, and >>> others. >>>>>>> chris >>>>>>> >>>>>> >>>>>> >>>>> >>>> __________________________________________________ >>>> Do You Yahoo!? >>>> Tired of spam? Yahoo! 
Mail has the best spam >>> protection around >>>> http://mail.yahoo.com >>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu From hlapp at gmx.net Sat Aug 15 15:41:56 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 15:41:56 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: On Aug 14, 2009, at 11:41 PM, Chris Fields wrote: > I would take more up-to-date POD over wiki (maybe adding a Status: > for the methods), but a good HOWTO goes a long way in helping. It's > just too hard to cover every use case. I'd very much second this. An API documentation should arguably be written by the developer(s) and hence I would expect to find in the PODs. Use-cases, however, and how to solve those in BioPerl can and should be contributed by everyone, and the wiki is just way better at facilitating this. As for the FASTA example, I can understand - I've heard repeatedly from people that one of the things that they are missing is documentation for every SeqIO format we support (such as GenBank, UniProt, FASTA, etc) about where to find a particular piece of the format in the object model. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 15 15:53:31 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Sat, 15 Aug 2009 15:53:31 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To:
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
Message-ID:

----- Original Message -----
From: "Hilmar Lapp"
...
> As for the FASTA example, I can understand - I've heard repeatedly
> from people that one of the things that they are missing is
> documentation for every SeqIO format we support (such as GenBank,
> UniProt, FASTA, etc) about where to find a particular piece of the
> format in the object model.
....

This is the right thread for list lurkers to contribute their betes noires
such as this one. I encourage ALL to post these issues and help create our
list of action items.

MAJ

From hlapp at gmx.net Sat Aug 15 16:09:14 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 16:09:14 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
Message-ID: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>

On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote:

> I'm planning to move Chase Miller's excellent NeXML read/write
> implementation into the trunk, complete with tests. If we can get it
> to pass the test suite, is there room in the point release for it?

We've in the past stayed away from adding new features to stable branches
with the exception of new methods in existing classes and that didn't do
anything complicated.

I'm not sure I remember everything but I think the NeXML support does exceed
that level, doesn't it? Can it be rolled into its own pre-release that is a
drop-in to an existing 1.6.x installation for those who want to go there?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Aug 15 16:12:35 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 16:12:35 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <4A85F83A.30800@cornell.edu>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu>
Message-ID:

Great! Two suggestions:

> * deprecate the get_Annotations(Str) method in favor of
> get_annotation(Str), which adheres better to standard perl method
> naming

Yes, but also is then inconsistent with existing BioPerl naming, with the
method name indicating what type of object you get back (Bio::AnnotationI
in this case; see also e.g., get_SeqFeatures() in Bio::SeqI).

> * finally, split Bio::FeatureIO modules off into their own CPAN
> distribution

Wouldn't one start with this?

-hilmar

On Aug 14, 2009, at 7:50 PM, Robert Buels wrote:

> Chris Fields wrote:
>> Any help/suggestions for the above two would be greatly
>> appreciated! Robert Buels may be heading up the initial FeatureIO
>> work; I will likely start on LocatableSeq/Align (Mark, wanna help?).
>
> Sure, I'll head up the gff_refactor branch work. If you're
> interested in what changes are being planned for Bio::SeqFeature::*,
> Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the
> implementation plan Chris and I developed just now on IRC, which is at
>
> http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan
>
> Now soliciting suggestions, comments, and assistance.
>
> Rob
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From rmb32 at cornell.edu Sat Aug 15 16:24:35 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 15 Aug 2009 13:24:35 -0700
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
Message-ID: <4A871983.4010702@cornell.edu>

Hilmar Lapp wrote:
> I'm not sure I remember everything but I think the NeXML support does
> exceed that level, doesn't it? Can it be rolled into its own pre-release
> that is a drop-in to an existing 1.6.x installation for those who want
> to go there?

So split it out into its own CPAN dist.

Rob

From maj at fortinbras.us Sat Aug 15 16:36:47 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 15 Aug 2009 16:36:47 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
Message-ID: <307089ED92AD46539EEF45EE2D8F5A81@NewLife>

Yes, I'd say the Nexml support exceeds the 'complicated' test. There are no
modifications to existing modules (except for the addition of annotation
attributes to members of the Bio::PopGen model, which are don't-cares to
anything out there currently).
The manifest of a NeXML drop-in would look like

Bio/NexmlIO.pm
Bio/Nexml/Factory.pm
Bio/SeqIO/nexml.pm
Bio/AlignIO/nexml.pm
Bio/TreeIO/nexml.pm

and, if I get it completed, support for arbitrary characters via Bio::PopGen

Bio/PopGen/IO/nexml.pm

(all based on hacks of Chase's code, btw; we thought it would round out
the package nicely...)

Of course, the big dependency that not everyone will need or want is
Rutger's Bio::Phylo, so the Nexml support will have to be optional even
in 1.7, I think. I am adding run-time checks for Bio::Phylo in the modules
so they die relatively gracefully and informatively, rather than just barf.
Also, the tests will have appropriate skip blocks.

I do want to get the code into bioperl-live, however, unless there's a
gotcha there I'm not seeing--

cheers
MAJ

----- Original Message -----
From: "Hilmar Lapp"
To: "Mark A. Jensen"
Cc: "Chris Fields" ; "BioPerl List"
Sent: Saturday, August 15, 2009 4:09 PM
Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring

>
> On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote:
>
>> I'm planning to move Chase Miller's excellent NeXML read/write
>> implementation into the trunk, complete with tests. If we can get it to pass
>> the test suite, is there room in the point release for it?
>
>
> We've in the past stayed away from adding new features to stable branches
> with the exception of new methods in existing classes and that didn't do
> anything complicated.
>
> I'm not sure I remember everything but I think the NeXML support does exceed
> that level, doesn't it? Can it be rolled into its own pre-release that is a
> drop-in to an existing 1.6.x installation for those who want to go there?
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

From hlapp at gmx.net Sat Aug 15 16:49:22 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 16:49:22 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <307089ED92AD46539EEF45EE2D8F5A81@NewLife>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife>
Message-ID: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>

On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote:

> I do want to get the code into bioperl-live, however, unless there's
> a gotcha there I'm not seeing--

That sounds great to me, though it may make some of Chris' hair stand on end
if he wants this to go into a separate module from the start :) Maybe a
phylogenetics module can be carved out that this would become part of?
Though I recall someone saying recently that Bio::Species and by extension
Bio::SeqIO is dependent on Bio::Tree::Node, so maybe that's not realistic to
split out.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Sat Aug 15 17:07:30 2009
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Sat, 15 Aug 2009 17:07:30 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
Message-ID: <659CA35CE3AD464AA516D18B313311BE@NewLife>

I'm all for an attempt to split out phylogenetic stuff, it
seems natural, and think in terms of a phylo package
dependent upon a sequence package, and if necessary
vice versa -- although if the Bio::Species - Bio::Tree::Node
connection is relatively loose, perhaps we can refactor to
make some attributes/methods optional features that carp
when the phylo package is not installed. (Roles, anyone?)

However, probably 1.6.x doesn't sound like the place to
do that! I myself wouldn't have any problem waiting till 1.7
for 'official' Nexml support--but I hope Chase will chime in
on that. What does Chris think?

MAJ

----- Original Message -----
From: "Hilmar Lapp"
To: "Mark A. Jensen"
Cc: "Chris Fields" ; "BioPerl List"
Sent: Saturday, August 15, 2009 4:49 PM
Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring

>
> On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote:
>
>> I do want to get the code into bioperl-live, however, unless there's a
>> gotcha there I'm not seeing--
>
>
> That sounds great to me, though it may make some of Chris' hair stand on end
> if he wants this to go into a separate module from the start :) Maybe a
> phylogenetics module can be carved out that this would become part of? Though
> I recall someone saying recently that Bio::Species and by extension
> Bio::SeqIO is dependent on Bio::Tree::Node, so maybe that's not realistic to
> split out.
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

From rmb32 at cornell.edu Sat Aug 15 17:23:40 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 15 Aug 2009 14:23:40 -0700
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu>
Message-ID: <4A87275C.5040300@cornell.edu>

Hilmar Lapp wrote:
>> * deprecate the get_Annotations(Str) method in favor of
>> get_annotation(Str), which adheres better to standard perl method naming
>
> Yes, but also is then inconsistent with existing BioPerl naming, with
> the method name indicating what type of object you get back
> (Bio::AnnotationI in this case; see also e.g., get_SeqFeatures() in
> Bio::SeqI).

Blech. OK never mind about the method rename then.

>
>> * finally, split Bio::FeatureIO modules off into their own CPAN
>> distribution
>
> Wouldn't one start with this?

Yeah....I've kind of been vacillating back and forth about whether it
would be best to *start* with this, or to end with this. Probably makes
more sense to start with it, since it gives more freedom to add
dependencies on more CPAN stuff without worrying too much.
Like...oh...I don't know...Moose?

Thoughts on this?

Rob

From rmb32 at cornell.edu Sat Aug 15 17:25:51 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sat, 15 Aug 2009 14:25:51 -0700
Subject: [Bioperl-l] genbank2gff3 for prokaryotes?
In-Reply-To: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu>
References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu>
Message-ID: <4A8727DF.7000204@cornell.edu>

Chris Fields wrote:
> In fact, seeing as we're refactoring GFF and other aspects of Features
> in bioperl, this may be the best time to add something in.
Reading that thread, it sounds like most of the issues revolve around when and how to use the unflattener. Perhaps just adding another command line switch or two to the script would be appropriate? Editorializing a bit, it's really disheartening that Genbank stores features in such a lossy way. Rob From cjfields at illinois.edu Sat Aug 15 22:05:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 21:05:41 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <241652.96493.qm@web30404.mail.mud.yahoo.com> References: <241652.96493.qm@web30404.mail.mud.yahoo.com> Message-ID: I'm still seeing the same errors on Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as well as perl 5.8.8 on dev.open-bio.org). I'm wondering if this is a problem with my local perl build. I'm very tempted to push the HMM-related code into a separate distribution (bioperl-hmm) and make a CPAN release out of it so it gets wider testing via CPAN testers; it would just require a minimum bioperl 1.6 installation for Bio::Tools::HMM and any related modules. Yee, would that be okay with you? chris On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: > > I just committed HMM.xs and typemap to SVN. Can you test it to > confirm it works in 64-bit machines? > > Thanks > Yee Man > > --- On Sat, 8/15/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "Yee Man Chan" , "BioPerl List" > > >> Date: Saturday, August 15, 2009, 12:11 PM >> I'm not sure, but it makes more sense >> to commit these changes directly. Yee, need us to set >> you up with a commit bit? If so, fill out the >> information on this page: >> >> http://www.bioperl.org/wiki/SVN_Account_Request >> >> and forward it to support at open-bio.org. >> I'll sponsor you. 
>> >> chris >> >> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: >> >>> The usual procedure for developing code is to exchange >> code via commits to a version control system. Yee, do >> you know how to use Subversion? Does Yee need a commit bit? >>> >>> Rob >>> >>> Yee Man Chan wrote: >>>> Hi Chris >>>> I find that there is a memory >> access bug in my code. Attached is the fixed HMM.xs. This >> file together with the simpler typemap should fix all >> problems. (I hope..) >>>> Please let me know if it works >> for you. >>>> Sorry for the bug... >>>> Yee Man >>>> --- On Fri, 8/14/09, Chris Fields >> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>>> Date: Friday, August 14, 2009, 8:31 AM >>>>> Yee Man, >>>>> >>>>> I tested this out locally (perl 5.8.8 32-bit, >> perl 5.10.0 >>>>> 64-bit) and on dev.open-bio.org (which is perl >> 5.8.8, >>>>> appears to be 32-bit). The patch results >> in cleaning >>>>> up warnings for 5.10.0 but results in similar >> warnings for >>>>> 5.8.8 (linux or OS X). >>>>> >>>>> On OS X perl 5.8.8, this sometimes passes >> (note the first >>>>> attempt fails, the second succeeds), so it's >> not entirely a >>>>> 32-bit issue: >>>>> >>>>> http://gist.github.com/167860 >>>>> >>>>> OS X and perl 5.10.0, this always fails as the >> previous >>>>> gist shows, but demonstrates similar behavior >> (multiple >>>>> attempts to test get different responses): >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> On linux, everything passes with or w/o the >> patched files >>>>> (patched files have warnings as indicated >> above): >>>>> >>>>> Specs for all three perl executables (they >> vary a bit): >>>>> >>>>> http://gist.github.com/167883 >>>>> >>>>> chris >>>>> >>>>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan >> wrote: >>>>> >>>>>> Ah.. 
I find that the typemap can become as >> simple as >>>>> this >>>>>> ===================== >>>>>> TYPEMAP >>>>>> HMM * T_PTROBJ >>>>>> ===================== >>>>>> >>>>>> Then the generated HMM.c will have a >> function called >>>>> INT2PTR to do the pointer conversion. I >> believe this should >>>>> solve the warnings. >>>>>> Attached are the updated HMM.xs and >> typemap. Can >>>>> someone with a 64-bit machine give it a try? >>>>>> Thank you >>>>>> Yee Man >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Yee Man Chan" >>>>>>> Cc: "Robert Buels" , >>>>> "Jonny Dalzell" , >>>>> "BioPerl List" >>>>>>> Date: Thursday, August 13, 2009, 5:31 >> PM >>>>>>> (just to point out to everyone, Yee >>>>>>> Man's contact information was in the >> POD) >>>>>>> >>>>>>> Yee Man, >>>>>>> >>>>>>> I have the output in the below link: >>>>>>> >>>>>>> http://gist.github.com/167542 >>>>>>> >>>>>>> There are similar problems popping up >> on 32- and >>>>> 64-bit >>>>>>> perl 5.10.0, Mac OS X 10.5. >> Haven't had time >>>>> to debug >>>>>>> it unfortunately. >>>>>>> >>>>>>> I think we should seriously consider >> spinning this >>>>> code off >>>>>>> into it's own distribution for >> CPAN. It's >>>>>>> unfortunately bit-rotting away in >>>>> bioperl-ext. If you >>>>>>> want to continue supporting it I can >> help set that >>>>> up. >>>>>>> chris >>>>>>> >>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man >> Chan wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> >>>>>>>> So is this >> an HMM only >>>>> problem? Or does >>>>>>> it apply to other bioperl-ext >> modules? >>>>>>>> What >> exactly are the >>>>> compilation errors >>>>>>> for HMM? I believe my implementation >> is just a >>>>> simple one >>>>>>> based on Rabiner's paper. 
>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>>>> ~murphyk%2FBayes >>>>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>>>> >>>>>>>> I don't >> think I did >>>>> anything fancy that >>>>>>> makes it machine dependent or non-ANSI >> C. >>>>>>>> Yee Man >>>>>>>> >>>>>>>> --- On Thu, 8/13/09, Chris Fields >> >>>>>>> wrote: >>>>>>>>> From: Chris Fields >>>>>>>>> Subject: Re: [Bioperl-l] >> Problems with >>>>> Bioperl-ext >>>>>>> package on WinVista? >>>>>>>>> To: "Robert Buels" >>>>>>>>> Cc: "Jonny Dalzell" , >>>>>>> "BioPerl List" , >>>>>>> "Yee Man Chan" >>>>>>>>> Date: Thursday, August 13, >> 2009, 3:18 PM >>>>>>>>> >>>>>>>>> On Aug 13, 2009, at 4:37 PM, >> Robert Buels >>>>> wrote: >>>>>>>>>> Jonny Dalzell wrote: >>>>>>>>>>> Is it ridiculous of me >> to expect >>>>> ubuntu to >>>>>>> take >>>>>>>>> care of this for me? How >> do >>>>>>>>>>> I go about compiling >> the HMM? >>>>>>>>>> Yes. This is a very >> specialized >>>>> thing >>>>>>> that >>>>>>>>> you're doing, and Ubuntu does >> not have >>>>> the >>>>>>> resources to >>>>>>>>> package every single thing. >>>>>>>>>> Unfortunately, it looks >> like >>>>> bioperl-ext >>>>>>> package is >>>>>>>>> not installable under Ubuntu >> 9.04 anyway, >>>>> which is >>>>>>> what I'm >>>>>>>>> running. For others on >> this list, >>>>> if >>>>>>> somebody is >>>>>>>>> interested in doing >> maintaining it, I'd be >>>>> happy >>>>>>> to help out >>>>>>>>> by testing on Debian-based >> Linux >>>>> platforms. >>>>>>> We need to >>>>>>>>> clarify this package's >> maintenance status: >>>>> if >>>>>>> there is >>>>>>>>> nobody interested in >> maintaining it, I >>>>> would >>>>>>> recommend that >>>>>>>>> bioperl-ext be removed from >> distribution. >>>>>>> It's not in >>>>>>>>> anybody's interest to have >> unmaintained >>>>> software >>>>>>> out there >>>>>>>>> causing confusion. 
>>>>>>>>> >>>>>>>>> I have cc'd Yee Man Chan for >> this. >>>>> If there >>>>>>> isn't a >>>>>>>>> response or the message >> bounces, we do one >>>>> of two >>>>>>> things: >>>>>>>>> 1) consider it deprecated >> (probably >>>>> safest). >>>>>>>>> 2) spin it out into a separate >> module. >>>>>>>>> >>>>>>>>> Just tried to comile it myself >> and am >>>>> getting >>>>>>> errors (using >>>>>>>>> 64bit perl 5.10), so I think, >> unless >>>>> someone wants >>>>>>> to take >>>>>>>>> this on, option #1 is best. >>>>>>>>> >>>>>>>>>> So Jonny, in short, I >> would say "do >>>>> not use >>>>>>>>> bioperl-ext". >>>>>>>>> >>>>>>>>> In general, that's a safe >> bet. We're >>>>> moving >>>>>>> most of >>>>>>>>> our C/C++ bindings to BioLib. >>>>>>>>> >>>>>>>>>> Step back. What are >> you trying >>>>> to >>>>>>>>> accomplish? Chris >> already >>>>> recommended some >>>>>>> alternative >>>>>>>>> methods in his email of 8/11 >> on this >>>>>>> subject. Perhaps >>>>>>>>> we can guide you to some >> software that is >>>>>>> actively >>>>>>>>> maintained and will meet your >> needs. >>>>>>>>>> Rob >>>>>>>>> Exactly. Lots of other >> (better >>>>> supported!) >>>>>>> options >>>>>>>>> out there. HMMER, SeqAn, >> and >>>>> others. >>>>>>>>> chris >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >> __________________________________________________ >>>>>> Do You Yahoo!? >>>>>> Tired of spam? Yahoo! 
Mail has the best spam protection around
>>>>>> http://mail.yahoo.com
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --Robert Buels
>>> Bioinformatics Analyst, Sol Genomics Network
>>> Boyce Thompson Institute for Plant Research
>>> Tower Rd
>>> Ithaca, NY 14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu

From cjfields at illinois.edu Sat Aug 15 22:49:25 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 15 Aug 2009 21:49:25 -0500
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <659CA35CE3AD464AA516D18B313311BE@NewLife>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
 <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
 <307089ED92AD46539EEF45EE2D8F5A81@NewLife>
 <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
 <659CA35CE3AD464AA516D18B313311BE@NewLife>
Message-ID: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>

On Aug 15, 2009, at 4:07 PM, Mark A. Jensen wrote:

> I'm all for an attempt to split out phylogenetic stuff, it
> seems natural, and think in terms of a phylo package
> dependent upon a sequence package, and if necessary
> vice versa -- although if the Bio::Species - Bio::Tree::Node
> connection is relatively loose, perhaps we can refactor to
> make some attributes/methods optional features that carp
> when the phylo package is not installed. (Roles, anyone?)

I'm pretty sure they're linked very tightly (Species is-a Bio::Taxon
is-a Bio::Tree::Node). This may be something Sendu needs to chime in
on; he refactored much of that code prior to 1.5.2.

As a suggestion, maybe we can use a combined strategy: fall back to a
very simple Bio::Species container class if a bioperl-phylo isn't
installed, but utilize Bio::Taxon when it is.

> However, probably 1.6.x doesn't sound like the place to
> do that! I myself wouldn't have any problem waiting till
> 1.7 for 'official' Nexml support--but I hope Chase will chime
> in on that. What does Chris think?
> MAJ

Robert's suggestion of a separate distribution makes sense; it may be
one avenue of slowly migrating out phylo-specific code into its own
distribution. Not sure about calling it bioperl-phylo (which might be
confused with Rutger's Bio::Phylo).

chris

From cjfields at illinois.edu Sat Aug 15 22:47:36 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 15 Aug 2009 21:47:36 -0500
Subject: [Bioperl-l] genbank2gff3 for prokaryotes?
In-Reply-To: <4A8727DF.7000204@cornell.edu>
References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu>
 <4A8727DF.7000204@cornell.edu>
Message-ID: <81C3E545-4F0E-4B1F-9F06-398D1EE7A3CF@illinois.edu>

On Aug 15, 2009, at 4:25 PM, Robert Buels wrote:

> Chris Fields wrote:
> > In fact, seeing as we're refactoring GFF and other aspects of Features
> > in bioperl, this may be the best time to add something in.
>
> Reading that thread, it sounds like most of the issues revolve
> around when and how to use the unflattener. Perhaps just adding
> another command line switch or two to the script would be appropriate?
>
> Editorializing a bit, it's really disheartening that Genbank stores
> features in such a lossy way.
>
> Rob

Just remembered: NCBI does supply GFF3 files for bacterial genomes,
but I'm not sure how well they correspond to the GFF3 specification.
For example:

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Aquifex_aeolicus/NC_000918.gff

A quick glance looks okay, but they don't include FASTA sequence.

I think much of the problem with NCBI/GenBank has to do with lack of
curation on how submissions are made (lots of inconsistencies). I'm
not sure how easy they will be to deal with, but the only way we can
deal with that is looking at examples of problematic data (IIRC the
Sulfolobus solfataricus genome GB file was a mess, so maybe that's
worth a look).

chris

From cjfields at illinois.edu Sun Aug 16 01:38:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 00:38:46 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <846546.73578.qm@web30404.mail.mud.yahoo.com>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com>
Message-ID: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>

Yee,

I took the liberty of making a few simple changes to Bio::Tools::HMM
in svn to point out the problem and possible solutions. Feel free to
revert these as needed.

I'm seeing two errors, which appear randomly when running 'make test'.
The first is easily fixable, the second, I'm not so sure. I'll let you
make the decisions on both.

1) There is an assumption in the module that, when adding floating
points, you will always get 1.0. You may run into problems: see
'perldoc -q long decimals'. Lines like this (two places in the module):

    ...
    if ($sum != 1.0) {
        $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
    }
    ...

won't work as expected (note I added a simple diagnostic, just print
out the 'bad' sum). With perl 5.8.8, this appears to work fine, but
this is what I get with perl 5.10 (64-bit):

pyrimidine1:HMM cjfields$ make test
PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
Baum-Welch Training
===================
Initial Probability Array:
0.499978 0.500022
Transition Probability Matrix:
0.499978 0.500022
0.499978 0.500022
Emission Probability Matrix:
0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
0.133333 0.143333 0.163333 0.123333 0.143333 0.293333

Log Probability of sequence 1: -521.808
Log Probability of sequence 2: -426.057

Statistical Training
====================
Initial Probability Array:
1 0
Transition Probability Matrix:

------------- EXCEPTION -------------
MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
STACK toplevel test.pl:82
-------------------------------------

make: *** [test_dynamic] Error 255

I'm assuming this needs to simply be rounded up to 1.0. That could be
accomplished with something like 'if (sprintf("%.2f", $sum) != 1.0) {...}'

2) The second error is a little stranger. I have been randomly getting
this:

pyrimidine1:HMM cjfields$ make test
PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
Baum-Welch Training
===================
S should be monotonic increasing!
make: *** [test_dynamic] Error 255

When I add strict and warnings pragmas to Bio::Tools::HMM (with a
little additional cleanup to get things running), I get an additional
warning (arrow):

pyrimidine1:HMM cjfields$ make test
PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
Baum-Welch Training
===================
S should be monotonic increasing!
make: *** [test_dynamic] Error 255

So something is not being converted as expected.
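
[Editorial sketch] The first error above comes from comparing a floating-point sum to 1.0 with an exact test. Rounding with sprintf("%.2f", ...) avoids the exception, but it also accepts any sum from roughly 0.995 to 1.005. A tolerance check is one alternative. The helper name sums_to_one and the 1e-6 default tolerance below are assumptions made for illustration; they are not part of Bio::Tools::HMM.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::HMM): returns true when
# the values in the array reference sum to 1.0 within an absolute tolerance.
sub sums_to_one {
    my ($probs, $tolerance) = @_;
    $tolerance = 1e-6 unless defined $tolerance;    # assumed default
    my $sum = 0;
    $sum += $_ for @$probs;
    return abs($sum - 1.0) <= $tolerance;
}

# Ten additions of 0.1 usually do not give exactly 1.0 in binary
# floating point, which is the kind of failure reported above.
my @row = (0.1) x 10;
my $sum = 0;
$sum += $_ for @row;

printf "exact comparison: %s\n", ($sum == 1.0        ? 'equal' : 'not equal');
printf "tolerance check:  %s\n", (sums_to_one(\@row) ? 'ok'    : 'fail');
```

Whether the exact comparison happens to succeed depends on how the local perl was built (for example, with or without long doubles), which would fit the difference seen between the 5.8.8 and 5.10 builds; the tolerance check does not depend on that.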
chris On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote: > When are you going to release 1.6? Maybe let me work on it before it > releases. If it doesn't resolve the problem, then we can think about > other alternatives. > > Also, please show me the latest errors you have for 5.10.0. > > Thanks > Yee Man > > --- On Sat, 8/15/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "BioPerl List" > > >> Date: Saturday, August 15, 2009, 7:05 PM >> I'm still seeing the same errors on >> Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl >> (v5.8.8) passes fine now (as well as perl 5.8.8 on >> dev.open-bio.org). >> >> I'm wondering if this is a problem with my local perl >> build. I'm very tempted to push the HMM-related code >> into a separate distribution (bioperl-hmm) and make a CPAN >> release out of it so it gets wider testing via CPAN testers; >> it would just require a minimum bioperl 1.6 installation for >> Bio::Tools::HMM and any related modules. Yee, would >> that be okay with you? >> >> chris >> >> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: >> >>> >>> I just committed HMM.xs and typemap to SVN. Can you >> test it to confirm it works in 64-bit machines? >>> >>> Thanks >>> Yee Man >>> >>> --- On Sat, 8/15/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Robert Buels" >>>> Cc: "Yee Man Chan" , >> "BioPerl List" >>>> Date: Saturday, August 15, 2009, 12:11 PM >>>> I'm not sure, but it makes more sense >>>> to commit these changes directly. Yee, need >> us to set >>>> you up with a commit bit? If so, fill out >> the >>>> information on this page: >>>> >>>> http://www.bioperl.org/wiki/SVN_Account_Request >>>> >>>> and forward it to support at open-bio.org. >>>> I'll sponsor you. 
>>>> >>>> chris >>>> >>>> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: >>>> >>>>> The usual procedure for developing code is to >> exchange >>>> code via commits to a version control >> system. Yee, do >>>> you know how to use Subversion? Does Yee need a >> commit bit? >>>>> >>>>> Rob >>>>> >>>>> Yee Man Chan wrote: >>>>>> Hi Chris >>>>>> I find that there is a >> memory >>>> access bug in my code. Attached is the fixed >> HMM.xs. This >>>> file together with the simpler typemap should fix >> all >>>> problems. (I hope..) >>>>>> Please let me know if it >> works >>>> for you. >>>>>> Sorry for the bug... >>>>>> Yee Man >>>>>> --- On Fri, 8/14/09, Chris Fields >>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems >> with >>>> Bioperl-ext package on WinVista? >>>>>>> To: "Yee Man Chan" >>>>>>> Cc: "Robert Buels" , >>>> "Jonny Dalzell" , >>>> "BioPerl List" >>>>>>> Date: Friday, August 14, 2009, 8:31 >> AM >>>>>>> Yee Man, >>>>>>> >>>>>>> I tested this out locally (perl 5.8.8 >> 32-bit, >>>> perl 5.10.0 >>>>>>> 64-bit) and on dev.open-bio.org (which >> is perl >>>> 5.8.8, >>>>>>> appears to be 32-bit). The patch >> results >>>> in cleaning >>>>>>> up warnings for 5.10.0 but results in >> similar >>>> warnings for >>>>>>> 5.8.8 (linux or OS X). 
>>>>>>> >>>>>>> On OS X perl 5.8.8, this sometimes >> passes >>>> (note the first >>>>>>> attempt fails, the second succeeds), >> so it's >>>> not entirely a >>>>>>> 32-bit issue: >>>>>>> >>>>>>> http://gist.github.com/167860 >>>>>>> >>>>>>> OS X and perl 5.10.0, this always >> fails as the >>>> previous >>>>>>> gist shows, but demonstrates similar >> behavior >>>> (multiple >>>>>>> attempts to test get different >> responses): >>>>>>> >>>>>>> http://gist.github.com/167542 >>>>>>> >>>>>>> On linux, everything passes with or >> w/o the >>>> patched files >>>>>>> (patched files have warnings as >> indicated >>>> above): >>>>>>> >>>>>>> Specs for all three perl executables >> (they >>>> vary a bit): >>>>>>> >>>>>>> http://gist.github.com/167883 >>>>>>> >>>>>>> chris >>>>>>> >>>>>>> On Aug 14, 2009, at 3:27 AM, Yee Man >> Chan >>>> wrote: >>>>>>> >>>>>>>> Ah.. I find that the typemap can >> become as >>>> simple as >>>>>>> this >>>>>>>> ===================== >>>>>>>> TYPEMAP >>>>>>>> HMM * T_PTROBJ >>>>>>>> ===================== >>>>>>>> >>>>>>>> Then the generated HMM.c will have >> a >>>> function called >>>>>>> INT2PTR to do the pointer conversion. >> I >>>> believe this should >>>>>>> solve the warnings. >>>>>>>> Attached are the updated HMM.xs >> and >>>> typemap. Can >>>>>>> someone with a 64-bit machine give it >> a try? >>>>>>>> Thank you >>>>>>>> Yee Man >>>>>>>> --- On Thu, 8/13/09, Chris Fields >> >>>>>>> wrote: >>>>>>>>> From: Chris Fields >>>>>>>>> Subject: Re: [Bioperl-l] >> Problems with >>>> Bioperl-ext >>>>>>> package on WinVista? 
>>>>>>>>> To: "Yee Man Chan" >>>>>>>>> Cc: "Robert Buels" , >>>>>>> "Jonny Dalzell" , >>>>>>> "BioPerl List" >>>>>>>>> Date: Thursday, August 13, >> 2009, 5:31 >>>> PM >>>>>>>>> (just to point out to >> everyone, Yee >>>>>>>>> Man's contact information was >> in the >>>> POD) >>>>>>>>> >>>>>>>>> Yee Man, >>>>>>>>> >>>>>>>>> I have the output in the below >> link: >>>>>>>>> >>>>>>>>> http://gist.github.com/167542 >>>>>>>>> >>>>>>>>> There are similar problems >> popping up >>>> on 32- and >>>>>>> 64-bit >>>>>>>>> perl 5.10.0, Mac OS X 10.5. >>>> Haven't had time >>>>>>> to debug >>>>>>>>> it unfortunately. >>>>>>>>> >>>>>>>>> I think we should seriously >> consider >>>> spinning this >>>>>>> code off >>>>>>>>> into it's own distribution >> for >>>> CPAN. It's >>>>>>>>> unfortunately bit-rotting away >> in >>>>>>> bioperl-ext. If you >>>>>>>>> want to continue supporting it >> I can >>>> help set that >>>>>>> up. >>>>>>>>> chris >>>>>>>>> >>>>>>>>> On Aug 13, 2009, at 6:58 PM, >> Yee Man >>>> Chan wrote: >>>>>>>>> >>>>>>>>>> Hi >>>>>>>>>> >>>>>>>>>> So is >> this >>>> an HMM only >>>>>>> problem? Or does >>>>>>>>> it apply to other bioperl-ext >>>> modules? >>>>>>>>>> What >>>> exactly are the >>>>>>> compilation errors >>>>>>>>> for HMM? I believe my >> implementation >>>> is just a >>>>>>> simple one >>>>>>>>> based on Rabiner's paper. >>>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>>>>>> ~murphyk%2FBayes >>>>>>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>>>>>> >>>>>>>>>> I >> don't >>>> think I did >>>>>>> anything fancy that >>>>>>>>> makes it machine dependent or >> non-ANSI >>>> C. >>>>>>>>>> Yee Man >>>>>>>>>> >>>>>>>>>> --- On Thu, 8/13/09, Chris >> Fields >>>> >>>>>>>>> wrote: >>>>>>>>>>> From: Chris Fields >> >>>>>>>>>>> Subject: Re: >> [Bioperl-l] >>>> Problems with >>>>>>> Bioperl-ext >>>>>>>>> package on WinVista? 
>>>>>>>>>>> To: "Robert Buels" >> >>>>>>>>>>> Cc: "Jonny Dalzell" >> , >>>>>>>>> "BioPerl List" , >>>>>>>>> "Yee Man Chan" >>>>>>>>>>> Date: Thursday, August >> 13, >>>> 2009, 3:18 PM >>>>>>>>>>> >>>>>>>>>>> On Aug 13, 2009, at >> 4:37 PM, >>>> Robert Buels >>>>>>> wrote: >>>>>>>>>>>> Jonny Dalzell >> wrote: >>>>>>>>>>>>> Is it >> ridiculous of me >>>> to expect >>>>>>> ubuntu to >>>>>>>>> take >>>>>>>>>>> care of this for >> me? How >>>> do >>>>>>>>>>>>> I go about >> compiling >>>> the HMM? >>>>>>>>>>>> Yes. This is >> a very >>>> specialized >>>>>>> thing >>>>>>>>> that >>>>>>>>>>> you're doing, and >> Ubuntu does >>>> not have >>>>>>> the >>>>>>>>> resources to >>>>>>>>>>> package every single >> thing. >>>>>>>>>>>> Unfortunately, it >> looks >>>> like >>>>>>> bioperl-ext >>>>>>>>> package is >>>>>>>>>>> not installable under >> Ubuntu >>>> 9.04 anyway, >>>>>>> which is >>>>>>>>> what I'm >>>>>>>>>>> running. For >> others on >>>> this list, >>>>>>> if >>>>>>>>> somebody is >>>>>>>>>>> interested in doing >>>> maintaining it, I'd be >>>>>>> happy >>>>>>>>> to help out >>>>>>>>>>> by testing on >> Debian-based >>>> Linux >>>>>>> platforms. >>>>>>>>> We need to >>>>>>>>>>> clarify this >> package's >>>> maintenance status: >>>>>>> if >>>>>>>>> there is >>>>>>>>>>> nobody interested in >>>> maintaining it, I >>>>>>> would >>>>>>>>> recommend that >>>>>>>>>>> bioperl-ext be removed >> from >>>> distribution. >>>>>>>>> It's not in >>>>>>>>>>> anybody's interest to >> have >>>> unmaintained >>>>>>> software >>>>>>>>> out there >>>>>>>>>>> causing confusion. >>>>>>>>>>> >>>>>>>>>>> I have cc'd Yee Man >> Chan for >>>> this. >>>>>>> If there >>>>>>>>> isn't a >>>>>>>>>>> response or the >> message >>>> bounces, we do one >>>>>>> of two >>>>>>>>> things: >>>>>>>>>>> 1) consider it >> deprecated >>>> (probably >>>>>>> safest). >>>>>>>>>>> 2) spin it out into a >> separate >>>> module. 
>>>>>>>>>>> >>>>>>>>>>> Just tried to comile >> it myself >>>> and am >>>>>>> getting >>>>>>>>> errors (using >>>>>>>>>>> 64bit perl 5.10), so I >> think, >>>> unless >>>>>>> someone wants >>>>>>>>> to take >>>>>>>>>>> this on, option #1 is >> best. >>>>>>>>>>> >>>>>>>>>>>> So Jonny, in >> short, I >>>> would say "do >>>>>>> not use >>>>>>>>>>> bioperl-ext". >>>>>>>>>>> >>>>>>>>>>> In general, that's a >> safe >>>> bet. We're >>>>>>> moving >>>>>>>>> most of >>>>>>>>>>> our C/C++ bindings to >> BioLib. >>>>>>>>>>> >>>>>>>>>>>> Step back. >> What are >>>> you trying >>>>>>> to >>>>>>>>>>> accomplish? >> Chris >>>> already >>>>>>> recommended some >>>>>>>>> alternative >>>>>>>>>>> methods in his email >> of 8/11 >>>> on this >>>>>>>>> subject. Perhaps >>>>>>>>>>> we can guide you to >> some >>>> software that is >>>>>>>>> actively >>>>>>>>>>> maintained and will >> meet your >>>> needs. >>>>>>>>>>>> Rob >>>>>>>>>>> Exactly. Lots of >> other >>>> (better >>>>>>> supported!) >>>>>>>>> options >>>>>>>>>>> out there. >> HMMER, SeqAn, >>>> and >>>>>>> others. >>>>>>>>>>> chris >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>> >> __________________________________________________ >>>>>>>> Do You Yahoo!? >>>>>>>> Tired of spam? Yahoo! 
Mail has the best spam protection around
>>>>>>>> http://mail.yahoo.com
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>> --Robert Buels
>>>>> Bioinformatics Analyst, Sol Genomics Network
>>>>> Boyce Thompson Institute for Plant Research
>>>>> Tower Rd
>>>>> Ithaca, NY 14853
>>>>> Tel: 503-889-8539
>>>>> rmb32 at cornell.edu
>>>>> http://www.sgn.cornell.edu

From abhishek.vit at gmail.com Sun Aug 16 04:06:49 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Sun, 16 Aug 2009 04:06:49 -0400
Subject: [Bioperl-l] About binning data for histograms
Message-ID:

Hi All

After a lot of look up on forums I could google, I am finally posting
my question here. I think it may not be appropriate for this mailing
list. I apologize for this first up. The question is regarding dynamic
binning of data points for histogram plots.

So I have many hashes, each having a "numerical" coverage data
obtained from Next generation sequencing data analysis. Now each hash
may have couple of hundred to thousands entry "contig_name =>
coverage". What I want to do is to plot a histogram for each
hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N
has to be binned according to the data size).

I am using Chart::Gnuplot for this but I am not able to figure out how
to bin the data points to fit nicely on a screen. Is there any
smart/quick method to do this.

Any pointers will help a great deal.
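
[Editorial sketch] One way to approach the binning described above is to pick the number of equal-width bins from the number of data points. The sketch below uses Sturges' rule, k = ceil(log2(n)) + 1, which is only one common rule of thumb and may not suit skewed coverage data. The helper name bin_values and the sample hash are assumptions made for illustration. The returned pairs of bin start and count could then be handed to whatever plotting module is in use.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);
use List::Util qw(min max);

# Hypothetical helper: bin the numeric values of a hash reference
# ("contig_name => coverage") into equal-width bins.
# Returns an array reference of [bin_start, count] pairs.
sub bin_values {
    my ($data) = @_;
    my @values = values %$data;
    my $n      = @values;
    return [] unless $n;

    my ($lo, $hi) = (min(@values), max(@values));
    my $k     = ceil(log($n) / log(2)) + 1;    # Sturges' rule
    my $width = ($hi - $lo) / $k || 1;         # avoid zero width if all values are equal

    my @counts = (0) x $k;
    for my $v (@values) {
        my $i = floor(($v - $lo) / $width);
        $i = $k - 1 if $i >= $k;               # the maximum goes into the last bin
        $counts[$i]++;
    }
    return [ map { [ $lo + $_ * $width, $counts[$_] ] } 0 .. $k - 1 ];
}

# Made-up sample data, for illustration only.
my %coverage = (contig_a => 1, contig_b => 2, contig_c => 3,
                contig_d => 10, contig_e => 9);

for my $bin (@{ bin_values(\%coverage) }) {
    printf "from %.2f: %d contig(s)\n", @$bin;
}
```

For the cumulative form asked about ("count of contigs with coverage > N"), the per-bin counts can be summed from the last bin backwards.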

Best Regards,
-Abhi

From bix at sendu.me.uk Sun Aug 16 05:21:11 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 16 Aug 2009 10:21:11 +0100
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To:
References:
Message-ID: <4A87CF87.7030803@sendu.me.uk>

Abhishek Pratap wrote:
> I am using Chart::Gnuplot for this but I am not able to figure out how
> to bin the data points to fit nicely on a screen. Is there any
> smart/quick method to do this.

http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width

Like it says, it depends on the data, but it's worth trying them out to
see if one of them gives you anything sensible.

From sdavis2 at mail.nih.gov Sun Aug 16 07:48:23 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 16 Aug 2009 07:48:23 -0400
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To:
References:
Message-ID: <264855a00908160448i2691fc08t472fc0d83afbb356@mail.gmail.com>

On Sun, Aug 16, 2009 at 4:06 AM, Abhishek Pratap wrote:

> Hi All
>
> After a lot of look up on forums I could google, I am finally posting
> my question here. I think it may not be appropriate for this mailing
> list. I apologize for this first up. The question is regarding dynamic
> binning of data points for histogram plots.
>
> So I have many hashes, each having a "numerical" coverage data
> obtained from Next generation sequencing data analysis. Now each hash
> may have couple of hundred to thousands entry "contig_name =>
> coverage". What I want to do is to plot a histogram for each
> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N
> has to be binned according to the data size).
>
> I am using Chart::Gnuplot for this but I am not able to figure out how
> to bin the data points to fit nicely on a screen. Is there any
> smart/quick method to do this.
>
> Any pointers will help a great deal.
>

Hi, Abhi. You could use R, but you got that already. ; ) However, you
might look here for a perl solution.

http://search.cpan.org/~whizdog/GDGraph-histogram-1.1/lib/GD/Graph/histogram.pm

Sean

From cjfields at illinois.edu Sun Aug 16 08:53:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 07:53:29 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <217259.7083.qm@web30408.mail.mud.yahoo.com>
References: <217259.7083.qm@web30408.mail.mud.yahoo.com>
Message-ID: <05D89C95-261C-47B5-A4C6-794D36DD5FB8@illinois.edu>

That worked! Thanks Yee Man!

chris

ps - let me know how you want to deal with a release.

On Aug 16, 2009, at 4:36 AM, Yee Man Chan wrote:

> Hi Chris
>
> Thanks for your suggestions. I think it is indeed better to check
> sum to 1.0 using sprintf. I fixed this in the newly committed HMM.pm
>
> I also fixed codes that will lead to warnings with use warnings.
>
> So now the only problem left is that "monotonic increasing" error.
> For that part of the code, I was trying to perform an expectation
> maximization step. Theoretically, the expectation should
> monotonically increase in every step. But I suppose this is not
> necessarily true when double precision floating point numbers are
> involved. I don't know why I used a 1e-100 tolerance for this.
> Therefore I "fixed" it by using the same tolerance to terminate the
> maximization step (ie .000001). I suppose this "fix" will make it
> much more unlikely to throw exception with your 5.10.0 perl.
>
> Can you give that a try again and see if it works now.
>
> Thank you
> Yee Man
>
> --- On Sat, 8/15/09, Chris Fields wrote:
>
>> From: Chris Fields
>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>> To: "Yee Man Chan"
>> Cc: "Robert Buels", "BioPerl List"
>> Date: Saturday, August 15, 2009, 10:38 PM
>> Yee,
>>
>> I took the liberty of making a few simple changes to
>> Bio::Tools::HMM in svn to point out the problem and possible
>> solutions. Feel free to revert these as needed.
>> >> I'm seeing two errors, which appear randomly when running >> 'make test'. The first is easily fixable, the second, >> I'm not so sure. I'll let you make the decisions on >> both. >> >> 1) There is an assumption in the module that, when >> adding floating points, you will always get 1.0. You >> may run into problems: see 'perldoc -q long decimals'. >> Lines like this (two places in the module): >> ... >> if ($sum != 1.0) { >> $self->throw("Sum of >> probabilities for each state must be 1.0; got $sum\n"); >> } >> ... >> >> won't work as expected (note I added a simple diagnostic, >> just print out the 'bad' sum). With perl 5.8.8, this >> appears to work fine, but this is what I get with perl 5.10 >> (64-bit): >> >> pyrimidine1:HMM cjfields$ make test >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" >> "-Iblib/arch" test.pl >> Baum-Welch Training >> =================== >> Initial Probability Array: >> 0.499978 0.500022 >> Transition Probability Matrix: >> 0.499978 0.500022 >> 0.499978 0.500022 >> Emission Probability Matrix: >> 0.133333 0.143333 >> 0.163333 0.123333 >> 0.143333 0.293333 >> 0.133333 0.143333 >> 0.163333 0.123333 >> 0.143333 0.293333 >> >> Log Probability of sequence 1: -521.808 >> Log Probability of sequence 2: -426.057 >> >> Statistical Training >> ==================== >> Initial Probability Array: >> 1 0 >> Transition Probability Matrix: >> >> ------------- EXCEPTION ------------- >> MSG: Sum of probabilities for each from-state must be 1.0; >> got 0.999999999999999976 >> >> STACK Bio::Tools::HMM::transition_prob >> /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499 >> STACK toplevel test.pl:82 >> ------------------------------------- >> >> make: *** [test_dynamic] Error 255 >> >> I'm assuming this needs to simply be rounded up to >> 1.0. That could be accomplished with something like >> 'if (sprintf("%.2f", $sum) != 1.0) {...}' >> >> 2) The second error is a little stranger. 
I have been >> randomly getting this: >> >> pyrimidine1:HMM cjfields$ make test >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" >> "-Iblib/arch" test.pl >> Baum-Welch Training >> =================== >> S should be monotonic increasing! >> make: *** [test_dynamic] Error 255 >> >> When I add strict and warnings pragmas to Bio::Tools::HMM >> (with a little additional cleanup to get things running), I >> get an additional warning (arrow): >> >> pyrimidine1:HMM cjfields$ make test >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" >> "-Iblib/arch" test.pl >> Argument "FL" isn't numeric in numeric lt (<) at >> /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line >> 188. <---- >> Baum-Welch Training >> =================== >> S should be monotonic increasing! >> make: *** [test_dynamic] Error 255 >> >> So something is not being converted as expected. >> >> chris >> >> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote: >> >>> When are you going to release 1.6? Maybe let me work >> on it before it releases. If it doesn't resolve the problem, >> then we can think about other alternatives. >>> >>> Also, please show me the latest errors you have for >> 5.10.0. >>> >>> Thanks >>> Yee Man >>> >>> --- On Sat, 8/15/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Yee Man Chan" >>>> Cc: "Robert Buels" , >> "BioPerl List" >>>> Date: Saturday, August 15, 2009, 7:05 PM >>>> I'm still seeing the same errors on >>>> Mac OS X for 64-bit perl 5.10.0. Mac OS X, >> native perl >>>> (v5.8.8) passes fine now (as well as perl 5.8.8 >> on >>>> dev.open-bio.org). >>>> >>>> I'm wondering if this is a problem with my local >> perl >>>> build. 
I'm very tempted to push the >> HMM-related code >>>> into a separate distribution (bioperl-hmm) and >> make a CPAN >>>> release out of it so it gets wider testing via >> CPAN testers; >>>> it would just require a minimum bioperl 1.6 >> installation for >>>> Bio::Tools::HMM and any related modules. >> Yee, would >>>> that be okay with you? >>>> >>>> chris >>>> >>>> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: >>>> >>>>> >>>>> I just committed HMM.xs and typemap to SVN. >> Can you >>>> test it to confirm it works in 64-bit machines? >>>>> >>>>> Thanks >>>>> Yee Man >>>>> >>>>> --- On Sat, 8/15/09, Chris Fields >>>> wrote: >>>>> >>>>>> From: Chris Fields >>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>> package on WinVista? >>>>>> To: "Robert Buels" >>>>>> Cc: "Yee Man Chan" , >>>> "BioPerl List" >>>>>> Date: Saturday, August 15, 2009, 12:11 PM >>>>>> I'm not sure, but it makes more sense >>>>>> to commit these changes directly. >> Yee, need >>>> us to set >>>>>> you up with a commit bit? If so, >> fill out >>>> the >>>>>> information on this page: >>>>>> >>>>>> http://www.bioperl.org/wiki/SVN_Account_Request >>>>>> >>>>>> and forward it to support at open-bio.org. >>>>>> I'll sponsor you. >>>>>> >>>>>> chris >>>>>> >>>>>> On Aug 15, 2009, at 11:44 AM, Robert Buels >> wrote: >>>>>> >>>>>>> The usual procedure for developing >> code is to >>>> exchange >>>>>> code via commits to a version control >>>> system. Yee, do >>>>>> you know how to use Subversion? Does Yee >> need a >>>> commit bit? >>>>>>> >>>>>>> Rob >>>>>>> >>>>>>> Yee Man Chan wrote: >>>>>>>> Hi Chris >>>>>>>> I find >> that there is a >>>> memory >>>>>> access bug in my code. Attached is the >> fixed >>>> HMM.xs. This >>>>>> file together with the simpler typemap >> should fix >>>> all >>>>>> problems. (I hope..) >>>>>>>> Please let >> me know if it >>>> works >>>>>> for you. >>>>>>>> Sorry for the bug... 
>>>>>>>> Yee Man >>>>>>>> --- On Fri, 8/14/09, Chris Fields >> >>>>>> wrote: >>>>>>>>> From: Chris Fields >>>>>>>>> Subject: Re: [Bioperl-l] >> Problems >>>> with >>>>>> Bioperl-ext package on WinVista? >>>>>>>>> To: "Yee Man Chan" >>>>>>>>> Cc: "Robert Buels" , >>>>>> "Jonny Dalzell" , >>>>>> "BioPerl List" >>>>>>>>> Date: Friday, August 14, 2009, >> 8:31 >>>> AM >>>>>>>>> Yee Man, >>>>>>>>> >>>>>>>>> I tested this out locally >> (perl 5.8.8 >>>> 32-bit, >>>>>> perl 5.10.0 >>>>>>>>> 64-bit) and on >> dev.open-bio.org (which >>>> is perl >>>>>> 5.8.8, >>>>>>>>> appears to be 32-bit). >> The patch >>>> results >>>>>> in cleaning >>>>>>>>> up warnings for 5.10.0 but >> results in >>>> similar >>>>>> warnings for >>>>>>>>> 5.8.8 (linux or OS X). >>>>>>>>> >>>>>>>>> On OS X perl 5.8.8, this >> sometimes >>>> passes >>>>>> (note the first >>>>>>>>> attempt fails, the second >> succeeds), >>>> so it's >>>>>> not entirely a >>>>>>>>> 32-bit issue: >>>>>>>>> >>>>>>>>> http://gist.github.com/167860 >>>>>>>>> >>>>>>>>> OS X and perl 5.10.0, this >> always >>>> fails as the >>>>>> previous >>>>>>>>> gist shows, but demonstrates >> similar >>>> behavior >>>>>> (multiple >>>>>>>>> attempts to test get >> different >>>> responses): >>>>>>>>> >>>>>>>>> http://gist.github.com/167542 >>>>>>>>> >>>>>>>>> On linux, everything passes >> with or >>>> w/o the >>>>>> patched files >>>>>>>>> (patched files have warnings >> as >>>> indicated >>>>>> above): >>>>>>>>> >>>>>>>>> Specs for all three perl >> executables >>>> (they >>>>>> vary a bit): >>>>>>>>> >>>>>>>>> http://gist.github.com/167883 >>>>>>>>> >>>>>>>>> chris >>>>>>>>> >>>>>>>>> On Aug 14, 2009, at 3:27 AM, >> Yee Man >>>> Chan >>>>>> wrote: >>>>>>>>> >>>>>>>>>> Ah.. 
I find that the >> typemap can >>>> become as >>>>>> simple as >>>>>>>>> this >>>>>>>>>> ===================== >>>>>>>>>> TYPEMAP >>>>>>>>>> HMM * >> T_PTROBJ >>>>>>>>>> ===================== >>>>>>>>>> >>>>>>>>>> Then the generated HMM.c >> will have >>>> a >>>>>> function called >>>>>>>>> INT2PTR to do the pointer >> conversion. >>>> I >>>>>> believe this should >>>>>>>>> solve the warnings. >>>>>>>>>> Attached are the updated >> HMM.xs >>>> and >>>>>> typemap. Can >>>>>>>>> someone with a 64-bit machine >> give it >>>> a try? >>>>>>>>>> Thank you >>>>>>>>>> Yee Man >>>>>>>>>> --- On Thu, 8/13/09, Chris >> Fields >>>> >>>>>>>>> wrote: >>>>>>>>>>> From: Chris Fields >> >>>>>>>>>>> Subject: Re: >> [Bioperl-l] >>>> Problems with >>>>>> Bioperl-ext >>>>>>>>> package on WinVista? >>>>>>>>>>> To: "Yee Man Chan" >> >>>>>>>>>>> Cc: "Robert Buels" >> , >>>>>>>>> "Jonny Dalzell" , >>>>>>>>> "BioPerl List" >>>>>>>>>>> Date: Thursday, August >> 13, >>>> 2009, 5:31 >>>>>> PM >>>>>>>>>>> (just to point out to >>>> everyone, Yee >>>>>>>>>>> Man's contact >> information was >>>> in the >>>>>> POD) >>>>>>>>>>> >>>>>>>>>>> Yee Man, >>>>>>>>>>> >>>>>>>>>>> I have the output in >> the below >>>> link: >>>>>>>>>>> >>>>>>>>>>> http://gist.github.com/167542 >>>>>>>>>>> >>>>>>>>>>> There are similar >> problems >>>> popping up >>>>>> on 32- and >>>>>>>>> 64-bit >>>>>>>>>>> perl 5.10.0, Mac OS X >> 10.5. >>>>>> Haven't had time >>>>>>>>> to debug >>>>>>>>>>> it unfortunately. >>>>>>>>>>> >>>>>>>>>>> I think we should >> seriously >>>> consider >>>>>> spinning this >>>>>>>>> code off >>>>>>>>>>> into it's own >> distribution >>>> for >>>>>> CPAN. It's >>>>>>>>>>> unfortunately >> bit-rotting away >>>> in >>>>>>>>> bioperl-ext. If you >>>>>>>>>>> want to continue >> supporting it >>>> I can >>>>>> help set that >>>>>>>>> up. 
>>>>>>>>>>> chris >>>>>>>>>>> >>>>>>>>>>> On Aug 13, 2009, at >> 6:58 PM, >>>> Yee Man >>>>>> Chan wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi >>>>>>>>>>>> >>>>>>>>>>>> >> So is >>>> this >>>>>> an HMM only >>>>>>>>> problem? Or does >>>>>>>>>>> it apply to other >> bioperl-ext >>>>>> modules? >>>>>>>>>>>> >> What >>>>>> exactly are the >>>>>>>>> compilation errors >>>>>>>>>>> for HMM? I believe my >>>> implementation >>>>>> is just a >>>>>>>>> simple one >>>>>>>>>>> based on Rabiner's >> paper. >>>>>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>>>>>>>> ~murphyk%2FBayes >>>>>>>>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>>>>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>>>>>>>> >>>>>>>>>>>> >> I >>>> don't >>>>>> think I did >>>>>>>>> anything fancy that >>>>>>>>>>> makes it machine >> dependent or >>>> non-ANSI >>>>>> C. >>>>>>>>>>>> Yee Man >>>>>>>>>>>> >>>>>>>>>>>> --- On Thu, >> 8/13/09, Chris >>>> Fields >>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>>> From: Chris >> Fields >>>> >>>>>>>>>>>>> Subject: Re: >>>> [Bioperl-l] >>>>>> Problems with >>>>>>>>> Bioperl-ext >>>>>>>>>>> package on WinVista? >>>>>>>>>>>>> To: "Robert >> Buels" >>>> >>>>>>>>>>>>> Cc: "Jonny >> Dalzell" >>>> , >>>>>>>>>>> "BioPerl List" , >>>>>>>>>>> "Yee Man Chan" >>>>>>>>>>>>> Date: >> Thursday, August >>>> 13, >>>>>> 2009, 3:18 PM >>>>>>>>>>>>> >>>>>>>>>>>>> On Aug 13, >> 2009, at >>>> 4:37 PM, >>>>>> Robert Buels >>>>>>>>> wrote: >>>>>>>>>>>>>> Jonny >> Dalzell >>>> wrote: >>>>>>>>>>>>>>> Is it >>>> ridiculous of me >>>>>> to expect >>>>>>>>> ubuntu to >>>>>>>>>>> take >>>>>>>>>>>>> care of this >> for >>>> me? How >>>>>> do >>>>>>>>>>>>>>> I go >> about >>>> compiling >>>>>> the HMM? >>>>>>>>>>>>>> Yes. 
>> This is >>>> a very >>>>>> specialized >>>>>>>>> thing >>>>>>>>>>> that >>>>>>>>>>>>> you're doing, >> and >>>> Ubuntu does >>>>>> not have >>>>>>>>> the >>>>>>>>>>> resources to >>>>>>>>>>>>> package every >> single >>>> thing. >>>>>>>>>>>>>> >> Unfortunately, it >>>> looks >>>>>> like >>>>>>>>> bioperl-ext >>>>>>>>>>> package is >>>>>>>>>>>>> not >> installable under >>>> Ubuntu >>>>>> 9.04 anyway, >>>>>>>>> which is >>>>>>>>>>> what I'm >>>>>>>>>>>>> running. >> For >>>> others on >>>>>> this list, >>>>>>>>> if >>>>>>>>>>> somebody is >>>>>>>>>>>>> interested in >> doing >>>>>> maintaining it, I'd be >>>>>>>>> happy >>>>>>>>>>> to help out >>>>>>>>>>>>> by testing on >>>> Debian-based >>>>>> Linux >>>>>>>>> platforms. >>>>>>>>>>> We need to >>>>>>>>>>>>> clarify this >>>> package's >>>>>> maintenance status: >>>>>>>>> if >>>>>>>>>>> there is >>>>>>>>>>>>> nobody >> interested in >>>>>> maintaining it, I >>>>>>>>> would >>>>>>>>>>> recommend that >>>>>>>>>>>>> bioperl-ext be >> removed >>>> from >>>>>> distribution. >>>>>>>>>>> It's not in >>>>>>>>>>>>> anybody's >> interest to >>>> have >>>>>> unmaintained >>>>>>>>> software >>>>>>>>>>> out there >>>>>>>>>>>>> causing >> confusion. >>>>>>>>>>>>> >>>>>>>>>>>>> I have cc'd >> Yee Man >>>> Chan for >>>>>> this. >>>>>>>>> If there >>>>>>>>>>> isn't a >>>>>>>>>>>>> response or >> the >>>> message >>>>>> bounces, we do one >>>>>>>>> of two >>>>>>>>>>> things: >>>>>>>>>>>>> 1) consider >> it >>>> deprecated >>>>>> (probably >>>>>>>>> safest). >>>>>>>>>>>>> 2) spin it out >> into a >>>> separate >>>>>> module. >>>>>>>>>>>>> >>>>>>>>>>>>> Just tried to >> comile >>>> it myself >>>>>> and am >>>>>>>>> getting >>>>>>>>>>> errors (using >>>>>>>>>>>>> 64bit perl >> 5.10), so I >>>> think, >>>>>> unless >>>>>>>>> someone wants >>>>>>>>>>> to take >>>>>>>>>>>>> this on, >> option #1 is >>>> best. >>>>>>>>>>>>> >>>>>>>>>>>>>> So Jonny, >> in >>>> short, I >>>>>> would say "do >>>>>>>>> not use >>>>>>>>>>>>> bioperl-ext". 
>>>>>>>>>>>>> >>>>>>>>>>>>> In general, >> that's a >>>> safe >>>>>> bet. We're >>>>>>>>> moving >>>>>>>>>>> most of >>>>>>>>>>>>> our C/C++ >> bindings to >>>> BioLib. >>>>>>>>>>>>> >>>>>>>>>>>>>> Step >> back. >>>> What are >>>>>> you trying >>>>>>>>> to >>>>>>>>>>>>> accomplish? >>>> Chris >>>>>> already >>>>>>>>> recommended some >>>>>>>>>>> alternative >>>>>>>>>>>>> methods in his >> email >>>> of 8/11 >>>>>> on this >>>>>>>>>>> subject. >> Perhaps >>>>>>>>>>>>> we can guide >> you to >>>> some >>>>>> software that is >>>>>>>>>>> actively >>>>>>>>>>>>> maintained and >> will >>>> meet your >>>>>> needs. >>>>>>>>>>>>>> Rob >>>>>>>>>>>>> Exactly. >> Lots of >>>> other >>>>>> (better >>>>>>>>> supported!) >>>>>>>>>>> options >>>>>>>>>>>>> out there. >>>> HMMER, SeqAn, >>>>>> and >>>>>>>>> others. >>>>>>>>>>>>> chris >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> __________________________________________________ >>>>>>>>>> Do You Yahoo!? >>>>>>>>>> Tired of spam? >> Yahoo! Mail >>>> has the >>>>>> best spam >>>>>>>>> protection around >>>>>>>>>> http://mail.yahoo.com >>>>>>>>> >>>>>> >>>> >> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> --Robert Buels >>>>>>> Bioinformatics Analyst, Sol Genomics >> Network >>>>>>> Boyce Thompson Institute for Plant >> Research >>>>>>> Tower Rd >>>>>>> Ithaca, NY 14853 >>>>>>> Tel: 503-889-8539 >>>>>>> rmb32 at cornell.edu >>>>>>> http://www.sgn.cornell.edu >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >> >> > > > From hlapp at gmx.net Sun Aug 16 11:07:39 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 16 Aug 2009 11:07:39 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? 
In-Reply-To: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu> References: <846546.73578.qm@web30404.mail.mud.yahoo.com> <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu> Message-ID: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net> On Aug 16, 2009, at 1:38 AM, Chris Fields wrote: > I'm assuming this needs to simply be rounded up to 1.0. That could > be accomplished with something like 'if (sprintf("%.2f", $sum) != > 1.0) {...}' Couldn't you just test for the absolute difference being smaller than some reasonable epsilon? That might be more efficient (and more explicit) than printing to a string. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 16 11:13:54 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 16 Aug 2009 11:13:54 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> Message-ID: On Aug 15, 2009, at 10:49 PM, Chris Fields wrote: > Not sure about calling it bioperl-phylo (which might be confused > with Rutger's Bio::Phylo). Frankly, it seems to me that either is more powerful in combination with the other, so I don't quite see how the name suggesting some linkage isn't a Good Thing rather than bad. 
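A note on the floating-point check discussed in the Bioperl-ext thread above (rounding with sprintf versus testing against an epsilon): the following is a minimal sketch of both tests side by side. The helper name sums_to_one and the default tolerance of 1e-6 are assumptions made for illustration; neither comes from the HMM code itself.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $sum is within $eps of 1.0.
# The default tolerance (1e-6) is an assumption, not a value from the thread.
sub sums_to_one {
    my ($sum, $eps) = @_;
    $eps = 1e-6 unless defined $eps;
    return abs($sum - 1.0) < $eps;
}

# Accumulate 0.1 ten times; rounding error can leave the total
# slightly different from 1.0.
my $sum = 0;
$sum += 0.1 for 1 .. 10;

print "exact equality: ", ($sum == 1.0 ? "yes" : "no"), "\n";
print "epsilon test:   ", (sums_to_one($sum) ? "yes" : "no"), "\n";
print "sprintf test:   ", (sprintf("%.2f", $sum) == 1.0 ? "yes" : "no"), "\n";
```

One difference worth keeping in mind: the "%.2f" form accepts anything that rounds to 1.00, which is a fairly loose tolerance, whereas the epsilon form states the tolerance explicitly.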
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Sun Aug 16 11:42:50 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 10:42:50 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net> References: <846546.73578.qm@web30404.mail.mud.yahoo.com> <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu> <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net> Message-ID: On Aug 16, 2009, at 10:07 AM, Hilmar Lapp wrote: > > On Aug 16, 2009, at 1:38 AM, Chris Fields wrote: > >> I'm assuming this needs to simply be rounded up to 1.0. That could >> be accomplished with something like 'if (sprintf("%.2f", $sum) != >> 1.0) {...}' > > > Couldn't you just test for the absolute difference being smaller > than some reasonable epsilon? That might be more efficient (and more > explicit) than printing to a string. > > -hilmar Yes, either way is fine. Re: floating point and sprintf, acc. to the perlfaq4, as perl doesn't have a round() function the sprintf() idiom is suggested (and commonly used). chris From cjfields at illinois.edu Sun Aug 16 11:48:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 10:48:52 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> Message-ID: On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote: > On Aug 15, 2009, at 10:49 PM, Chris Fields wrote: > >> Not sure about calling it bioperl-phylo (which might be confused >> with Rutger's Bio::Phylo). 
> > > Frankly, it seems to me that either is more powerful in combination > with the other, so I don't quite see how the name suggesting some > linkage isn't a Good Thing rather than bad. > > -hilmar I don't have a problem as long as there is some emphasis they are two separate, but related, projects. There is quite a bit of crossover between the two (particularly with the last few bioperl-related GSoC projects), but I would rather not have to worry about users emailing the list wondering why something in bioperl-phylo doesn't work when they installed Bio::Phylo instead (or vice-versa). Maybe Bio::Phylo could be added as a recommended module with bioperl-phylo to alleviate that? chris From maj at fortinbras.us Sun Aug 16 12:59:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 16 Aug 2009 12:59:40 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> Message-ID: <44D32BE895F446A9917A5550485AB102@NewLife> I see both points- I think Chris's suggestion is good. The nexml support won't work without Bio::Phylo, but not everyone will need that support, so if the install can be chatty about this that would be great- ----- Original Message ----- From: "Chris Fields" To: "Hilmar Lapp" Cc: "BioPerl List" ; "Mark A. Jensen" ; "chase Miller" Sent: Sunday, August 16, 2009 11:48 AM Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring > > On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote: > >> On Aug 15, 2009, at 10:49 PM, Chris Fields wrote: >> >>> Not sure about calling it bioperl-phylo (which might be confused with >>> Rutger's Bio::Phylo). 
>> >> >> Frankly, it seems to me that either is more powerful in combination with the >> other, so I don't quite see how the name suggesting some linkage isn't a >> Good Thing rather than bad. >> >> -hilmar > > I don't have a problem as long as there is some emphasis they are two > separate, but related, projects. There is quite a bit of crossover between > the two (particularly with the last few bioperl-related GSoC projects), but I > would rather not have to worry about users emailing the list wondering why > something in bioperl-phylo doesn't work when they installed Bio::Phylo > instead (or vice-versa). Maybe Bio::Phylo could be added as a recommended > module with bioperl-phylo to alleviate that? > > chris > > From rmb32 at cornell.edu Sun Aug 16 13:16:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sun, 16 Aug 2009 10:16:18 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <44D32BE895F446A9917A5550485AB102@NewLife> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> <44D32BE895F446A9917A5550485AB102@NewLife> Message-ID: <4A883EE2.3060101@cornell.edu> Mark A. Jensen wrote: > I see both points- I think Chris's suggestion is good. The nexml support > won't work without Bio::Phylo, but not everyone will need that support, > so if the install can be chatty about this that would be great- Maybe the parts that have differing dependencies should be in different distros then? 
Rob From jason at bioperl.org Sun Aug 16 13:25:08 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 16 Aug 2009 13:25:08 -0400 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: References: Message-ID: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> For binning of a distribution see the perl module Statistics::Descriptive - http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm function: frequency_distribution I would also look at R's histogram function for the plotting. This would be one of the easiest ways - I would just make a perl script that generates the correct R code that can be used to make the plots. On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote: > Hi All > > After a lot of look up on forums I could google, I am finally posting > my question here. I think it may not be appropriate for this mailing > list. I apologize for this first up. The question is regarding dynamic > binning of data points for histogram plots. > > So I have many hashes, each having a "numerical" coverage data > obtained from Next generation sequencing data analysis. Now each hash > may have couple of hundred to thousands entry "contig_name => > coverage". What I want to do is to plot a histogram for each > hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N > has to be binned according to the data size). > > I am using Chart::Gnuplot for this but I am not able to figure out how > to bin the data points to fit nicely on a screen. Is there any > smart/quick method to do this. > > Any pointers will help a great deal. 
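For readers who want to see what the binning step amounts to, here is a minimal sketch in core Perl, assuming equal-width bins and a caller-supplied bin count. The helper name bin_counts and the coverage values are made up for illustration; Statistics::Descriptive or R's histogram function, as suggested in the reply, would normally do this for you.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: split the range of @$values into $nbins equal-width
# bins and return a list of [lower, upper, count] triples.
sub bin_counts {
    my ($values, $nbins) = @_;
    my ($lo, $hi) = (min(@$values), max(@$values));
    my $width = ($hi - $lo) / $nbins || 1;    # fall back to 1 if all values are equal
    my @counts = (0) x $nbins;
    for my $v (@$values) {
        my $i = int(($v - $lo) / $width);
        $i = $nbins - 1 if $i >= $nbins;      # the maximum value goes in the last bin
        $counts[$i]++;
    }
    return map { [ $lo + $_ * $width, $lo + ($_ + 1) * $width, $counts[$_] ] }
           0 .. $nbins - 1;
}

# Made-up coverage values keyed by contig name, shaped like the hashes in the question.
my %coverage = (c1 => 2, c2 => 4, c3 => 5, c4 => 9, c5 => 10, c6 => 12);
for my $bin (bin_counts([ values %coverage ], 2)) {
    printf "%5.1f - %5.1f : %d\n", @$bin;
}
```

The resulting triples can be passed to whatever plotting layer is in use; the number of bins is left as a parameter because the right choice depends on the size of each dataset.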
> > Best Regards, > -Abhi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From abhishek.vit at gmail.com Sun Aug 16 13:34:54 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Sun, 16 Aug 2009 13:34:54 -0400 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> References: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> Message-ID: Thanks All. I completely forgot and didn't realize that the histogram function in R could auto-bin based on the data. Cheers, -Abhi On Sun, Aug 16, 2009 at 1:25 PM, Jason Stajich wrote: > For binning of a distribution see the perl module Statistics::Descriptive - > http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm function: > frequency_distribution > > I would also look at R histogram function for the plotting. This would be > one of the easiest ways - I would just make a perl script that generates the > correct R code that can be used to make the plots. > > > On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote: > >> Hi All >> >> After a lot of look up on forums I could google, I am finally posting >> my question here. I think it may not be appropriate for this mailing >> list. I apologize for this first up. The question is regarding dynamic >> binning of data points for histogram plots. >> >> So I have many hashes, each having a "numerical" coverage data >> obtained from Next generation sequencing data analysis. Now each hash >> may have couple of hundred to thousands entry "contig_name => >> coverage". What I want to do is to plot a histogram for each >> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N >> has to be binned according to the data size). 
>> >> I am using Chart::Gnuplot for this but I am not able to figure out how >> to bin the data points to fit nicely on a screen. Is there any >> smart/quick method to do this. >> >> Any pointers will help a great deal. >> >> Best Regards, >> -Abhi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > From robert.bradbury at gmail.com Sun Aug 16 15:16:09 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sun, 16 Aug 2009 15:16:09 -0400 Subject: [Bioperl-l] Limit on sequence file size fetches? Message-ID: Hello, I am trying to use get_sequence() to fetch the sequence NS_000198 for the fungus *Podospora anserina* with the databases "GenBank" and when that didn't work "Gene". This is a simple script which fetches the sequence then writes out the fasta and genbank files from the data structure. The errors I got suggested that the system was running out of memory which I thought was unlikely since I've got something like 3GB of main memory and 9GB of swap space. After running strace on the script (which takes a while) I determined that the brk() calls were generating ENOMEM at ~3GB. This turns out to be due to the limit of the Linux memory model I am using (3GB/1GB) on a Pentium IV (Prescott). Now, I think the total genome size for the fungus is ~70MB but haven't verified this so I "should" be able to fetch it unless Bioperl (or perl itself) is doing extremely poor memory management (perhaps not coalescing memory segments into one large sequence) as the reads take place? [1]. Has anyone encountered this problem (fetching say large mammalian chromosomes)? Does anyone know what the limits are for "fetching" sequence files (on 32/64 bit machines?. 
The reason I am using get_sequence and BioPerl is that I can't seem to find the *Podospora anserina* sequence in an FTP database anywhere (so I can't use "wget or ftp"). I haven't tested accessing the GenBank file in a browser (I don't know what browsers would do with an HTML file that large but suspect it would not be pretty). Thanks in advance, Robert Bradbury 1. The strace seems to indicate periodic brk() calls to expand the process data segment size between which there are lots of read() calls of size 4096, presumably reading the socket from NCBI. I don't know if there is an easy way to trace perl's memory allocation/manipulation at a higher level. From jason at bioperl.org Sun Aug 16 15:22:35 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 16 Aug 2009 15:22:35 -0400 Subject: [Bioperl-l] Limit on sequence file size fetches? In-Reply-To: References: Message-ID: <93672502-26EB-4C30-A37E-F3B593E57279@bioperl.org> Robert - Posting your script will help us replicate and diagnose - I am not sure which GenBank fetch option you are using. I have a feeling it is trying to do recursive calls to stitch together the pseudoscaffold. I presume it works fine, though, if you request each chromosome scaffold like CU607053,CU633438, ... I guess posting it via a bugzilla bug is the best way unless you have a git account and want to post it as a 'gist'. -jason -- Jason Stajich jason at bioperl.org http://fungalgenomes.org/ On Aug 16, 2009, at 3:16 PM, Robert Bradbury wrote: > Hello, > > I am trying to use get_sequence() to fetch the sequence NS_000198 > for the > fungus *Podospora anserina* with the databases "GenBank" and when that > didn't work "Gene". This is a simple script which fetches the > sequence then > writes out the fasta and genbank files from the data structure. > > The errors I got suggested that the system was running out of memory > which I > thought was unlikely since I've got something like 3GB of main > memory and > 9GB of swap space. 
After running strace on the script (which takes > a while) > I determined that the brk() calls were generating ENOMEM at ~3GB. > This > turns out to be due to the limit of the Linux memory model I am using > (3GB/1GB) on a Pentium IV (Prescott). > > Now, I think the total genome size for the fungus is ~70MB but haven't > verified this so I "should" be able to fetch it unless Bioperl (or > perl > itself) is doing extremely poor memory management (perhaps not > coalescing > memory segments into one large sequence) as the reads take place? [1]. > > Has anyone encountered this problem (fetching say large mammalian > chromosomes)? Does anyone know what the limits are for "fetching" > sequence > files (on 32/64 bit machines?. The reason I am using get_sequence and > BioPerl is that I can't seem to find the *Podospora anserina* > sequence in a > FTP database anywhere (so I can't use "wget or ftp"). I haven't > tested > accessing the GenBank file in a browser (I don't know what browsers > would do > with a HTML file that large but suspect it would not be pretty). > > Thanks in advance, > Robert Bradbury > > 1. The strace seems to indicate periodic brk() calls to expand the > process > data segment size between which there are lots of read() calls of > size 4096, > presumably reading the socket from NCBI. I don't know if there is > an easy > way to trace perl's memory allocation/manipulation at a higher level. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Aug 16 15:42:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 14:42:56 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A883EE2.3060101@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> <44D32BE895F446A9917A5550485AB102@NewLife> <4A883EE2.3060101@cornell.edu> Message-ID: <69B8C887-1C5E-47B4-9168-8509BB0A5528@illinois.edu> On Aug 16, 2009, at 12:16 PM, Robert Buels wrote: > Mark A. Jensen wrote: >> I see both points- I think Chris's suggestion is good. The nexml >> support >> won't work without Bio::Phylo, but not everyone will need that >> support, >> so if the install can be chatty about this that would be great- > > Maybe the parts that have differing dependencies should be in > different distros then? > > Rob I'm guessing large chunks of that code would have Bio::Root::Root as a base, so I think maintaining related code split into two distributions would be too problematic. It's simple to indicate that Bio::Phylo is required only for NeXML (so listing it as a 'recommends') and keep everything NeXML-related and requiring Bio::Root::Root in one spot. It's possible something inheriting from Bio::Phylo could go there, but that's up to Rutger. chris From maj at fortinbras.us Mon Aug 17 08:43:33 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 17 Aug 2009 08:43:33 -0400 Subject: [Bioperl-l] new NeXML I/O modules Message-ID: Hi All- I'm pleased to announce that my Google Summer of Code student Chase Miller and I have successfully migrated his modules for NeXML I/O into bioperl-live. NeXML (http://www.nexml.org) is Rutger Vos' highly flexible, highly annotatable standard for evolutionary data exchange that is catching on in the evolutionary DB world. We hope these modules will help move that process along. I also want to say that Chase has been a terrific student and collaborator. He not only learned the complexities of BioPerl IO from scratch, but also grokked Rutger's Bio::Phylo internals, and became familiar with and applied modern OO concepts. He also wrote tests (which pass!), complete POD, and a HOWTO (at http://www.bioperl.org/wiki/HOWTO:Nexml) to accompany this work. Best of all, he finished! (Well, as much as anything is ever finished around here.) I for one hope he will continue to use his commit bit for good and not evil. cheers, Mark From deequan at gmail.com Mon Aug 17 09:06:44 2009 From: deequan at gmail.com (David Quan) Date: Mon, 17 Aug 2009 09:06:44 -0400 Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl? Message-ID: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> Hello there, I've been browsing around bioperl documentation and have used a blast parser, but am wondering if it is possible to use the start and end information for a hit to trace back to a gene in genbank and extract the sequence for that gene? I have not been able to find elements that would work in such a way. Hints and recommendations for elements that would be capable of behaving in such a way would be greatly appreciated. Thanks very much. David N. 
Quan -- Love of country is, at heart, trust in a nation's people, faith in their better nature, esteem for their best hopes, understanding for the magnificence and the distinctiveness and the huge, infinitely shaded cultural palette of their simple humanity. --Bradley Burston From akarger at CGR.Harvard.edu Mon Aug 17 09:04:29 2009 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 17 Aug 2009 09:04:29 -0400 Subject: [Bioperl-l] on BP documentation References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > > From: "Hilmar Lapp" > ... > > As for the FASTA example, I can understand - I've heard > repeatedly > > from people that one of the things that they are missing is > > documentation for every SeqIO format we support (such as > GenBank, > > UniProt, FASTA, etc) about where to find a particular piece of > the > > format in the object model. > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help > create > our list of action items. > MAJ I wish you the best of luck on this ambitious and crucial project. I teach intro Perl classes to biologists and always tell them that Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble. I was trying to remember specific examples of where I've gotten lost, and unfortunately can't give any. But I can tell you that often I've run into trouble because the particular method I'm looking for is three parent classes away from the module I'm actually looking at. The deobfuscator helps some, but only for people who know about that. Do you think you could automate a tool that would add the following to the bottom of each module? 
=head2 Inherited methods =over 4 =item desc See Bio::Seq::Basic =back This would make browsing through the docs on bioperl.org more fun too. -Amir Karger From cjfields at illinois.edu Mon Aug 17 10:06:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:06:15 -0500 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: Congrats Chase! chris On Aug 17, 2009, at 7:43 AM, Mark A. Jensen wrote: > Hi All- > > I'm pleased to announce that my Google Summer of Code student > Chase Miller and I have successfully migrated his modules for > NeXML I/O into bioperl-live. NeXML (http://www.nexml.org) is > Rutger Vos' highly flexible, highly annotable standard for > evolutionary data exchange, that is catching on in the > evolutionary DB world. We hope these modules will help move that > process along. > > I also want to say that Chase has been a terrific student and > collaborator. He learned the not only the complexities of BioPerl > IO from scratch, but also grokked Rutger's Bio::Phylo internals, > and became familiar with and applied modern OO concepts. He also > wrote tests (which pass!), complete POD, and a HOWTO (at > http://www.bioperl.org/wiki/HOWTO:Nexml) to accompany this > work. Best of all, he finished! (Well, as much as anything is > ever finished around here.) I for one hope he will continue to > use his commit bit for good and not evil. > > cheers, > Mark > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 10:22:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:22:26 -0500 Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl? 
In-Reply-To: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> References: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> Message-ID: <74D10663-5770-43DA-ABDB-27FA5D532497@illinois.edu>

That's possible, yes. Use the hit information with Bio::DB::GenBank to pull the sequence out, as in the example below. Note that the strand convention differs from BioPerl's -1/0/1; efetch strand: 1 = normal (default), 2 = comp.

================================
use Bio::DB::GenBank;

# $id, $seqstart, $seqend and $strand come from the parsed hit/HSP
my $factory = Bio::DB::GenBank->new(
    -format    => 'genbank',
    -seq_start => $seqstart,
    -seq_stop  => $seqend,
    -strand    => $strand,    # 1=plus, 2=minus
);

# should be UID, use get_Seq_by_acc() for accessions
my $seq = $factory->get_Seq_by_id($id);
================================

This pulls everything into a Bio::Seq, though, so you'll need to push it out to a SeqIO output stream. You can also use Bio::DB::EUtilities to get the raw sequence via efetch, something like (untested):

================================
use Bio::DB::EUtilities;

my $fetcher = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'gb',
);

# loop: for each hit/HSP, grab sequence...
$fetcher->set_parameters(
    -id        => $id,          # UID or accession
    -seq_start => $seqstart,    # hit start
    -seq_stop  => $seqend,      # hit end
    -strand    => $strand,      # 1=plus, 2=minus
);

# then get raw content
$fetcher->get_Response(-file => ">$id.gb");
================================

You could probably plug into Ensembl similarly if the db versions match; see: http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences

chris

On Aug 17, 2009, at 8:06 AM, David Quan wrote: > Hello there, > > I've been browsing around bioperl documentation and have used > a blast parser, but am wondering if it is possible to use the start > and end information for a hit to trace back to a gene in genbank and > extract the sequence for that gene? I have not been able to find > elements that would work in such a way. 
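The strand difference noted in the reply (BioPerl's -1/0/1 versus efetch's 1/2) is easy to trip over, so here is a minimal sketch of a conversion helper. The name efetch_range is hypothetical, and passing the lower coordinate as -seq_start is an assumption rather than something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn BioPerl-style hit coordinates and strand
# (-1/0/1) into the efetch-style arguments used in the reply
# (strand 1 = plus, 2 = minus).
sub efetch_range {
    my ($start, $end, $strand) = @_;
    # Assumption: always pass the lower coordinate as -seq_start.
    ($start, $end) = ($end, $start) if $start > $end;
    return (
        -seq_start => $start,
        -seq_stop  => $end,
        -strand    => (defined $strand && $strand < 0) ? 2 : 1,
    );
}

# Example with made-up coordinates for a minus-strand hit.
my %args = efetch_range(1500, 300, -1);
print join(", ", map { "$_ => $args{$_}" } sort keys %args), "\n";
```

The returned list could then be merged into the constructor or set_parameters() arguments shown in the reply.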
Hints and recommendations for > elements that would be capable of behaving in such a way would be > greatly appreciated. Thanks very much. > > David N. Quan > > -- > Love of country is, at heart, trust in a nation's people, faith in > their better nature, esteem for their best hopes, understanding for > the magnificence and the distinctiveness and the huge, infinitely > shaded cultural palette of their simple humanity. --Bradley Burston > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 10:47:31 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:47:31 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> Message-ID: On Aug 17, 2009, at 8:04 AM, Amir Karger wrote: >> -----Original Message----- >> From: Mark A. Jensen [mailto:maj at fortinbras.us] >> >> From: "Hilmar Lapp" >> ... >>> As for the FASTA example, I can understand - I've heard >> repeatedly >>> from people that one of the things that they are missing is >>> documentation for every SeqIO format we support (such as >> GenBank, >>> UniProt, FASTA, etc) about where to find a particular piece of >> the >>> format in the object model. >> >> This is the right thread for list lurkers to contribute their betes >> noires >> such as this one. I encourage ALL to post these issues and help >> create >> our list of action items. >> MAJ > > I wish you the best of luck on this ambitious and crucial project. I > teach intro Perl classes to biologists and always tell them that > Bioperl > is amazingly useful, but only if you can figure out how to use it. 
If > what you want to do isn't in the howtos, you can be in big trouble. > > I was trying to remember specific examples of where I've gotten lost, > and unfortunately can't give any. But I can tell you that often I've > run > into trouble because the particular method I'm looking for is three > parent classes away from the module I'm actually looking at. The > deobfuscator helps some, but only for people who know about that. Do > you > think you could automate a tool that would add the following to the > bottom of each module? > > =head2 Inherited methods > > =over 4 > > =item desc > > See Bio::Seq::Basic > > =back > > This would make browsing through the docs on bioperl.org more fun too. > > -Amir Karger

For many modules this is already in place, but yes, this could be improved. One of the problems I suggest we avoid when doing this is interspersing the POD within the code. It has been demonstrated that doing so actually slows down the perl interpreter slightly; it has to slog through lots of POD to find the code at the compilation step. This occurs only on initial compilation, but it is significant enough that the overall recommendation by most perl brethren (and in Perl Best Practices) has been to place any POD after an __END__ marker. This way the compiler doesn't have to look at it at all, but perldoc can still find it. Also, according to PBP, although inline POD would seemingly be easier to take care of, apparently the opposite is true in most cases (though it can come down to styling differences). Interspersed POD is much harder to maintain in a consistent state, tends to be choppier, and can be laid out in odd ways due to being scattered throughout the file. I know this can come down to a difference in style, but the arguments make enough sense to me that in Biome I am pushing to have all docs after the __END__ marker.
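A minimal sketch of the layout described above, assuming a hypothetical module named My::Example (not a BioPerl module; the method names are made up for illustration). The code comes first, and all POD sits after the __END__ marker, so the compiler stops reading before the documentation while perldoc can still find it:

```perl
package My::Example;    # hypothetical module name, for illustration only
use strict;
use warnings;

# Code first: perl stops compiling this file at the __END__ marker below.
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

sub desc {
    my ($self) = @_;
    return $self->{desc};
}

1;

__END__

=head1 NAME

My::Example - hypothetical module showing POD placed after __END__

=head1 METHODS

=head2 new

Constructor; accepts a C<desc> argument.

=head2 desc

Returns the description passed to C<new>.

=cut
```

Saved as My/Example.pm, running perldoc on the file should still render the documentation, since perldoc reads the whole file rather than stopping at __END__.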
Lincoln already practices this within bioperl and Bio::Graphics, and I plan on moving much of my documentation similarly within my code in BioPerl. The additional comments in the PBP chapter "Documentation" are well worth reading if you can get your hands on it.

chris

From rmb32 at cornell.edu Mon Aug 17 11:21:08 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:21:08 -0700 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: <4A897564.2090203@cornell.edu>

Hurrah! GSoC strikes again!

Rob

From rmb32 at cornell.edu Mon Aug 17 11:45:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:45:18 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <474354.59886.qm@web30408.mail.mud.yahoo.com> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> Message-ID: <4A897B0E.7060208@cornell.edu>

Yee Man Chan wrote: > As to the release, my thinking is that I do understand that your desire to maintain a high level of quality in BioPerl code base. So if the HMM doesn't meet that standard, I am ok with it being spinned off.

We're not pushing to spin it off because of code quality, we're pushing to spin it off because we're spinning everything off. The plan is to break BioPerl up into many discrete distributions on CPAN with the dependencies between them well-known and codified. This will make maintenance of BioPerl *much* easier in the long run.

So this means that the plan of action should be
1.) get the code so that it's working on all platforms,
2.) create a CPAN distribution for it and put it on CPAN,
3.) remove it from bioperl-ext

Also, doing a search for bioperl-ext on CPAN brings to light a couple of issues that probably need to be dealt with. To wit:

1.) there is an ancient version of bioperl-ext that probably needs to be removed; it's under ~birney's account. Thoughts on this?

2.)
Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend on bioperl-ext, which suggests that these really need to be split off, each with the Bio::Ext::Modules they depend on. Bio::Tools::HMM could be the first case of this: * make a dir in the repos called Bio-Tools-HMM alongside bioperl-live, having trunk/, and branches/ subdirs * move Bio::Tools::HMM out of bioperl-live into that * move Bio::Ext::HMM stuff out of bioperl-ext into that * repeat with Bio::Tools::dpAlign and pSW, which would probably go together into a Bio-Tools-Align distro, I think Sounds like this is moving along nicely. Rob From rmb32 at cornell.edu Mon Aug 17 11:48:10 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:48:10 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897B0E.7060208@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> Message-ID: <4A897BBA.2070204@cornell.edu> Also, I volunteer to make this branch and module machinery and such if you want. I just don't want to step on any ongoing development you guys are going in the bioperl-ext trunk. If you want me to do it, just say the word, either here or in #bioperl. Rob Robert Buels wrote: > Yee Man Chan wrote: >> As to the release, my thinking is that I do understand that your >> desire to maintain a high level of quality in BioPerl code base. So if >> the HMM doesn't meet that standard, I am ok with it being spinned off. > > We're not pushing to spin it off because of code quality, we're pushing > to spin it off because we're spinning everything off. The plan is to > break BioPerl up into many discrete distributions on CPAN with the > dependencies between them well-known and codified. This will make > maintenance of BioPerl *much* easier in the long run. > > So this means that the plan of action should be > 1.) get the code so that it's working on all platforms, > 2.) 
create a CPAN distribution for it and put it on CPAN, > 3.) remove it from bioperl-ext > > Also, doing a search for bioperl-ext on CPAN brings to light a couple of > issues that probably need to be dealt with. To wit: > > > 1.) there is an ancient version of bioperl-ext that probably needs to be > removed, it's under ~birney's account. Thoughts on this? > > 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend on > bioperl-ext, which suggests that these really need to be split off, each > with the Bio::Ext::Modules they depend on. Bio::Tools::HMM could be the > first case of this: > * make a dir in the repos called Bio-Tools-HMM alongside > bioperl-live, having trunk/, and branches/ subdirs > * move Bio::Tools::HMM out of bioperl-live into that > * move Bio::Ext::HMM stuff out of bioperl-ext into that > * repeat with Bio::Tools::dpAlign and pSW, which would probably > go together into a Bio-Tools-Align distro, I think > > Sounds like this is moving along nicely. > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Mon Aug 17 12:58:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 11:58:24 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897B0E.7060208@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> Message-ID: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> On Aug 17, 2009, at 10:45 AM, Robert Buels wrote: > Yee Man Chan wrote: >> As to the release, my thinking is that I do understand that your >> desire to maintain a high level of quality in BioPerl code base. 
So >> if the HMM doesn't meet that standard, I am ok with it being >> spinned off. > > We're not pushing to spin it off because of code quality, we're > pushing to spin it off because we're spinning everything off. The > plan is to break BioPerl up into many discrete distributions on CPAN > with the dependencies between them well-known and codified. This > will make maintenance of BioPerl *much* easier in the long run. > > So this means that the plan of action should be > 1.) get the code so that it's working on all platforms, > 2.) create a CPAN distribution for it and put it on CPAN, > 3.) remove it from bioperl-ext > > Also, doing a search for bioperl-ext on CPAN brings to light a > couple of issues that probably need to be dealt with. To wit: > > > 1.) there is an ancient version of bioperl-ext that probably needs > to be removed, it's under ~birney's account. Thoughts on this? This subject just recently popped up on perl.module.authors, more in relation to abandonware, but a similar thing. Andreas has indicate there is an abandoned flag that can be set so it's worth looking into, but using it requires another release. I have been in contact with that group on ideas for the split; libwin32 did the same thing, so I'll contact Jan Dubois on the matter for some pointers. > 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend > on bioperl-ext, which suggests that these really need to be split > off, each with the Bio::Ext::Modules they depend on. > Bio::Tools::HMM could be the first case of this: > * make a dir in the repos called Bio-Tools-HMM alongside bioperl- > live, having trunk/, and branches/ subdirs > * move Bio::Tools::HMM out of bioperl-live into that > * move Bio::Ext::HMM stuff out of bioperl-ext into that > * repeat with Bio::Tools::dpAlign and pSW, which would probably > go together into a Bio-Tools-Align distro, I think > > Sounds like this is moving along nicely. > > Rob Yes, that's essentially the idea. 
The more significant impact of this (both here and in core) is allowing updates to be made as needed, and not be blocked due to issues in unrelated modules. We have been waiting years for fixes to pSW, Staden::read, Align w/o progress, which has hindered overall releases of bioperl-ext. Similar problems exist in bp-core. Re: bioperl-ext, BioLib has rendered some of those implementations obsolete. I would rather do that incrementally (individual implementations) vs. wait for a full-blown bioperl-ext release, so splitting these up makes that possible. chris From robert.bradbury at gmail.com Mon Aug 17 13:14:57 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 17 Aug 2009 13:14:57 -0400 Subject: [Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers Message-ID: One of the questions facing people working in bioinformatics is "How do we present information so that it can be effectively interpreted by non-informatics specialists?" Now, my expertise lies in computer science (esp. O.S. & databases) and as a second vocation the biology of aging (DNA damage & repair, to a lesser extent cancer and pathologies of aging, etc.). Now by my estimate there are perhaps 5 people in the world who are able to effectively discuss computer science X aging (gerontology) [3]. There are perhaps several dozen people where those areas, esp aging, may overlap with DNA damage & repair. But then there is a wider audience of perhaps a few hundred members of AGE, and maybe a thousand or so who are members of the scientific subgroup of GSA. But most of those individuals are "old school" scientists who know relatively little about bioinformatics. So one has barriers to presenting bioinformatics information in ways that they can use usefully. 
I have found in my limited experience that homology graphs of conserved protein domains, such as those displayed in HomoloGene or those in Ensembl (including phylogeny graphs), can be quite useful in reaching interesting conclusions. For example, double strand break repair processes, which may involve 8-10 relatively conserved proteins, may have a critical role in the mechanisms of aging. In particular, two of those proteins, WRN & DCLRE1C (Artemis), contain complementary exonuclease activities which chew up the DNA in order to prepare the strands for ligation. Of course, programmers may appreciate better than gerontologists the significance of deleting random bytes from instruction sequences in one's code.

At the recent AGE meeting in June several discussions arose as to possible differences in "aging" in yeast, *C. elegans* and mammals [1]. A quick database search showed that *C. elegans* seems to be lacking the exonuclease domain on the WRN homologue and may be missing a DCLRE1C homologue entirely (which, if true, would lead to conclusions that aging in *C. elegans* may be fundamentally different from aging in vertebrates).

Explaining this to researchers can best be done using pictures. I've been through PubMed and have several papers (NAR / BMC Bioinformatics) regarding programs to do homology comparisons and phylogeny trees. However, these seem to lean towards producing less condensed bioinformatics-ish information. I do not know, however, whether the outputs from databases like PubMed HomoloGene or Ensembl have been packaged in tools that might be part of BioPerl. I am interested in programs that can be run on a regular basis to draw "pretty pictures" that can be used for publication and/or internet browsing. In particular I'm interested in running such programs on species of interest to various gerontological communities [2], which involves subsets of databases which seem to be scattered around the world.

Thanks.

1.
Of course there has been lots of discussion and rationalization over the last 15+ years about how "aging" is largely the same in more complex and simpler organisms -- in part to justify sequencing some organisms and in part to justify funding research at certain laboratories. A closer examination based on some of the complete and emerging genome sequences may suggest this is a very swampy discussion. 2. For example, nematode DNA repair gene comparisons would be interesting to nematode researchers, insect DNA repair gene comparisons to insect researchers, both to invertebrate researchers, etc. 3. The recently published textbooks *Aging of the Genome* by Jan Vijg and the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg *et al*, go a long way towards moving these areas from the stacks of research libraries into areas for more general discussion. Both volumes deal extensively with the ~150 DNA repair genes. From cjfields at illinois.edu Mon Aug 17 13:15:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 12:15:46 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897BBA.2070204@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <4A897BBA.2070204@cornell.edu> Message-ID: I say go for it if Yee Man is okay with the idea. It gets the code out there that much faster. This also doesn't depend on core being split up (only need a 'requires' bioperl 1.6.0). chris On Aug 17, 2009, at 10:48 AM, Robert Buels wrote: > Also, I volunteer to make this branch and module machinery and such > if you want. I just don't want to step on any ongoing development > you guys are going in the bioperl-ext trunk. > > If you want me to do it, just say the word, either here or in > #bioperl. 
> > Rob > > Robert Buels wrote: >> Yee Man Chan wrote: >>> As to the release, my thinking is that I do understand that >>> your desire to maintain a high level of quality in BioPerl code >>> base. So if the HMM doesn't meet that standard, I am ok with it >>> being spinned off. >> We're not pushing to spin it off because of code quality, we're >> pushing to spin it off because we're spinning everything off. The >> plan is to break BioPerl up into many discrete distributions on >> CPAN with the dependencies between them well-known and codified. >> This will make maintenance of BioPerl *much* easier in the long run. >> So this means that the plan of action should be >> 1.) get the code so that it's working on all platforms, >> 2.) create a CPAN distribution for it and put it on CPAN, >> 3.) remove it from bioperl-ext >> Also, doing a search for bioperl-ext on CPAN brings to light a >> couple of issues that probably need to be dealt with. To wit: >> 1.) there is an ancient version of bioperl-ext that probably needs >> to be removed, it's under ~birney's account. Thoughts on this? >> 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend >> on bioperl-ext, which suggests that these really need to be split >> off, each with the Bio::Ext::Modules they depend on. >> Bio::Tools::HMM could be the first case of this: >> * make a dir in the repos called Bio-Tools-HMM alongside bioperl- >> live, having trunk/, and branches/ subdirs >> * move Bio::Tools::HMM out of bioperl-live into that >> * move Bio::Ext::HMM stuff out of bioperl-ext into that >> * repeat with Bio::Tools::dpAlign and pSW, which would probably >> go together into a Bio-Tools-Align distro, I think >> Sounds like this is moving along nicely. 
>> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From chmille4 at gmail.com Mon Aug 17 14:44:09 2009 From: chmille4 at gmail.com (Chase Miller) Date: Mon, 17 Aug 2009 14:44:09 -0400 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A897564.2090203@cornell.edu> References: <4A897564.2090203@cornell.edu> Message-ID: <991fb8210908171144t3f7107f0ldaf02dfdc762ae27@mail.gmail.com> Thanks! It was a great experience. I couldn't have done it without Mark who was a fantastic mentor. cheers, Chase On Mon, Aug 17, 2009 at 11:21 AM, Robert Buels wrote: > Hurrah! GSoC strikes again! > > Rob > From rmb32 at cornell.edu Mon Aug 17 16:32:14 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 13:32:14 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> Message-ID: <4A89BE4E.7090901@cornell.edu> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro at Bio-Tools-HMM in the repo. The tests are not passing, I think that some bugs need to be fixed in the logic of things. Yee Man, could you have a look? 
To download the newly repackaged code: svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM perl Build.PL; ./Build test Please check that things are compiling OK, check the test logic, upgrade the tests to use Test::More, and get the tests to the point where they are passing. At that point, it should be ready for CPAN, but we need to decide how we want to coordinate that with releases of bioperl-live and bioperl-ext. Rob From rmb32 at cornell.edu Mon Aug 17 16:45:42 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 13:45:42 -0700 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: <4A89C176.3050109@cornell.edu> Mark A. Jensen wrote: > wrote tests (which pass!), complete POD, and a HOWTO (at The tests for this are depending on Bio::Phylo and fail if it's not installed. Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a "recommended" module, or what? Gotta clarify our dependencies. Rob From cjfields at illinois.edu Mon Aug 17 16:54:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 15:54:05 -0500 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A89C176.3050109@cornell.edu> References: <4A89C176.3050109@cornell.edu> Message-ID: On Aug 17, 2009, at 3:45 PM, Robert Buels wrote: > Mark A. Jensen wrote: >> wrote tests (which pass!), complete POD, and a HOWTO (at > > The tests for this are depending on Bio::Phylo and fail if it's not > installed. Are we going to add Bio::Phylo as a bioperl dependency, > or band-aid it as a "recommended" module, or what? > > Gotta clarify our dependencies. > > Rob 'recommends', should skip all tests as a 'pass' with message that 'Bio::Phylo is required' or somesuch. chris From maj at fortinbras.us Mon Aug 17 16:55:19 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 17 Aug 2009 16:55:19 -0400 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A89C176.3050109@cornell.edu> References: <4A89C176.3050109@cornell.edu> Message-ID: <3D65CA5234EB4BDF892F280D575FB01D@NewLife> I meant to add a skip tests on a runtime check for bio::phylo. Gotta do that. It's necessary only for these modules. ----- Original Message ----- From: "Robert Buels" To: "Mark A. Jensen" Cc: "BioPerl List" ; "Rutger Vos" ; "Chase Miller" Sent: Monday, August 17, 2009 4:45 PM Subject: Re: [Bioperl-l] new NeXML I/O modules > Mark A. Jensen wrote: >> wrote tests (which pass!), complete POD, and a HOWTO (at > > The tests for this are depending on Bio::Phylo and fail if it's not installed. > Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a > "recommended" module, or what? > > Gotta clarify our dependencies. > > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Aug 17 17:22:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 16:22:00 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A89BE4E.7090901@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> <4A89BE4E.7090901@cornell.edu> Message-ID: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> Still seeing that odd warning popping up: cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at / Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185. Have you tried using Yee Man's original Makefile.PL to see if it works better? There appear to be some differences in the compilation, including a linking warning popping up. 
chris On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro > at Bio-Tools-HMM in the repo. The tests are not passing, I think > that some bugs need to be fixed in the logic of things. > > Yee Man, could you have a look? To download the newly repackaged > code: > > svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ > bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM > > perl Build.PL; ./Build test > > Please check that things are compiling OK, check the test logic, > upgrade the tests to use Test::More, and get the tests to the point > where they are passing. > > At that point, it should be ready for CPAN, but we need to decide > how we want to coordinate that with releases of bioperl-live and > bioperl-ext. > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 17:28:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 16:28:05 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> <4A89BE4E.7090901@cornell.edu> <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> Message-ID: <45F9C6D1-7DD7-4227-B7B9-3FBAF7513B35@illinois.edu> Take that back. Yes the 'FL' warning is still there, but no tests are run b/c (simply put) there are no regression tests (no use of Test or Test::More). If you run './Build test --verbose' you can see the run, but no test output. That should be easy to fix, though. chris On Aug 17, 2009, at 4:22 PM, Chris Fields wrote: > Still seeing that odd warning popping up: > > cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose > t/001_basics.t .. 
Argument "FL" isn't numeric in numeric lt (<) at / > Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line > 185. > > Have you tried using Yee Man's original Makefile.PL to see if it > works better? There appear to be some differences in the > compilation, including a linking warning popping up. > > chris > > On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > >> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro >> at Bio-Tools-HMM in the repo. The tests are not passing, I think >> that some bugs need to be fixed in the logic of things. >> >> Yee Man, could you have a look? To download the newly repackaged >> code: >> >> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ >> bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM >> >> perl Build.PL; ./Build test >> >> Please check that things are compiling OK, check the test logic, >> upgrade the tests to use Test::More, and get the tests to the point >> where they are passing. >> >> At that point, it should be ready for CPAN, but we need to decide >> how we want to coordinate that with releases of bioperl-live and >> bioperl-ext. >> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 18:26:19 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 17:26:19 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <419432.62970.qm@web30403.mail.mud.yahoo.com> References: <419432.62970.qm@web30403.mail.mud.yahoo.com> Message-ID: <227EADF3-D769-413D-B1BF-22C919C8D097@illinois.edu> Yee Man, Will look into that. I do recall that disappearing last night, so I'll go look at the commit log. 
I have committed some regression tests using Bio::Root::Test. This'll need to be extensively tested b/c we're comparing floating point numbers, though I do use our custom float_is() test to run these (so we only compare first six signif). These are passing for me on 64bit perl 5.10.0; I may try these on a local 64bit linux (I need to set up bioperl on it first). chris On Aug 17, 2009, at 5:19 PM, Yee Man Chan wrote: > I believe this warnings should have been fixed with the latest Bio/ > Tools/HMM.pm. Are you sure you are using the lastest Bio/Tools/ > HMM.pm? I noticed that there are two pairs of "use strict" and "use > warnings" in this version. :P > > Yee Man > > --- On Mon, 8/17/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "BioPerl List" , "Yee Man Chan" > > >> Date: Monday, August 17, 2009, 2:22 PM >> Still seeing that odd warning popping >> up: >> >> cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose >> t/001_basics.t .. Argument "FL" isn't numeric in numeric lt >> (<) at >> /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm >> line 185. >> >> Have you tried using Yee Man's original Makefile.PL to see >> if it works better? There appear to be some >> differences in the compilation, including a linking warning >> popping up. >> >> chris >> >> On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: >> >>> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into >> a new distro at Bio-Tools-HMM in the repo. The tests >> are not passing, I think that some bugs need to be fixed in >> the logic of things. >>> >>> Yee Man, could you have a look? 
To download the >> newly repackaged code: >>> >>> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ >>> bioperl/Bio-Tools-HMM/trunk >> Bio-Tools-HMM >>> >>> perl Build.PL; ./Build test >>> >>> Please check that things are compiling OK, check the >> test logic, upgrade the tests to use Test::More, and get the >> tests to the point where they are passing. >>> >>> At that point, it should be ready for CPAN, but we >> need to decide how we want to coordinate that with releases >> of bioperl-live and bioperl-ext. >>> >>> Rob >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > >

From abhishek.vit at gmail.com Mon Aug 17 18:53:19 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 17 Aug 2009 18:53:19 -0400 Subject: [Bioperl-l] Error Copying Hashes Message-ID:

Hi Guys

I think this one should be appropriate for here. I am trying to copy a hash (spaced out below for the sake of readability):

%{ $OUTPUT->{$dir}->{'file'}->{$file}->{'additive'} } = %ADDITIVE_COUNT;
## Where %ADDITIVE_COUNT is a simple hash (key/value), no references

I am getting this error:

Odd number of elements in hash assignment at ./assessCoverage.pl line 258

Looking at the dump of the hash, I see this:

$VAR1 = {
  '/local/seq/' => {
    'read_len' => 36,
    'file' => {
      's_3_sorted.txt' => {
        'additive' => {
          '8979/16384' => undef    #### I don't understand this behavior. Something unusual is going on?
        }
      }
    }
  }
}

From rmb32 at cornell.edu Mon Aug 17 19:00:00 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:00:00 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <360578.66990.qm@web30403.mail.mud.yahoo.com> References: <360578.66990.qm@web30403.mail.mud.yahoo.com> Message-ID: <4A89E0F0.8010307@cornell.edu> Yee Man Chan wrote: > I noticed that Bio/Tools/HMM.pm was removed from the trunk.
So I added it back in. I think you shouldn't get the warnings with this version. Please read my email above with instructions for checking out the new Bio-Tools-HMM component, where Bio::Tools::HMM has been moved. Please do not add the Bio::Tools::HMM module back into bioperl-live. I think you might be confused about the functions of 'svn add', 'svn commit', etc., because I don't see any actual addition of the module in the commit logs. Please read through the SVN manual at http://svnbook.red-bean.com/ if you need clarification. Rob From rmb32 at cornell.edu Mon Aug 17 19:30:07 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:30:07 -0700 Subject: [Bioperl-l] Error Copying Hashes In-Reply-To: References: Message-ID: <4A89E7FF.1020603@cornell.edu> Well for one thing, it looks like somewhere a hash is getting accidentally evaluated in scalar context. '8979/16384' is a typical result of doing, for example, my $x = %some_hash; This might not be the proximate cause of your problem; it would be better to post your whole script somewhere so people can look over it. That said, this isn't the right list for this; this list is specifically for discussing the BioPerl toolkit, not just perl that is used in biology. IRC is probably the quickest place to get perl help; try the #perl-help channel on the server irc.perl.org. Otherwise, you might try asking on a general perl mailing list; there seem to be some listed at http://perl-begin.org/mailing-lists/ Best of luck! Rob From abhishek.vit at gmail.com Mon Aug 17 19:33:41 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 17 Aug 2009 19:33:41 -0400 Subject: [Bioperl-l] Error Copying Hashes In-Reply-To: <4A89E7FF.1020603@cornell.edu> References: <4A89E7FF.1020603@cornell.edu> Message-ID: Ok great. Thanks for pointing me to the right places to post later.
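For the hash-copying question in this thread, a minimal sketch of the behaviour Rob describes, using made-up data and the hypothetical names %count and $output (stand-ins for %ADDITIVE_COUNT and $OUTPUT, whose real contents are not shown in the thread). Assigning a hash in list context copies its key/value pairs; forcing it into scalar context yields a single value, and assigning that single value to a hash is what produces the "Odd number of elements" warning and a lone key with an undef value:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for %ADDITIVE_COUNT and $OUTPUT from the thread.
my %count  = (a => 1, b => 2, c => 3);
my $output = {};

# List-context assignment copies every key/value pair.
%{ $output->{copy} } = %count;

# Equivalent: store a reference to a fresh shallow copy.
$output->{copy2} = { %count };

# Scalar context gives one value (bucket usage such as "3/8" on older
# perls, the key count on newer ones), not the pairs themselves.
my $scalar = %count;

# Assigning that single value to a hash makes it a lone key with an
# undef value and normally warns "Odd number of elements in hash assignment".
my %broken;
{
    no warnings 'misc';    # silence the expected warning in this sketch
    %broken = ($scalar);
}

printf "copy has %d keys, broken has %d key\n",
    scalar(keys %{ $output->{copy} }), scalar(keys %broken);
```

If the dump shows a single key that looks like a fraction, it may be worth searching the script for any place where the hash is passed through a scalar variable or a scalar-returning sub before the assignment.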
best,
-Abhi

On Mon, Aug 17, 2009 at 7:30 PM, Robert Buels wrote:

> Well for one thing, it looks like somewhere a hash is getting accidentally
> evaluated in scalar context. '8979/16384' is a typical result of doing, for
> example, my $x = %some_hash; This might not be the proximate cause of your
> problem, it would be better to post your whole script somewhere so people
> can look over it.
>
> That said, this isn't the right list for this, this list is specifically
> for discussing the BioPerl toolkit, not just perl that is used in biology.
>
> IRC probably the quickest place to get perl help, try the #perl-help
> channel on the server irc.perl.org.
>
> Otherwise, you might try asking on a general perl mailing list, there seem
> to be some listed at
> http://perl-begin.org/mailing-lists/
>
> Best of luck!
>
> Rob
>

From rmb32 at cornell.edu Mon Aug 17 19:42:21 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 17 Aug 2009 16:42:21 -0700
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <4A87275C.5040300@cornell.edu>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu>
Message-ID: <4A89EADD.9050509@cornell.edu>

I'm digging into the second item on the implementation plan, having mostly finished splitting off Bio::FeatureIO (in a branch):

* Rename some TypedSeqFeatureI methods as suggested in Hilmar's post

Where Hilmar's post is at http://article.gmane.org/gmane.comp.lang.perl.bio.general/15846

Now, he refers to an interesting thing in there that I haven't heard discussed before, which is the concept of having the feature's source_tag be typed with an ontology term also, as source_term(). I can see how this might be a good idea, or it might be overkill.

Anybody have thoughts on having feature _sources_ strongly typed with ontology terms?
Rob

From Kevin.M.Brown at asu.edu Mon Aug 17 20:36:34 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 17 Aug 2009 17:36:34 -0700
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu>
References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu>

The Deobfuscator does help, but even it is a little sparse on data for modules, especially information on the realities of the returned data from a method call.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
Sent: Monday, August 17, 2009 6:04 AM
To: Mark A. Jensen; BioPerl List
Subject: Re: [Bioperl-l] on BP documentation

> -----Original Message-----
> From: Mark A. Jensen [mailto:maj at fortinbras.us]
>
> From: "Hilmar Lapp"
> ...
> > As for the FASTA example, I can understand - I've heard repeatedly
> > from people that one of the things that they are missing is
> > documentation for every SeqIO format we support (such as GenBank,
> > UniProt, FASTA, etc) about where to find a particular piece of the
> > format in the object model.
>
> This is the right thread for list lurkers to contribute their betes noires
> such as this one. I encourage ALL to post these issues and help create
> our list of action items.
> MAJ

I wish you the best of luck on this ambitious and crucial project.

I teach intro Perl classes to biologists and always tell them that Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble.

I was trying to remember specific examples of where I've gotten lost, and unfortunately can't give any.
But I can tell you that often I've run into trouble because the particular method I'm looking for is three parent classes away from the module I'm actually looking at. The deobfuscator helps some, but only for people who know about that.

Do you think you could automate a tool that would add the following to the bottom of each module?

=head2 Inherited methods

=over 4

=item desc

See Bio::Seq::Basic

=back

This would make browsing through the docs on bioperl.org more fun too.

-Amir Karger
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From sidd.basu at gmail.com Tue Aug 18 07:01:03 2009
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Tue, 18 Aug 2009 06:01:03 -0500
Subject: [Bioperl-l] code reuse with moose
In-Reply-To:
References: <20090812022753.GA815@Macintosh-74.local>
Message-ID: <20090818110102.GA27010@seinfeld>

Putting it on the bioperl list; it makes more sense here.

On Wed, 12 Aug 2009, Chris Fields wrote:

> (BTW, this is re: the reimplementation of major chunks of BioPerl using
> Moose, Biome: http://github.com/cjfields/biome/tree/)
>
> Locations should use a Role (specifically, Biome::Role::Range), so
> start/end/strand should be attributes, not methods. With attributes the
> best way to do this is probably with a builder, and lazily (start
> requires end, and vice versa). Factor out the common code as Tomas
> indicates. BTW, the $self->throw() is akin to BioPerl's $self->throw()
> exception handling; it simply catches any exceptions and passes them to
> the metaclass exception handling.
>
> I've been thinking about making the Range role abstract for this very
> reason (or defining very basic attributes); something like:
>
> ----------------------------
>
> package Bio::Role::Range;
>
> requires qw(_build_start _build_end _build_strand);
>
> # also require other methods which need to be defined in implementation
>
> has 'start' => (
>     isa     => 'Int',
>     is      => 'rw',
>     builder => '_build_start',
>     lazy    => 1
> );
>
> # same for end, strand (except strand has a different isa via MooseX::Types)
> ....
>
> package Bio::Location::Foo;
>
> with 'Bio::Role::Range';
>
> sub _build_start {
>     # for location-specific start
> }
>
> sub _build_end {
>     # for location-specific end
> }
>
> sub _build_strand {
>     # for location-specific strand
> }
>
> sub _common_build_method {
>     # factor out common code here, call from other builders
> }
>
> ----------------------------

This plan makes things much clearer. Currently BioMe::Role::Location has a 'requires' keyword, and the rest of the location modules consume that role to provide their own implementations. At this point only BioMe::Location::Atomic has an attribute-based 'start' and 'end' implementation. I got a bit confused because in current bioperl 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic', and when I try to follow that path in BioMe it has to override that method. So my question is: do all the location modules really need to inherit from each other? I am totally aware of the original design ideas, but it would be better to have a flattened hierarchy if possible.

One more thing: what about putting 'start', 'end' and the other common base attributes in BioMe::Role::Location instead of BioMe::Role::Range? I am not sure which would be correct from the bioperl point of view; just throwing out an idea.

>
> Also, I think the Coordinate-related stuff should be simplified down to a
> trait or an attribute; they bring in way too much overhead in bioperl w/o
> much added value.
You mean, instead of having a 'builder' method, having specialized traits handle those? That sounds even better.

-siddhartha

>
> And now back to your regular Moose-related broadcast...
>
> chris
>
> On Aug 11, 2009, at 9:27 PM, Siddhartha Basu wrote:
>
> > Hi,
> > In one of my classes I have this boilerplate code block that is repeated
> > all over ....
> >
> > sub start {
> >     my ( $self, $value ) = @_;
> >     $self->{'_start'} = $value if defined $value;
> >
> >     ## -- from here
> >     $self->throw( "Only adjacent residues when location type "
> >             . "is IN-BETWEEN. Not ["
> >             . $self->{'_start'}
> >             . "] and ["
> >             . $self->{'_end'}
> >             . "]" )
> >         if defined $self->{'_start'}
> >             && defined $self->{'_end'}
> >             && $self->location_type eq 'IN-BETWEEN'
> >             && ( $self->{'_end'} - 1 != $self->{'_start'} );
> >     return $self->{'_start'};
> >     ## -- here
> >
> > }
> >
> > then again ....
> >
> > sub end {
> >     my ( $self, $value ) = @_;
> >
> >     $self->{'_end'} = $value if defined $value;
> >
> >     #assume end is the same as start if not defined
> >     if ( !defined $self->{'_end'} ) {
> >         if ( !defined $self->{'_start'} ) {
> >             $self->warn('Calling end without a defined start position');
> >             return;
> >         }
> >         $self->warn('Setting start equal to end');
> >         $self->{'_end'} = $self->{'_start'};
> >     }
> >
> >     ## ----
> >
> >     $self->throw( "Only adjacent residues when location type "
> >             . "is IN-BETWEEN. Not ["
> >             . $self->{'_start'}
> >             . "] and ["
> >             . $self->{'_end'}
> >             . "]" )
> >         if defined $self->{'_start'}
> >             && defined $self->{'_end'}
> >             && $self->location_type eq 'IN-BETWEEN'
> >             && ( $self->{'_end'} - 1 != $self->{'_start'} );
> >
> >     return $self->{'_end'};
> >     #---------
> > }
> >
> >
> > Is there any way Moose can be used here for more code reuse? I thought
> > about converting it to a type but still couldn't figure out how that can
> > be done.
> >
> >
> > thanks,
> > -siddhartha
>

From deequan at gmail.com Fri Aug 14 15:02:06 2009
From: deequan at gmail.com (David Quan)
Date: Fri, 14 Aug 2009 15:02:06 -0400
Subject: [Bioperl-l] bioperl capability
Message-ID: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com>

Hello,

I've been browsing around bioperl documentation and have used a blast parser, but am wondering if it is possible to use the start and end information for a hit to trace back to a gene in genbank and extract the sequence for that gene? I have not been able to find elements that would work in such a way. Recommendations for elements that would be capable of behaving in such a way would be greatly appreciated.

Thanks very much.

David N. Quan

--
Love of country is, at heart, trust in a nation's people, faith in their better nature, esteem for their best hopes, understanding for the magnificence and the distinctiveness and the huge, infinitely shaded cultural palette of their simple humanity. --Bradley Burston

From ymc at yahoo.com Fri Aug 14 22:57:15 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Fri, 14 Aug 2009 19:57:15 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
Message-ID: <85143.35343.qm@web30404.mail.mud.yahoo.com>

Hi Chris

I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..)

Please let me know if it works for you.

Sorry for the bug...

Yee Man

--- On Fri, 8/14/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List"
> Date: Friday, August 14, 2009, 8:31 AM
> Yee Man,
>
> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0
> 64-bit) and on dev.open-bio.org (which is perl 5.8.8,
> appears to be 32-bit).
> The patch results in cleaning
> up warnings for 5.10.0 but results in similar warnings for
> 5.8.8 (linux or OS X).
>
> On OS X perl 5.8.8, this sometimes passes (note the first
> attempt fails, the second succeeds), so it's not entirely a
> 32-bit issue:
>
> http://gist.github.com/167860
>
> OS X and perl 5.10.0, this always fails as the previous
> gist shows, but demonstrates similar behavior (multiple
> attempts to test get different responses):
>
> http://gist.github.com/167542
>
> On linux, everything passes with or w/o the patched files
> (patched files have warnings as indicated above):
>
> Specs for all three perl executables (they vary a bit):
>
> http://gist.github.com/167883
>
> chris
>
> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote:
>
> > Ah.. I find that the typemap can become as simple as this
> > =====================
> > TYPEMAP
> > HMM *    T_PTROBJ
> > =====================
> >
> > Then the generated HMM.c will have a function called INT2PTR to do the
> > pointer conversion. I believe this should solve the warnings.
> >
> > Attached are the updated HMM.xs and typemap. Can someone with a 64-bit
> > machine give it a try?
> >
> > Thank you
> > Yee Man
> > --- On Thu, 8/13/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Yee Man Chan"
> >> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List"
> >> Date: Thursday, August 13, 2009, 5:31 PM
> >> (just to point out to everyone, Yee
> >> Man's contact information was in the POD)
> >>
> >> Yee Man,
> >>
> >> I have the output in the below link:
> >>
> >> http://gist.github.com/167542
> >>
> >> There are similar problems popping up on 32- and 64-bit
> >> perl 5.10.0, Mac OS X 10.5. Haven't had time to debug
> >> it unfortunately.
> >>
> >> I think we should seriously consider spinning this code off
> >> into it's own distribution for CPAN. It's
> >> unfortunately bit-rotting away in bioperl-ext.
> >> If you want to continue supporting it I can help set that up.
> >>
> >> chris
> >>
> >> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote:
> >>
> >>> Hi
> >>>
> >>> So is this an HMM only problem? Or does it apply to other bioperl-ext modules?
> >>>
> >>> What exactly are the compilation errors for HMM? I believe my implementation is just a simple one based on Rabiner's paper.
> >>>
> >>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg
> >>>
> >>> I don't think I did anything fancy that makes it machine dependent or non-ANSI C.
> >>>
> >>> Yee Man
> >>>
> >>> --- On Thu, 8/13/09, Chris Fields wrote:
> >>>
> >>>> From: Chris Fields
> >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>> To: "Robert Buels"
> >>>> Cc: "Jonny Dalzell" , "BioPerl List" , "Yee Man Chan"
> >>>> Date: Thursday, August 13, 2009, 3:18 PM
> >>>>
> >>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote:
> >>>>
> >>>>> Jonny Dalzell wrote:
> >>>>>> Is it ridiculous of me to expect ubuntu to take care of this for me? How do
> >>>>>> I go about compiling the HMM?
> >>>>> Yes. This is a very specialized thing that you're doing, and Ubuntu does not have the resources to package every single thing.
> >>>>>
> >>>>> Unfortunately, it looks like bioperl-ext package is not installable under Ubuntu 9.04 anyway, which is what I'm running. For others on this list, if somebody is interested in doing maintaining it, I'd be happy to help out by testing on Debian-based Linux platforms.
> >>>>> We need to clarify this package's maintenance status: if there is nobody interested in maintaining it, I would recommend that bioperl-ext be removed from distribution. It's not in anybody's interest to have unmaintained software out there causing confusion.
> >>>>
> >>>> I have cc'd Yee Man Chan for this. If there isn't a response or the message bounces, we do one of two things:
> >>>>
> >>>> 1) consider it deprecated (probably safest).
> >>>> 2) spin it out into a separate module.
> >>>>
> >>>> Just tried to compile it myself and am getting errors (using 64bit perl 5.10), so I think, unless someone wants to take this on, option #1 is best.
> >>>>
> >>>>> So Jonny, in short, I would say "do not use bioperl-ext".
> >>>>
> >>>> In general, that's a safe bet. We're moving most of our C/C++ bindings to BioLib.
> >>>>
> >>>>> Step back. What are you trying to accomplish? Chris already recommended some alternative methods in his email of 8/11 on this subject. Perhaps we can guide you to some software that is actively maintained and will meet your needs.
> >>>>>
> >>>>> Rob
> >>>>
> >>>> Exactly. Lots of other (better supported!) options out there. HMMER, SeqAn, and others.
> >>>>
> >>>> chris
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam? Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: HMM.xs Type: application/octet-stream Size: 5614 bytes Desc: not available URL: From ymc at yahoo.com Sat Aug 15 21:23:28 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Sat, 15 Aug 2009 18:23:28 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <8B7B3664-A0E2-4E66-82D6-982096F4C75E@illinois.edu> Message-ID: <241652.96493.qm@web30404.mail.mud.yahoo.com> I just committed HMM.xs and typemap to SVN. Can you test it to confirm it works in 64-bit machines? Thanks Yee Man --- On Sat, 8/15/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "Yee Man Chan" , "BioPerl List" > Date: Saturday, August 15, 2009, 12:11 PM > I'm not sure, but it makes more sense > to commit these changes directly.? Yee, need us to set > you up with a commit bit?? If so, fill out the > information on this page: > > http://www.bioperl.org/wiki/SVN_Account_Request > > and forward it to support at open-bio.org.? > I'll sponsor you. > > chris > > On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: > > > The usual procedure for developing code is to exchange > code via commits to a version control system.? Yee, do > you know how to use Subversion? Does Yee need a commit bit? > > > > Rob > > > > Yee Man Chan wrote: > >> Hi Chris > >>???I find that there is a memory > access bug in my code. Attached is the fixed HMM.xs. This > file together with the simpler typemap should fix all > problems. (I hope..) > >>???Please let me know if it works > for you. > >> Sorry for the bug... > >> Yee Man > >> --- On Fri, 8/14/09, Chris Fields > wrote: > >>> From: Chris Fields > >>> Subject: Re: [Bioperl-l] Problems with > Bioperl-ext package on WinVista? 
> >>> To: "Yee Man Chan" > >>> Cc: "Robert Buels" , > "Jonny Dalzell" , > "BioPerl List" > >>> Date: Friday, August 14, 2009, 8:31 AM > >>> Yee Man, > >>> > >>> I tested this out locally (perl 5.8.8 32-bit, > perl 5.10.0 > >>> 64-bit) and on dev.open-bio.org (which is perl > 5.8.8, > >>> appears to be 32-bit).? The patch results > in cleaning > >>> up warnings for 5.10.0 but results in similar > warnings for > >>> 5.8.8 (linux or OS X). > >>> > >>> On OS X perl 5.8.8, this sometimes passes > (note the first > >>> attempt fails, the second succeeds), so it's > not entirely a > >>> 32-bit issue: > >>> > >>> http://gist.github.com/167860 > >>> > >>> OS X and perl 5.10.0, this always fails as the > previous > >>> gist shows, but demonstrates similar behavior > (multiple > >>> attempts to test get different responses): > >>> > >>> http://gist.github.com/167542 > >>> > >>> On linux, everything passes with or w/o the > patched files > >>> (patched files have warnings as indicated > above): > >>> > >>> Specs for all three perl executables (they > vary a bit): > >>> > >>> http://gist.github.com/167883 > >>> > >>> chris > >>> > >>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan > wrote: > >>> > >>>> Ah.. I find that the typemap can become as > simple as > >>> this > >>>> ===================== > >>>> TYPEMAP > >>>> HMM *? ? T_PTROBJ > >>>> ===================== > >>>> > >>>> Then the generated HMM.c will have a > function called > >>> INT2PTR to do the pointer conversion. I > believe this should > >>> solve the warnings. > >>>> Attached are the updated HMM.xs and > typemap. Can > >>> someone with a 64-bit machine give it a try? > >>>> Thank you > >>>> Yee Man > >>>> --- On Thu, 8/13/09, Chris Fields > >>> wrote: > >>>>> From: Chris Fields > >>>>> Subject: Re: [Bioperl-l] Problems with > Bioperl-ext > >>> package on WinVista? 
> >>>>> To: "Yee Man Chan" > >>>>> Cc: "Robert Buels" , > >>> "Jonny Dalzell" , > >>> "BioPerl List" > >>>>> Date: Thursday, August 13, 2009, 5:31 > PM > >>>>> (just to point out to everyone, Yee > >>>>> Man's contact information was in the > POD) > >>>>> > >>>>> Yee Man, > >>>>> > >>>>> I have the output in the below link: > >>>>> > >>>>> http://gist.github.com/167542 > >>>>> > >>>>> There are similar problems popping up > on 32- and > >>> 64-bit > >>>>> perl 5.10.0, Mac OS X 10.5.? > Haven't had time > >>> to debug > >>>>> it unfortunately. > >>>>> > >>>>> I think we should seriously consider > spinning this > >>> code off > >>>>> into it's own distribution for > CPAN.? It's > >>>>> unfortunately bit-rotting away in > >>> bioperl-ext.? If you > >>>>> want to continue supporting it I can > help set that > >>> up. > >>>>> chris > >>>>> > >>>>> On Aug 13, 2009, at 6:58 PM, Yee Man > Chan wrote: > >>>>> > >>>>>> Hi > >>>>>> > >>>>>>? ???So is this > an HMM only > >>> problem? Or does > >>>>> it apply to other bioperl-ext > modules? > >>>>>>? ???What > exactly are the > >>> compilation errors > >>>>> for HMM? I believe my implementation > is just a > >>> simple one > >>>>> based on Rabiner's paper. > >>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > >>>>>> > >>>>>>? ???I don't > think I did > >>> anything fancy that > >>>>> makes it machine dependent or non-ANSI > C. > >>>>>> Yee Man > >>>>>> > >>>>>> --- On Thu, 8/13/09, Chris Fields > > >>>>> wrote: > >>>>>>> From: Chris Fields > >>>>>>> Subject: Re: [Bioperl-l] > Problems with > >>> Bioperl-ext > >>>>> package on WinVista? 
> >>>>>>> To: "Robert Buels" > >>>>>>> Cc: "Jonny Dalzell" , > >>>>> "BioPerl List" , > >>>>> "Yee Man Chan" > >>>>>>> Date: Thursday, August 13, > 2009, 3:18 PM > >>>>>>> > >>>>>>> On Aug 13, 2009, at 4:37 PM, > Robert Buels > >>> wrote: > >>>>>>>> Jonny Dalzell wrote: > >>>>>>>>> Is it ridiculous of me > to expect > >>> ubuntu to > >>>>> take > >>>>>>> care of this for me?? How > do > >>>>>>>>> I go about compiling > the HMM? > >>>>>>>> Yes.? This is a very > specialized > >>> thing > >>>>> that > >>>>>>> you're doing, and Ubuntu does > not have > >>> the > >>>>> resources to > >>>>>>> package every single thing. > >>>>>>>> Unfortunately, it looks > like > >>> bioperl-ext > >>>>> package is > >>>>>>> not installable under Ubuntu > 9.04 anyway, > >>> which is > >>>>> what I'm > >>>>>>> running.? For others on > this list, > >>> if > >>>>> somebody is > >>>>>>> interested in doing > maintaining it, I'd be > >>> happy > >>>>> to help out > >>>>>>> by testing on Debian-based > Linux > >>> platforms. > >>>>> We need to > >>>>>>> clarify this package's > maintenance status: > >>> if > >>>>> there is > >>>>>>> nobody interested in > maintaining it, I > >>> would > >>>>> recommend that > >>>>>>> bioperl-ext be removed from > distribution. > >>>>> It's not in > >>>>>>> anybody's interest to have > unmaintained > >>> software > >>>>> out there > >>>>>>> causing confusion. > >>>>>>> > >>>>>>> I have cc'd Yee Man Chan for > this. > >>> If there > >>>>> isn't a > >>>>>>> response or the message > bounces, we do one > >>> of two > >>>>> things: > >>>>>>> 1) consider it deprecated > (probably > >>> safest). > >>>>>>> 2) spin it out into a separate > module. > >>>>>>> > >>>>>>> Just tried to comile it myself > and am > >>> getting > >>>>> errors (using > >>>>>>> 64bit perl 5.10), so I think, > unless > >>> someone wants > >>>>> to take > >>>>>>> this on, option #1 is best. > >>>>>>> > >>>>>>>> So Jonny, in short, I > would say "do > >>> not use > >>>>>>> bioperl-ext". 
> >>>>>>> > >>>>>>> In general, that's a safe > bet.? We're > >>> moving > >>>>> most of > >>>>>>> our C/C++ bindings to BioLib. > >>>>>>> > >>>>>>>> Step back.? What are > you trying > >>> to > >>>>>>> accomplish?? Chris > already > >>> recommended some > >>>>> alternative > >>>>>>> methods in his email of 8/11 > on this > >>>>> subject.? Perhaps > >>>>>>> we can guide you to some > software that is > >>>>> actively > >>>>>>> maintained and will meet your > needs. > >>>>>>>> Rob > >>>>>>> Exactly.? Lots of other > (better > >>> supported!) > >>>>> options > >>>>>>> out there.? HMMER, SeqAn, > and > >>> others. > >>>>>>> chris > >>>>>>> > >>>>>> > >>>>>> > >>>>> > >>>> > __________________________________________________ > >>>> Do You Yahoo!? > >>>> Tired of spam?? Yahoo! Mail has the > best spam > >>> protection around > >>>> http://mail.yahoo.com > >>> > _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > > > > > > --Robert Buels > > Bioinformatics Analyst, Sol Genomics Network > > Boyce Thompson Institute for Plant Research > > Tower Rd > > Ithaca, NY? 14853 > > Tel: 503-889-8539 > > rmb32 at cornell.edu > > http://www.sgn.cornell.edu > > From ymc at yahoo.com Sun Aug 16 00:32:19 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Sat, 15 Aug 2009 21:32:19 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <846546.73578.qm@web30404.mail.mud.yahoo.com> When are you going to release 1.6? Maybe let me work on it before it releases. If it doesn't resolve the problem, then we can think about other alternatives. Also, please show me the latest errors you have for 5.10.0. Thanks Yee Man --- On Sat, 8/15/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? 
> To: "Yee Man Chan" > Cc: "Robert Buels" , "BioPerl List" > Date: Saturday, August 15, 2009, 7:05 PM > I'm still seeing the same errors on > Mac OS X for 64-bit perl 5.10.0.? Mac OS X, native perl > (v5.8.8) passes fine now (as well as perl 5.8.8 on > dev.open-bio.org). > > I'm wondering if this is a problem with my local perl > build.? I'm very tempted to push the HMM-related code > into a separate distribution (bioperl-hmm) and make a CPAN > release out of it so it gets wider testing via CPAN testers; > it would just require a minimum bioperl 1.6 installation for > Bio::Tools::HMM and any related modules.? Yee, would > that be okay with you? > > chris > > On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: > > > > > I just committed HMM.xs and typemap to SVN. Can you > test it to confirm it works in 64-bit machines? > > > > Thanks > > Yee Man > > > > --- On Sat, 8/15/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista? > >> To: "Robert Buels" > >> Cc: "Yee Man Chan" , > "BioPerl List" > >> Date: Saturday, August 15, 2009, 12:11 PM > >> I'm not sure, but it makes more sense > >> to commit these changes directly.? Yee, need > us to set > >> you up with a commit bit?? If so, fill out > the > >> information on this page: > >> > >> http://www.bioperl.org/wiki/SVN_Account_Request > >> > >> and forward it to support at open-bio.org. > >> I'll sponsor you. > >> > >> chris > >> > >> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: > >> > >>> The usual procedure for developing code is to > exchange > >> code via commits to a version control > system.? Yee, do > >> you know how to use Subversion? Does Yee need a > commit bit? > >>> > >>> Rob > >>> > >>> Yee Man Chan wrote: > >>>> Hi Chris > >>>>? ? I find that there is a > memory > >> access bug in my code. Attached is the fixed > HMM.xs. This > >> file together with the simpler typemap should fix > all > >> problems. (I hope..) > >>>>? ? 
Please let me know if it > works > >> for you. > >>>> Sorry for the bug... > >>>> Yee Man > >>>> --- On Fri, 8/14/09, Chris Fields > >> wrote: > >>>>> From: Chris Fields > >>>>> Subject: Re: [Bioperl-l] Problems > with > >> Bioperl-ext package on WinVista? > >>>>> To: "Yee Man Chan" > >>>>> Cc: "Robert Buels" , > >> "Jonny Dalzell" , > >> "BioPerl List" > >>>>> Date: Friday, August 14, 2009, 8:31 > AM > >>>>> Yee Man, > >>>>> > >>>>> I tested this out locally (perl 5.8.8 > 32-bit, > >> perl 5.10.0 > >>>>> 64-bit) and on dev.open-bio.org (which > is perl > >> 5.8.8, > >>>>> appears to be 32-bit).? The patch > results > >> in cleaning > >>>>> up warnings for 5.10.0 but results in > similar > >> warnings for > >>>>> 5.8.8 (linux or OS X). > >>>>> > >>>>> On OS X perl 5.8.8, this sometimes > passes > >> (note the first > >>>>> attempt fails, the second succeeds), > so it's > >> not entirely a > >>>>> 32-bit issue: > >>>>> > >>>>> http://gist.github.com/167860 > >>>>> > >>>>> OS X and perl 5.10.0, this always > fails as the > >> previous > >>>>> gist shows, but demonstrates similar > behavior > >> (multiple > >>>>> attempts to test get different > responses): > >>>>> > >>>>> http://gist.github.com/167542 > >>>>> > >>>>> On linux, everything passes with or > w/o the > >> patched files > >>>>> (patched files have warnings as > indicated > >> above): > >>>>> > >>>>> Specs for all three perl executables > (they > >> vary a bit): > >>>>> > >>>>> http://gist.github.com/167883 > >>>>> > >>>>> chris > >>>>> > >>>>> On Aug 14, 2009, at 3:27 AM, Yee Man > Chan > >> wrote: > >>>>> > >>>>>> Ah.. I find that the typemap can > become as > >> simple as > >>>>> this > >>>>>> ===================== > >>>>>> TYPEMAP > >>>>>> HMM *? ? T_PTROBJ > >>>>>> ===================== > >>>>>> > >>>>>> Then the generated HMM.c will have > a > >> function called > >>>>> INT2PTR to do the pointer conversion. > I > >> believe this should > >>>>> solve the warnings. 
> >>>>>> Attached are the updated HMM.xs and typemap. Can
> >>>>>> someone with a 64-bit machine give it a try?
> >>>>>> Thank you
> >>>>>> Yee Man
> >>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
> >>>>>>> From: Chris Fields
> >>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>>> To: "Yee Man Chan"
> >>>>>>> Cc: "Robert Buels", "Jonny Dalzell", "BioPerl List"
> >>>>>>> Date: Thursday, August 13, 2009, 5:31 PM
> >>>>>>> (just to point out to everyone, Yee
> >>>>>>> Man's contact information was in the POD)
> >>>>>>>
> >>>>>>> Yee Man,
> >>>>>>>
> >>>>>>> I have the output in the below link:
> >>>>>>>
> >>>>>>> http://gist.github.com/167542
> >>>>>>>
> >>>>>>> There are similar problems popping up on 32- and 64-bit
> >>>>>>> perl 5.10.0, Mac OS X 10.5. Haven't had time to debug
> >>>>>>> it unfortunately.
> >>>>>>>
> >>>>>>> I think we should seriously consider spinning this code off
> >>>>>>> into its own distribution for CPAN. It's
> >>>>>>> unfortunately bit-rotting away in bioperl-ext. If you
> >>>>>>> want to continue supporting it I can help set that up.
> >>>>>>>
> >>>>>>> chris
> >>>>>>>
> >>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote:
> >>>>>>>
> >>>>>>>> Hi
> >>>>>>>>
> >>>>>>>>     So is this an HMM only problem? Or does
> >>>>>>>> it apply to other bioperl-ext modules?
> >>>>>>>>     What exactly are the compilation errors
> >>>>>>>> for HMM? I believe my implementation is just a simple one
> >>>>>>>> based on Rabiner's paper.
> >>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg
> >>>>>>>>
> >>>>>>>>     I don't think I did anything fancy that
> >>>>>>>> makes it machine dependent or non-ANSI C.
> >>>>>>>> Yee Man
> >>>>>>>>
> >>>>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
> >>>>>>>>> From: Chris Fields
> >>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>>>>> To: "Robert Buels"
> >>>>>>>>> Cc: "Jonny Dalzell", "BioPerl List", "Yee Man Chan"
> >>>>>>>>> Date: Thursday, August 13, 2009, 3:18 PM
> >>>>>>>>>
> >>>>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote:
> >>>>>>>>>> Jonny Dalzell wrote:
> >>>>>>>>>>> Is it ridiculous of me to expect ubuntu to take
> >>>>>>>>>>> care of this for me? How do
> >>>>>>>>>>> I go about compiling the HMM?
> >>>>>>>>>> Yes. This is a very specialized thing that
> >>>>>>>>>> you're doing, and Ubuntu does not have the resources to
> >>>>>>>>>> package every single thing.
> >>>>>>>>>> Unfortunately, it looks like bioperl-ext package is
> >>>>>>>>>> not installable under Ubuntu 9.04 anyway, which is what I'm
> >>>>>>>>>> running. For others on this list, if somebody is
> >>>>>>>>>> interested in maintaining it, I'd be happy to help out
> >>>>>>>>>> by testing on Debian-based Linux platforms. We need to
> >>>>>>>>>> clarify this package's maintenance status: if there is
> >>>>>>>>>> nobody interested in maintaining it, I would recommend that
> >>>>>>>>>> bioperl-ext be removed from distribution. It's not in
> >>>>>>>>>> anybody's interest to have unmaintained software out there
> >>>>>>>>>> causing confusion.
> >>>>>>>>>
> >>>>>>>>> I have cc'd Yee Man Chan for this. If there isn't a
> >>>>>>>>> response or the message bounces, we do one of two things:
> >>>>>>>>> 1) consider it deprecated (probably safest).
> >>>>>>>>> 2) spin it out into a separate module.
> >>>>>>>>>
> >>>>>>>>> Just tried to compile it myself and am getting errors (using
> >>>>>>>>> 64bit perl 5.10), so I think, unless someone wants to take
> >>>>>>>>> this on, option #1 is best.
> >>>>>>>>>
> >>>>>>>>>> So Jonny, in short, I would say "do not use
> >>>>>>>>>> bioperl-ext".
> >>>>>>>>>
> >>>>>>>>> In general, that's a safe bet. We're moving most of
> >>>>>>>>> our C/C++ bindings to BioLib.
> >>>>>>>>>
> >>>>>>>>>> Step back. What are you trying to
> >>>>>>>>>> accomplish? Chris already recommended some alternative
> >>>>>>>>>> methods in his email of 8/11 on this subject. Perhaps
> >>>>>>>>>> we can guide you to some software that is actively
> >>>>>>>>>> maintained and will meet your needs.
> >>>>>>>>>> Rob
> >>>>>>>>>
> >>>>>>>>> Exactly. Lots of other (better supported!) options
> >>>>>>>>> out there. HMMER, SeqAn, and others.
> >>>>>>>>>
> >>>>>>>>> chris
> >>>
> >>> --Robert Buels
> >>> Bioinformatics Analyst, Sol Genomics Network
> >>> Boyce Thompson Institute for Plant Research
> >>> Tower Rd
> >>> Ithaca, NY
14853
> >>> Tel: 503-889-8539
> >>> rmb32 at cornell.edu
> >>> http://www.sgn.cornell.edu

From ymc at yahoo.com Sun Aug 16 05:36:59 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 16 Aug 2009 02:36:59 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
Message-ID: <217259.7083.qm@web30408.mail.mud.yahoo.com>

Hi Chris

Thanks for your suggestions. I think it is indeed better to check that the sum is 1.0 using sprintf. I fixed this in the newly committed HMM.pm.

I also fixed code that would trigger warnings under "use warnings".

So now the only problem left is the "monotonic increasing" error. For that part of the code, I was trying to perform an expectation maximization step. Theoretically, the expectation should monotonically increase in every step. But I suppose this is not necessarily true when double precision floating point numbers are involved. I don't know why I used a 1e-100 tolerance for this. Therefore I "fixed" it by using the same tolerance that is used to terminate the maximization step (i.e. .000001). I suppose this "fix" will make it much less likely to throw an exception with your 5.10.0 perl.

Can you give that a try again and see if it works now?

Thank you
Yee Man

--- On Sat, 8/15/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels", "BioPerl List"
> Date: Saturday, August 15, 2009, 10:38 PM
> Yee,
>
> I took the liberty of making a few simple changes to
> Bio::Tools::HMM in svn to point out the problem and possible
> solutions. Feel free to revert these as needed.
>
> I'm seeing two errors, which appear randomly when running
> 'make test'. The first is easily fixable, the second,
> I'm not so sure. I'll let you make the decisions on
> both.
>
> 1) There is an assumption in the module that, when
> adding floating points, you will always get 1.0. You
> may run into problems: see 'perldoc -q long decimals'.
> Lines like this (two places in the module):
>   ...
>   if ($sum != 1.0) {
>       $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
>   }
>   ...
>
> won't work as expected (note I added a simple diagnostic,
> just print out the 'bad' sum). With perl 5.8.8, this
> appears to work fine, but this is what I get with perl 5.10
> (64-bit):
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Baum-Welch Training
> ===================
> Initial Probability Array:
> 0.499978    0.500022
> Transition Probability Matrix:
> 0.499978    0.500022
> 0.499978    0.500022
> Emission Probability Matrix:
> 0.133333    0.143333
> 0.163333    0.123333
> 0.143333    0.293333
> 0.133333    0.143333
> 0.163333    0.123333
> 0.143333    0.293333
>
> Log Probability of sequence 1: -521.808
> Log Probability of sequence 2: -426.057
>
> Statistical Training
> ====================
> Initial Probability Array:
> 1    0
> Transition Probability Matrix:
>
> ------------- EXCEPTION -------------
> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
>
> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
> STACK toplevel test.pl:82
> -------------------------------------
>
> make: *** [test_dynamic] Error 255
>
> I'm assuming this needs to simply be rounded up to
> 1.0. That could be accomplished with something like
> 'if (sprintf("%.2f", $sum) != 1.0) {...}'
>
> 2) The second error is a little stranger. I have been
> randomly getting this:
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Baum-Welch Training
> ===================
> S should be monotonic increasing!
> make: *** [test_dynamic] Error 255
>
> When I add strict and warnings pragmas to Bio::Tools::HMM
> (with a little additional cleanup to get things running), I
> get an additional warning (arrow):
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
> Baum-Welch Training
> ===================
> S should be monotonic increasing!
> make: *** [test_dynamic] Error 255
>
> So something is not being converted as expected.
>
> chris
>
> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote:
>
> > When are you going to release 1.6? Maybe let me work on it before
> > it releases. If it doesn't resolve the problem, then we can think
> > about other alternatives.
> >
> > Also, please show me the latest errors you have for 5.10.0.
> >
> > Thanks
> > Yee Man
> >
> > --- On Sat, 8/15/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Yee Man Chan"
> >> Cc: "Robert Buels", "BioPerl List"
> >> Date: Saturday, August 15, 2009, 7:05 PM
> >> I'm still seeing the same errors on
> >> Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl
> >> (v5.8.8) passes fine now (as well as perl 5.8.8 on
> >> dev.open-bio.org).
> >>
> >> I'm wondering if this is a problem with my local perl
> >> build. I'm very tempted to push the HMM-related code
> >> into a separate distribution (bioperl-hmm) and make a CPAN
> >> release out of it so it gets wider testing via CPAN testers;
> >> it would just require a minimum bioperl 1.6 installation for
> >> Bio::Tools::HMM and any related modules. Yee, would
> >> that be okay with you?
> >>
> >> chris
> >>
> >> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:
> >>
> >>> I just committed HMM.xs and typemap to SVN.
> >>> Can you test it to confirm it works on 64-bit machines?
> >>>
> >>> Thanks
> >>> Yee Man
> >>>
> >>> --- On Sat, 8/15/09, Chris Fields wrote:
> >>>
> >>>> From: Chris Fields
> >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>> To: "Robert Buels"
> >>>> Cc: "Yee Man Chan", "BioPerl List"
> >>>> Date: Saturday, August 15, 2009, 12:11 PM
> >>>> I'm not sure, but it makes more sense
> >>>> to commit these changes directly. Yee, need us to set
> >>>> you up with a commit bit? If so, fill out the
> >>>> information on this page:
> >>>>
> >>>> http://www.bioperl.org/wiki/SVN_Account_Request
> >>>>
> >>>> and forward it to support at open-bio.org.
> >>>> I'll sponsor you.
> >>>>
> >>>> chris
> >>>>
> >>>> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
> >>>>
> >>>>> The usual procedure for developing code is to exchange
> >>>>> code via commits to a version control system. Yee, do
> >>>>> you know how to use Subversion? Does Yee need a commit bit?
> >>>>>
> >>>>> Rob
> >>>>>
> >>>>> Yee Man Chan wrote:
> >>>>>> Hi Chris
> >>>>>>     I find that there is a memory access bug in my code.
> >>>>>> Attached is the fixed HMM.xs. This file together with the
> >>>>>> simpler typemap should fix all problems. (I hope..)
> >>>>>>     Please let me know if it works for you.
> >>>>>> Sorry for the bug...
> >>>>>> Yee Man

From ymc at yahoo.com Sun Aug 16 23:34:24 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 16 Aug 2009 20:34:24 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <05D89C95-261C-47B5-A4C6-794D36DD5FB8@illinois.edu>
Message-ID: <474354.59886.qm@web30408.mail.mud.yahoo.com>

Hi Chris

Good to hear that it is working, and thanks for testing.

As to the release, I understand your desire to maintain a high level of quality in the BioPerl code base. So if the HMM code doesn't meet that standard, I am OK with it being spun off.

So please pass around the updated code and test it extensively. If no one complains about the new code by the time of release, I would think it should go into the next bioperl-ext release. If people uncover new errors with the new code and the errors can't be fixed in time, then it should be spun off.

What do you think?

Best Regards,
Yee Man

--- On Sun, 8/16/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels", "BioPerl List"
> Date: Sunday, August 16, 2009, 5:53 AM
> That worked! Thanks Yee Man!
>
> chris
>
> ps - let me know how you want to deal with a release.
> >> For > >>>> others on > >>>>>> this list, > >>>>>>>>> if > >>>>>>>>>>> somebody is > >>>>>>>>>>>>> > interested in > >> doing > >>>>>> maintaining it, I'd be > >>>>>>>>> happy > >>>>>>>>>>> to help out > >>>>>>>>>>>>> by > testing on > >>>> Debian-based > >>>>>> Linux > >>>>>>>>> platforms. > >>>>>>>>>>> We need to > >>>>>>>>>>>>> > clarify this > >>>> package's > >>>>>> maintenance status: > >>>>>>>>> if > >>>>>>>>>>> there is > >>>>>>>>>>>>> > nobody > >> interested in > >>>>>> maintaining it, I > >>>>>>>>> would > >>>>>>>>>>> recommend > that > >>>>>>>>>>>>> > bioperl-ext be > >> removed > >>>> from > >>>>>> distribution. > >>>>>>>>>>> It's not in > >>>>>>>>>>>>> > anybody's > >> interest to > >>>> have > >>>>>> unmaintained > >>>>>>>>> software > >>>>>>>>>>> out there > >>>>>>>>>>>>> > causing > >> confusion. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I have > cc'd > >> Yee Man > >>>> Chan for > >>>>>> this. > >>>>>>>>> If there > >>>>>>>>>>> isn't a > >>>>>>>>>>>>> > response or > >> the > >>>> message > >>>>>> bounces, we do one > >>>>>>>>> of two > >>>>>>>>>>> things: > >>>>>>>>>>>>> 1) > consider > >> it > >>>> deprecated > >>>>>> (probably > >>>>>>>>> safest). > >>>>>>>>>>>>> 2) > spin it out > >> into a > >>>> separate > >>>>>> module. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Just > tried to > >> comile > >>>> it myself > >>>>>> and am > >>>>>>>>> getting > >>>>>>>>>>> errors (using > >>>>>>>>>>>>> 64bit > perl > >> 5.10), so I > >>>> think, > >>>>>> unless > >>>>>>>>> someone wants > >>>>>>>>>>> to take > >>>>>>>>>>>>> this > on, > >> option #1 is > >>>> best. > >>>>>>>>>>>>> > >>>>>>>>>>>>>> So > Jonny, > >> in > >>>> short, I > >>>>>> would say "do > >>>>>>>>> not use > >>>>>>>>>>>>> > bioperl-ext". > >>>>>>>>>>>>> > >>>>>>>>>>>>> In > general, > >> that's a > >>>> safe > >>>>>> bet.? We're > >>>>>>>>> moving > >>>>>>>>>>> most of > >>>>>>>>>>>>> our > C/C++ > >> bindings to > >>>> BioLib. > >>>>>>>>>>>>> > >>>>>>>>>>>>>> > Step > >> back. 
> >>>> What are > >>>>>> you trying > >>>>>>>>> to > >>>>>>>>>>>>> > accomplish? > >>>> Chris > >>>>>> already > >>>>>>>>> recommended some > >>>>>>>>>>> alternative > >>>>>>>>>>>>> > methods in his > >> email > >>>> of 8/11 > >>>>>> on this > >>>>>>>>>>> subject. > >> Perhaps > >>>>>>>>>>>>> we can > guide > >> you to > >>>> some > >>>>>> software that is > >>>>>>>>>>> actively > >>>>>>>>>>>>> > maintained and > >> will > >>>> meet your > >>>>>> needs. > >>>>>>>>>>>>>> > Rob > >>>>>>>>>>>>> > Exactly. > >> Lots of > >>>> other > >>>>>> (better > >>>>>>>>> supported!) > >>>>>>>>>>> options > >>>>>>>>>>>>> out > there. > >>>> HMMER, SeqAn, > >>>>>> and > >>>>>>>>> others. > >>>>>>>>>>>>> chris > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>> > >>>> > >> > __________________________________________________ > >>>>>>>>>> Do You Yahoo!? > >>>>>>>>>> Tired of spam? > >> Yahoo! Mail > >>>> has the > >>>>>> best spam > >>>>>>>>> protection around > >>>>>>>>>> http://mail.yahoo.com > >>>>>>>>> > >>>>>> > >>>> > >> > _______________________________________________ > >>>>>>>>>> Bioperl-l mailing > list > >>>>>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> --Robert Buels > >>>>>>> Bioinformatics Analyst, Sol > Genomics > >> Network > >>>>>>> Boyce Thompson Institute for > Plant > >> Research > >>>>>>> Tower Rd > >>>>>>> Ithaca, NY? 14853 > >>>>>>> Tel: 503-889-8539 > >>>>>>> rmb32 at cornell.edu > >>>>>>> http://www.sgn.cornell.edu > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>>> > >>> > >>> > >>> > >> > >> > > > > > > > > From ymc at yahoo.com Mon Aug 17 18:19:27 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Mon, 17 Aug 2009 15:19:27 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? 
In-Reply-To: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> Message-ID: <419432.62970.qm@web30403.mail.mud.yahoo.com> I believe these warnings should have been fixed with the latest Bio/Tools/HMM.pm. Are you sure you are using the latest Bio/Tools/HMM.pm? I noticed that there are two pairs of "use strict" and "use warnings" in this version. :P Yee Man --- On Mon, 8/17/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "BioPerl List" , "Yee Man Chan" > Date: Monday, August 17, 2009, 2:22 PM > Still seeing that odd warning popping > up: > > cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose > t/001_basics.t .. Argument "FL" isn't numeric in numeric lt > (<) at > /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm > line 185. > > Have you tried using Yee Man's original Makefile.PL to see > if it works better? There appear to be some > differences in the compilation, including a linking warning > popping up. > > chris > > On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > > > OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into > a new distro at Bio-Tools-HMM in the repo. The tests > are not passing, I think that some bugs need to be fixed in > the logic of things. > > > > Yee Man, could you have a look? To download the > newly repackaged code: > > > > svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk > Bio-Tools-HMM > > > > perl Build.PL; ./Build test > > > > Please check that things are compiling OK, check the > test logic, upgrade the tests to use Test::More, and get the > tests to the point where they are passing. > > > > At that point, it should be ready for CPAN, but we > need to decide how we want to coordinate that with releases > of bioperl-live and bioperl-ext.
> > > > Rob > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From ymc at yahoo.com Mon Aug 17 18:28:50 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Mon, 17 Aug 2009 15:28:50 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <45F9C6D1-7DD7-4227-B7B9-3FBAF7513B35@illinois.edu> Message-ID: <360578.66990.qm@web30403.mail.mud.yahoo.com> I noticed that Bio/Tools/HMM.pm was removed from the trunk. So I added it back in. I think you shouldn't get the warnings with this version. Yee Man --- On Mon, 8/17/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Chris Fields" > Cc: "Robert Buels" , "BioPerl List" , "Yee Man Chan" > Date: Monday, August 17, 2009, 2:28 PM > Take that back.? Yes the 'FL' > warning is still there, but no tests are run b/c (simply > put) there are no regression tests (no use of Test or > Test::More).? If you run './Build test --verbose' you > can see the run, but no test output.? That should be > easy to fix, though. > > chris > > On Aug 17, 2009, at 4:22 PM, Chris Fields wrote: > > > Still seeing that odd warning popping up: > > > > cjfields4:Bio-Tools-HMM cjfields$ ./Build test > --verbose > > t/001_basics.t .. Argument "FL" isn't numeric in > numeric lt (<) at > /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm > line 185. > > > > Have you tried using Yee Man's original Makefile.PL to > see if it works better?? There appear to be some > differences in the compilation, including a linking warning > popping up. > > > > chris > > > > On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > > > >> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off > into a new distro at Bio-Tools-HMM in the repo.? The > tests are not passing, I think that some bugs need to be > fixed in the logic of things. 
> >> > >> Yee Man, could you have a look? To download > the newly repackaged code: > >> > >> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk > Bio-Tools-HMM > >> > >> perl Build.PL; ./Build test > >> > >> Please check that things are compiling OK, check > the test logic, upgrade the tests to use Test::More, and get > the tests to the point where they are passing. > >> > >> At that point, it should be ready for CPAN, but we > need to decide how we want to coordinate that with releases > of bioperl-live and bioperl-ext. > >> > >> Rob > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From ymc at yahoo.com Mon Aug 17 20:24:24 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Mon, 17 Aug 2009 17:24:24 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A89E0F0.8010307@cornell.edu> Message-ID: <62126.74727.qm@web30401.mail.mud.yahoo.com> I get it now. So it is now spun off. Anyway, I updated the HMM.pm in Bio-Tools-HMM with the latest version. I think it should work. Yee Man --- On Mon, 8/17/09, Robert Buels wrote: > From: Robert Buels > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Chris Fields" , "BioPerl List" > Date: Monday, August 17, 2009, 4:00 PM > Yee Man Chan wrote: > > I noticed that Bio/Tools/HMM.pm was removed from the > trunk. So I added it back in. I think you shouldn't get the > warnings with this version. > > Please read my email above with instructions for checking > out the new Bio-Tools-HMM component, where Bio::Tools::HMM > has been moved.
Please do not add the Bio::Tools::HMM > module back into bioperl-live. > > I think you might be confused about the functions of 'svn > add', 'svn commit', etc, because I don't see any actual > addition of the module in the commit logs.? Please read > through the SVN manual at http://svnbook.red-bean.com/ if you need > clarification. > > Rob > > From whs at eaglegenomics.com Tue Aug 18 05:14:48 2009 From: whs at eaglegenomics.com (Will Spooner) Date: Tue, 18 Aug 2009 10:14:48 +0100 Subject: [Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers In-Reply-To: References: Message-ID: Hi Robert, Speaking for Ensembl, the GeneTree display code is deeply embedded in the API and web code, and refactoring as a standalone package would be exceedingly difficult. Jalview (http://www.jalview.org) may be a good alternative, albeit a Java one. There is code available for driving Jalview from the Ensembl database, and something similar for BioPerl seems reasonable. Will On 17 Aug 2009, at 18:14, Robert Bradbury wrote: > One of the questions facing people working in bioinformatics is "How > do we > present information so that it can be effectively interpreted by > non-informatics specialists?" > > Now, my expertise lies in computer science (esp. O.S. & databases) > and as a > second vocation the biology of aging (DNA damage & repair, to a lesser > extent cancer and pathologies of aging, etc.). Now by my estimate > there are > perhaps 5 people in the world who are able to effectively discuss > computer > science X aging (gerontology) [3]. There are perhaps several dozen > people > where those areas, esp aging, may overlap with DNA damage & repair. > But > then there is a wider audience of perhaps a few hundred members of > AGE, and > maybe a thousand or so who are members of the scientific subgroup of > GSA. > But most of those individuals are "old school" scientists who know > relatively little about bioinformatics. 
So one has barriers to > presenting > bioinformatics information in ways that they can use usefully. > > I have found in my limited experience that homology graphs of > conserved > protein domains, such as those displayed in HomloGene or those in > Ensembl > (including phylogeny graphs) can be quite useful in reaching > interesting > conclusions. For example, double strand break repair processes > which may > involve 8-10 relatively conserved proteins, may have a critical role > in the > mechanisms of aging. In particular two of those proteins, WRN & > DCLRE1C > (Artemis) contain complementary exonuclease activities which chew up > the DNA > in order to prepare the strands for ligation. Of course, > programmers may > appreciate better than gerontologists the significance of deleting > random > bytes from instruction sequences in ones code. At the recent AGE > meeting in > June several discussions arose as to possible differences in "aging" > in > yeast, *C. elegans* and mammals. [1]. A quick database search > showed that *C. > elegans* seems to be lacking the exonuclease domain on the WRN > homologue and > may be missing a DCLRE1C homologue entirely (which if true would > lead to > conclusions that aging in *C. elegans* may be fundamentally > different from > aging in vertebrates). Explaining this to researchers can best be > done > using pictures. > > I've been through PubMed and have several papers (NAR / BMC > Bioinformatics) > regarding programs to do homology comparisons and phylogeny trees. > However > these seem to lean towards producing less condensed bioinformatics-ish > information. I do not know however whether the outputs from > databases like > PubMed HomoloGene or Ensembl have been packaged in tools that might > be part > of BioPerl. I am interested in programs that can be run on a > regular basis > to draw "pretty pictures" that can be used for publication and/or > internet > browsing. 
In particular I'm interested in running such programs on > species > of interest to various gerontological communities [2] which involves > subsets > of databases which seem to be scattered around the world. > > Thanks. > > 1. Of course there has been lots of discussion and rationalization > over the > last 15+ years about how "aging" is largely the same in more complex > and > simpler organisms -- in part to justify sequencing some organisms > and in > part to justify funding research at certain laboratories. A closer > examination based on some of the complete and emerging genome > sequences may > suggest this is a very swampy discussion. > 2. For example, nematode DNA repair gene comparisons would be > interesting to > nematode researchers, insect DNA repair gene comparisons to insect > researchers, both to invertebrate researchers, etc. > 3. The recently published textbooks *Aging of the Genome* by Jan > Vijg and > the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg > *et al*, > go a long way towards moving these areas from the stacks of research > libraries into areas for more general discussion. Both volumes deal > extensively with the ~150 DNA repair genes. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- William Spooner whs at eaglegenomics.com http://www.eaglegenomics.com From cjfields at illinois.edu Tue Aug 18 10:35:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 09:35:49 -0500 Subject: [Bioperl-l] bioperl capability In-Reply-To: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> Message-ID: <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> I think I already answered this: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/20302/focus=20305 chris On Aug 14, 2009, at 2:02 PM, David Quan wrote: > Hello, > > I've been browsing around bioperl documentation and have used > a blast parser, but am wondering if it is possible to use the start > and end information for a hit to trace back to a gene in genbank and > extract the sequence for that gene? I have not been able to find > elements that would work in such a way. Recommendations for elements > that would be capable of behaving in such a way would be greatly > appreciated. Thanks very much. > > David N. Quan > -- > Love of country is, at heart, trust in a nation's people, faith in > their better nature, esteem for their best hopes, understanding for > the magnificence and the distinctiveness and the huge, infinitely > shaded cultural palette of their simple humanity. 
--Bradley Burston > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Tue Aug 18 10:42:09 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 Aug 2009 16:42:09 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu> Message-ID: <628aabb70908180742o4bf93d21tab0b90c328323efa@mail.gmail.com> On Tue, Aug 18, 2009 at 02:36, Kevin Brown wrote: > The obfuscator does help, but even it is a little sparse on data for > modules. Especially information on the realities of the returned data > from a method call. Yep, sorry about that, Kevin. I'm way overdue in devoting a little attention to cleaning up those Deobfuscator bugs and -- just maybe -- putting a prettier face on it. Hoping to find some time in the near future for that. Dave From cjfields at illinois.edu Tue Aug 18 11:04:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 10:04:40 -0500 Subject: [Bioperl-l] code reuse with moose In-Reply-To: <20090818110102.GA27010@seinfeld> References: <20090812022753.GA815@Macintosh-74.local> <20090818110102.GA27010@seinfeld> Message-ID: On Aug 18, 2009, at 6:01 AM, Siddhartha Basu wrote: > Putting it in the bioperl list, makes more sense here, > > On Wed, 12 Aug 2009, Chris Fields wrote: > >> (BTW, this is re: the reimplementation of major chunks of BioPerl >> using >> Moose, Biome: http://github.com/cjfields/biome/tree/) >> >> Locations should use a Role (specifically, Biome::Role::Range), so >> start/end/strand should be attributes, not methods. 
With >> attributes the >> best way to do this is probably with a builder, and lazily (start >> requires end, and vice versa). Factor out the common code as Tomas >> indicates. BTW, the $self->throw() is akin to BioPerl's $self- >> >throw() >> exception handling; it simply catches any exceptions and passes >> them to >> the metaclass exception handling. >> >> I've been thinking about making the Range role abstract for this very >> reason (or defining very basic attributes); something like: >> >> ---------------------------- >> >> package Bio::Role::Range; >> >> requires qw(_build_start _build_end _build_strand); >> >> # also require other methods which need to be defined in >> implementation >> >> has 'start' => ( >> isa => 'Int', >> is => 'rw', >> builder => '_build_start', >> lazy => 1 >> ); >> >> # same for end, strand (except strand has a different isa via >> MooseX::Types) >> .... >> >> package Bio::Location::Foo; >> >> with 'Bio::Role::Range'; >> >> sub _build_start { >> # for location-specific start >> } >> >> sub _build_end { >> # for location-specific end >> } >> >> sub _build_strand { >> # for location-specific strand >> } >> >> sub _common_build_method { >> # factor out common code here, call from other builders >> } >> >> ---------------------------- > > This plan makes things much clearer. Currently the > BioMe::Role::Location has a 'requires' keyword and rest of the > location modules consume that role to have its own implementation. At > this point on BioMe::Location::Atomic has attribute based 'start' and > 'end' implememtation. I got a bit confused because in current bioperl > 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic' and when > i am trying to follow that path in BioMe it has to override that > method. > So, my question is do all the location modules really needs to > inherits > from each other. I am totally aware about the origianl design ideas > but > it would be better to have a flatten hierarchy if possible. 
Flattening with roles is always a good idea, yes. I wouldn't worry as much about the way it was originally implemented as about the general API (and ways in which we can simplify it). > One more thing, what about putting the 'start', 'end' and the other > common base attributes in BioMe::Role::Location instead of > BioMe::Role::Range. I am not sure which would be correct from bioperl > point of view, just throwing out an idea. That's a possibility. To me Locations are just Ranges with different behavior (hence the comment below...) >> Also, I think the Coordinate-related stuff should be simplified >> down to a >> trait or an attribute; they bring in way too much overhead in >> bioperl w/o >> much added value. > > You mean instead of having 'builder' method, having a specialized > traits handling those. That sounds even better. > > -siddhartha Yes, that's essentially it. Location behavior could be changed by having CoordinatePolicy as a trait. Similarly, fuzziness for start/end could also be thought of as a trait. In essence, you could probably roll most behavior into attribute traits (which, in Moose, are just roles that are composed into the attribute metaclass, Moose::Meta::Attribute). I had started up a Biome::Meta::Attribute class in case we were to go down this path; then we could start registering specific traits within that namespace. Just to note, it might be easier to try the simplest approach first and get tests passing, then layer in traits to see how they act performance-wise. My guess is they will speed things up, but you never know. Locations will be a performance bottleneck as they are used in generic Features. chris From cjfields at illinois.edu Tue Aug 18 11:10:08 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 10:10:08 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To: <62126.74727.qm@web30401.mail.mud.yahoo.com> References: <62126.74727.qm@web30401.mail.mud.yahoo.com> Message-ID: Yee Man, Robert, All tests are passing; there was a small change in the expected floating-point value, but no warning now. Re: passing this on to CPAN, I think it needs a distinct version from BioPerl (something that should probably happen with any spinoffs). I foresee two options (and a possible conflict): 1) Use the same versioning scheme, starting with 1.6.1. 2) Use a simpler scheme à la Bio::Graphics, which I suggest. Tripartite versions are a PITA; we only need to keep them in core. Conflict: Bio::Tools::HMM is currently part of the 1.6 branch (in 1.6.0). If this stays in 1.6.1 then we have two versions of the module floating out there. I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, and I could attempt to push the initial Bio-Ext-HMM release after core 1.6.1 is out. After that, I could then add Yee Man as PAUSE co-maintainer for those modules (which means Yee Man needs to sign up for a PAUSE account). Any objections to that? chris On Aug 17, 2009, at 7:24 PM, Yee Man Chan wrote: > I get it now. So it is now spun off. Anyway, I updated the HMM.pm > in Bio-Tools-HMM with the latest version. I think it should work. > > Yee Man > > --- On Mon, 8/17/09, Robert Buels wrote: > >> From: Robert Buels >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Chris Fields" , "BioPerl List" > > >> Date: Monday, August 17, 2009, 4:00 PM >> Yee Man Chan wrote: >>> I noticed that Bio/Tools/HMM.pm was removed from the >> trunk. So I added it back in. I think you shouldn't get the >> warnings with this version. >> >> Please read my email above with instructions for checking >> out the new Bio-Tools-HMM component, where Bio::Tools::HMM >> has been moved. Please do not add the Bio::Tools::HMM >> module back into bioperl-live.
>> >> I think you might be confused about the functions of 'svn >> add', 'svn commit', etc, because I don't see any actual >> addition of the module in the commit logs. Please read >> through the SVN manual at http://svnbook.red-bean.com/ if you need >> clarification. >> >> Rob >> >> > > > From hlapp at gmx.net Tue Aug 18 11:46:55 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 18 Aug 2009 11:46:55 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A89EADD.9050509@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> Message-ID: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > I can see how this might be a good idea, or it might be overkill. > Anybody have thoughts on having feature _sources_ strongly typed > with ontology terms? It's how BioSQL and Chado would store it anyway. I'm not sure whether GFF3 requires it, possibly not. But when you make everything else ontology-typed, why exempt one property that also stands to benefit from more predictable values? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From rmb32 at cornell.edu Tue Aug 18 11:49:32 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 08:49:32 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <62126.74727.qm@web30401.mail.mud.yahoo.com> Message-ID: <4A8ACD8C.1060908@cornell.edu> Chris Fields wrote: > I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, and I > could attempt to push the initial Bio-Ext-HMM release after core 1.6.1 > is out. 
After that, I could then add Yee Man as PAUSE co-maintainer for > those modules (which means Yee Man needs to sign up for a PAUSE > account). Any objections to that? Sounds like a good plan to me, if Yee Man agreed with it. He would be the primary CPAN maintainer of the package. Maybe he should actually be the first uploader too? Then, it would show up under his PAUSE account at the outset, and he would get better attribution and visibility. Rob From cjfields at illinois.edu Tue Aug 18 12:34:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 11:34:00 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: On Aug 18, 2009, at 10:46 AM, Hilmar Lapp wrote: > > On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > >> I can see how this might be a good idea, or it might be overkill. >> Anybody have thoughts on having feature _sources_ strongly typed >> with ontology terms? > > It's how BioSQL and Chado would store it anyway. I'm not sure > whether GFF3 requires it, possibly not. Might be worth bringing up with Lincoln to get his thoughts. > But when you make everything else ontology-typed, why exempt one > property that also stands to benefit from more predictable values? > > -hilmar What I'm thinking as well. You can always implement it that way, and if we deem it too heavy-weight then revert back. Or have it evaluated lazily and get the benefits of both. That's the magic of doing this on a branch, it gives you much more latitude to try things out. 
chris From cain.cshl at gmail.com Tue Aug 18 14:28:05 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 18 Aug 2009 14:28:05 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: Hi Hilmar and all, Actually, Chado stores sources as a dbxref for the feature (where the db.name is "GFF_source") and the source can be any string, which is what the GFF3 spec indicates. I think the source was intended to be free text to allow the creator maximum flexibility when making the GFF; it also allows lots of flexibility when defining what features go into a particular track in GBrowse: you can have lots of gene features in your GFF, but you can segregate them according to what their source attributes are. Additionally, some applications (SynBrowse comes to mind) overload the source value and require them to conform to a certain syntax. So, what I'm trying to say is, source should probably just stay a simple string. Scott On Aug 18, 2009, at 11:46 AM, Hilmar Lapp wrote: > > On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > >> I can see how this might be a good idea, or it might be overkill. >> Anybody have thoughts on having feature _sources_ strongly typed >> with ontology terms? > > > It's how BioSQL and Chado would store it anyway. I'm not sure > whether GFF3 requires it, possibly not. > > But when you make everything else ontology-typed, why exempt one > property that also stands to benefit from more predictable values? 
> > -hilmar > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From marcelo011982 at gmail.com Tue Aug 18 14:34:17 2009 From: marcelo011982 at gmail.com (Marcelo Iwata) Date: Tue, 18 Aug 2009 15:34:17 -0300 Subject: [Bioperl-l] Genbank code from Blast results Message-ID: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com> Hi all, I was writing a script that takes some information from the results of blastn files. Everything was OK, but I am having some difficulty picking out the Genbank code number (the 'gb' below). I tried $obj->each_accession_number $hit->name and some variations of this. ------------------------------ >gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 gmrtDrNS01 Glycine max cDNA 3', mRNA sequence /clone_end=3' /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853 Length = 853 Score = 1336 bits (674), Expect = 0.0 Identities = 793/832 (95%), Gaps = 8/832 (0%) Strand = Plus / Minus Query: 294858 aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt 294917 |||||||||||| |||||| ||||||||||||||||| |||||||||||||||||||| Sbjct: 853 aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc 794 ---------------------------------------- But I still can't get it.
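[Editor's sketch, not part of the original post.] In the hit shown above, the GenBank accession sits in the free-text description as a "/gb=..." tag, not in the hit name, so one possible approach is to pull it out with a regular expression. The helper name tag_from_description is made up for illustration. With Bio::SearchIO the string would presumably come from $hit->description, but that step is an assumption about the poster's parser and is not shown here; the sketch below only needs core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the value of a "/key=value" tag
# (for example /gb=CX702616) from a UniGene-style description line.
# Returns undef when the tag is absent.
sub tag_from_description {
    my ($description, $tag) = @_;
    return unless defined $description;
    # Tag values in this header style run up to the next whitespace.
    return $description =~ m{/\Q$tag\E=(\S+)} ? $1 : undef;
}

# Description text copied from the hit in the post above.
my $desc = "gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 "
         . "gmrtDrNS01 Glycine max cDNA 3', mRNA sequence /clone_end=3' "
         . "/gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853";

my $gb = tag_from_description($desc, 'gb');
print defined $gb ? "gb = $gb\n" : "no gb tag found\n";
```

The same helper would work for the other tags in that header (gi, ug, len) by changing the second argument.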
thank you with regards Miwata From hlapp at gmx.net Tue Aug 18 16:01:18 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 18 Aug 2009 16:01:18 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: > Additionally, some applications (SynBrowse comes to mind) overload > the source value and require them to conform to a certain syntax. > > So, what I'm trying to say is, source should probably just stay a > simple string. I would rephrase that to source should probably retain the possibility of using made-up strings. You mention one example yourself, and there have been others in a recent thread on BioSQL [1], for why the option to have predictable, structured values with attached semantics could be very useful. -hilmar [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue Aug 18 17:46:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 16:46:25 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8ACD8C.1060908@cornell.edu> References: <62126.74727.qm@web30401.mail.mud.yahoo.com> <4A8ACD8C.1060908@cornell.edu> Message-ID: On Aug 18, 2009, at 10:49 AM, Robert Buels wrote: > Chris Fields wrote: >> I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, >> and I could attempt to push the initial Bio-Ext-HMM release after >> core 1.6.1 is out. 
After that, I could then add Yee Man as PAUSE >> co-maintainer for those modules (which means Yee Man needs to sign >> up for a PAUSE account). Any objections to that? > > > Sounds like a good plan to me, if Yee Man agreed with it. He would > be the primary CPAN maintainer of the package. Maybe he should > actually be the first uploader too? Then, it would show up under > his PAUSE account at the outset, and he would get better attribution > and visibility. > > Rob At the moment BIOPERLML is the primary maintainer. It's an 'umbrella' account for the bioperl group; a few others exist for stuff like DBI, Catalyst, etc I think. Anyone who's designated a co-maintainer can release code onto CPAN. Several of us can assign new co-maintainer status for modules, so the code doesn't get locked up if someone decides to abandon it. We simply designate another co-maintainer if someone decides to take it over. In fact, that's half the reason I would like to get the ext code out there again; either designate it as abandonware or set it up so that it can be reimplemented by someone with the tuits (maybe using biolib, for instance). We have recently moved Bio::Graphics over to LDS as the primary, though, so this is all a point up for debate. chris From rmb32 at cornell.edu Tue Aug 18 17:56:19 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 14:56:19 -0700 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <20090818174053.3f379c5elembark@wrkhors.com@wrkhors.com> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> <20090818174053.3f379c5elembark@wrkhors.com@wrkhors.com> Message-ID: <4A8B2383.1030207@cornell.edu> Steven, Could you CC Heath Bair on this? He's the YAPC::NA 2010 coordinator that started this thread. 
Rob Steven Lembark wrote: > On Fri, 26 Jun 2009 14:06:06 -0700 > Robert Buels wrote: > >> This is a really giant opportunity to expose some of the best >> technologists in the world to what we do in bioinformatics, and possibly >> to entice some of them to help us the heck out! ;-) > > OK, so I'm a few months behind on my email... > > One suggestion: Have them add a BioPerl track to the > conference in advance of getting any submissions for > it. The gent I spoke to in Pittsburgh seemed open to > the idea of a Bioinformatcs/BioPerl track in 2010. > > Opening things up a bit to include Bioinformatics > even beyond BioPerl would give people who are > marginally interested a chance to see what the > whole area is about (e.g., adapting the W-Curve > for use with Perl or how we analyzed Clostridia > using Perl for the bookkeeping). > > In the meantime you might want to see how many > people would be willing to give talks in the > track -- even recycled ones -- before the conference > submission period begins. And, yes, I'd volunteer to > give 1-2 talks. > > enjoi > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From jncline at gmail.com Tue Aug 18 23:06:19 2009 From: jncline at gmail.com (Jonathan Cline) Date: Tue, 18 Aug 2009 22:06:19 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> Message-ID: <4A8B6C2B.9030101@gmail.com> Chris Fields wrote: > > Your modules may or may not need the Bio* namespace (that's up to you, > actually); there are several non-bioperl modules that also share the > Bio* namespace, and I believe there are modules that aren't Bio* that > use BioPerl (Gbrowse comes to mind). 
If you're focusing on > interaction with robotics, Robotics::Bio::X might be a better > namespace for instance (b/c you could expand later into other possibly > non-bio robotics interfaces). Based on your & other opinions I have received, I am creating: Robotics.pm (high level hardware abstraction layer) Robotics::Tecan Robotics::Tecan::Genesis I'll post a release note when it's reached an interesting level of maturity (estimate a couple weeks from now) so anyone with the hardware can play with the package. It's currently working great, and I am adding functionality on a daily basis. ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## >> >> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >>>> Sent: Thursday, 30 July 2009 2:07 p.m. >>>> To: bioperl-l at lists.open-bio.org >>>> Cc: Jonathan Cline >>>> Subject: [Bioperl-l] Bio::Robotics namespace discussion >>>> >>>> I am writing a module for communication with biology robotics, as >>>> discussed recently on #bioperl, and I invite your comments. >>>> >>>> >>>> On Namespace: >>>> >>>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many >>>> s/w modules already called 'robots' (web spider robots, chat bots, www >>>> automate, etc) so I chose the longer name "robotics" to differentiate >>>> this module as manipulating real hardware. Bio::Robotics is the >>>> abstraction for generic robotics and Bio::Robotics::(vendor) is the >>>> manufacturer-specific implementation. Robot control is made more >>>> complex due to the very configurable nature of the work table >>>> (placement >>>> of equipment, type of equipment, type of attached arm, etc). The >>>> abstraction has to be careful not to generalize or assume too >>>> much. 
In >>>> some cases, the Bio::Robotics modules may expand to arbitrary >>>> equipment >>>> such as thermocyclers, tray holders, imagers, etc - that could be a >>>> future roadmap plan. From rmb32 at cornell.edu Wed Aug 19 00:13:53 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 21:13:53 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <829996.94283.qm@web30404.mail.mud.yahoo.com> References: <829996.94283.qm@web30404.mail.mud.yahoo.com> Message-ID: <4A8B7C01.5060502@cornell.edu> Yee Man Chan wrote: > I think it is better to keep Bio-Tools-HMM within the Bio-Ext package and then spin this whole Bio-Ext package out to CPAN. I am ok with Robert's arrangement to move the related pm files under Bio/Tools/ to the new Bio-Ext package. The long-term development plan is to factor *ALL* of Bioperl into individual distributions similar to Bio-Tools-HMM. It is actually much easier to maintain and release code in this "broken up" way. This means that the Bio-Ext package is going to go away, so it doesn't make sense to keep Bio-Tools-HMM in it. Chris, other core devs, do you agree with this? > I have a PAUSE already due to my other CPAN contributions. So there is no need to create a new one. My PAUSE account is UMVUE. Oh good, the next step would just be to coordinate when to do the release in concert with Bioperl 1.6.1, right? Rob From rmb32 at cornell.edu Wed Aug 19 00:37:49 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 21:37:49 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <190221.61009.qm@web30408.mail.mud.yahoo.com> References: <190221.61009.qm@web30408.mail.mud.yahoo.com> Message-ID: <4A8B819D.9070309@cornell.edu> Yee Man Chan wrote: > Is it going to be an arrangement similar to bioconductor? If so, I suppose then it makes sense. 
But you might want to develop scripts to automatically download and install new modules to make it user friendly. Yes, we are probably going to make a Task::BioPerl or something similar. > What do you mean by Bio-Ext is going away? I notice quite many people using dpAlign. So if Bio-Ext is going away, then at least dpAlign should become another spin off. By going away, I meant that everything in there is going to be spinned off. Except modules that are no longer maintainable, if there are any in there. Rob From deequan at gmail.com Wed Aug 19 00:39:35 2009 From: deequan at gmail.com (deequan) Date: Tue, 18 Aug 2009 21:39:35 -0700 (PDT) Subject: [Bioperl-l] bioperl capability In-Reply-To: <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> Message-ID: <25037707.post@talk.nabble.com> Howdy there, Yes, quite right. I apologize for the double posting. Moreover, I appreciate your assistance in trying to sort out what can and cannot be done with bioperl. 
To address the problem previously stated, I put together a remarkably misbehaving script that has the following parts:

    #Some parsing:
    $q_start = $hsp->query->start;
    $q_end   = $hsp->query->end;
    $h_start = $hsp->hit->start;
    $h_end   = $hsp->hit->end;
    $length  = $hsp->query->seqlength();
    $id      = $hit->accession;
    print OUT "$id\t";
    my $seq;
    if ($h_start < $h_end) {
        #the bit per your recommendation
        my $begin  = $h_start - $q_start + 1;
        my $cease  = ($length - $q_end) + $h_end;
        my $strand = 1;
        my $factory = Bio::DB::GenBank->new(-format    => 'genbank',
                                            -seq_start => $begin,
                                            -seq_stop  => $cease,
                                            -strand    => $strand, #1 = plus, 2 = minus
                                            );
        $seq = $factory->get_Seq_by_acc($id);
    } else {
        #else assume backward, code not shown
    }
    #and some stuff to retrieve the sequence
    my $len    = $seq->length();
    my $string = $seq->subseq(1, $len);
    print OUT "length = $len\t";
    print OUT "seq = $string\n";

In your previous reply, you said the code accessing the seq object created by get_Seq_by_acc would have to pass that obj (here $seq) to a seqIO for basic IO purposes. Not seeing exactly how to go about that, I tried some other functions in combination that seemed as though they should work (length() and subseq()). Unfortunately, the program does not even run to that point, as the script throws an exception:

    ------------- EXCEPTION -------------
    MSG: acc CP000948 does not exist
    STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:182
    STACK toplevel test.pl:36
    -------------------------------------

Oddly, the record corresponding to this accession number can be found here: http://www.ncbi.nlm.nih.gov/nuccore/169887498 Perhaps you'd be willing to offer another hint. Thank you for your assistance thus far. And on behalf of all posters, thank you for sharing your knowledge. 'Preciate. David Q.
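[Editor's sketch, not part of the original post.] The coordinate arithmetic in the script above can be checked in isolation, without any network access. The snippet below is a minimal illustration of the plus-strand case only: it widens the hit interval by the unaligned query overhang on each side, using the same two formulas as the post. The function name extended_window and the example numbers are made up, and the clamp to position 1 is an added assumption. It does not address the "does not exist" exception.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given HSP coordinates on the plus strand, return
# the subject interval that would cover the whole query if the alignment
# were extended without gaps.
sub extended_window {
    my (%arg) = @_;
    # Unaligned query bases before the HSP push the start to the left.
    my $begin = $arg{h_start} - $arg{q_start} + 1;
    # Unaligned query bases after the HSP push the end to the right.
    my $end = ($arg{q_len} - $arg{q_end}) + $arg{h_end};
    # Added assumption: never ask for a position before base 1.
    # The end is not clamped because the subject length is not known here.
    $begin = 1 if $begin < 1;
    return ($begin, $end);
}

# Made-up example: a 100 bp query aligned at 11..90 to subject 1011..1090.
my ($begin, $end) = extended_window(
    q_start => 11,   q_end => 90,   q_len => 100,
    h_start => 1011, h_end => 1090,
);
print "fetch $begin..$end\n";    # prints "fetch 1001..1100"
```

In this gap-free example the window is exactly as long as the query (100 bases), which is a quick sanity check on the two formulas.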
Chris Fields-5 wrote: > > I think I already answered this: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/20302/focus=20305 > > chris > > -- View this message in context: http://www.nabble.com/bioperl-capability-tp25024929p25037707.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Wed Aug 19 01:28:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:29 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B819D.9070309@cornell.edu> References: <190221.61009.qm@web30408.mail.mud.yahoo.com> <4A8B819D.9070309@cornell.edu> Message-ID: <6ADF16A9-3D14-45F3-B972-98134B0A0DB1@illinois.edu> On Aug 18, 2009, at 11:37 PM, Robert Buels wrote: > Yee Man Chan wrote: >> Is it going to be an arrangement similar to bioconductor? If so, I >> suppose then it makes sense. But you might want to develop scripts >> to automatically download and install new modules to make it user >> friendly. > Yes, we are probably going to make a Task::BioPerl or something > similar. > >> What do you mean by Bio-Ext is going away? I notice quite many >> people using dpAlign. So if Bio-Ext is going away, then at least >> dpAlign should become another spin off. > By going away, I meant that everything in there is going to be > spinned off. Except modules that are no longer maintainable, if > there are any in there. > > Rob dpAlign could become another spinoff, yes, if it's used (and works fine). 
The problematic code dealt with pSW, alignment statistics, and staden io_lib support (the latter which is fairly bit rotted now): http://bugzilla.open-bio.org/show_bug.cgi?id=2668 http://bugzilla.open-bio.org/show_bug.cgi?id=1857 http://bugzilla.open-bio.org/show_bug.cgi?id=2069 http://bugzilla.open-bio.org/show_bug.cgi?id=2074 http://bugzilla.open-bio.org/show_bug.cgi?id=2329 dpAlign has it's own bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2384 chris From cjfields at illinois.edu Wed Aug 19 01:28:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:39 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B7C01.5060502@cornell.edu> References: <829996.94283.qm@web30404.mail.mud.yahoo.com> <4A8B7C01.5060502@cornell.edu> Message-ID: <1DA73AAB-EC4F-4F44-BBF2-CFF7B3E4A0BE@illinois.edu> On Aug 18, 2009, at 11:13 PM, Robert Buels wrote: > Yee Man Chan wrote: >> I think it is better to keep Bio-Tools-HMM within the Bio-Ext >> package and then spin this whole Bio-Ext package out to CPAN. I am >> ok with Robert's arrangement to move the related pm files under Bio/ >> Tools/ to the new Bio-Ext package. > > The long-term development plan is to factor *ALL* of Bioperl into > individual distributions similar to Bio-Tools-HMM. It is actually > much easier to maintain and release code in this "broken up" way. > > This means that the Bio-Ext package is going to go away, so it > doesn't make sense to keep Bio-Tools-HMM in it. Chris, other core > devs, do you agree with this? In general, though there will be a limit as to how small we can split these off. For instance, Bio::Tree/TreeIO will be messy to split up and makes sense to keep together. Others could be more easily split off. YMMV. >> I have a PAUSE already due to my other CPAN contributions. So there >> is no need to create a new one. My PAUSE account is UMVUE. 
> Oh good, the next step would just be to coordinate when to do the > release in concert with Bioperl 1.6.1, right? > > Rob Yes. That should be easy enough to do; basically Bio::Tools::HMM will be removed from 1.6.1, then core will be released along with Bio::Ext::HMM (or Bio::Tools::HMM, either way it would double as the distribution name). chris From cjfields at illinois.edu Wed Aug 19 01:28:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:48 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A8B6C2B.9030101@gmail.com> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> <4A8B6C2B.9030101@gmail.com> Message-ID: <2F5111BE-A1F3-437F-AC6C-4AC3BE05E9EB@illinois.edu> On Aug 18, 2009, at 10:06 PM, Jonathan Cline wrote: > Chris Fields wrote: >> >> Your modules may or may not need the Bio* namespace (that's up to >> you, >> actually); there are several non-bioperl modules that also share the >> Bio* namespace, and I believe there are modules that aren't Bio* that >> use BioPerl (Gbrowse comes to mind). If you're focusing on >> interaction with robotics, Robotics::Bio::X might be a better >> namespace for instance (b/c you could expand later into other >> possibly >> non-bio robotics interfaces). > > Based on your & other opinions I have received, I am creating: > > Robotics.pm (high level hardware abstraction layer) > Robotics::Tecan > Robotics::Tecan::Genesis > > > I'll post a release note when it's reached an interesting level of > maturity (estimate a couple weeks from now) so anyone with the > hardware > can play with the package. It's currently working great, and I am > adding functionality on a daily basis. > > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## That's great to hear! Keep us updated, I'm sure there are a few potential users lurking about here. 
chris From scott at scottcain.net Wed Aug 19 09:15:12 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 19 Aug 2009 09:15:12 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net> Hilmar, The examples in that thread ought to go in the ninth column; using the Dbxref tag for references back to GenBank for example. The provenience stuff should go in the ninth column as well, though I don't know exactly how would be best. Scott On Aug 18, 2009, at 4:01 PM, Hilmar Lapp wrote: > > On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: > >> Additionally, some applications (SynBrowse comes to mind) overload >> the source value and require them to conform to a certain syntax. >> >> So, what I'm trying to say is, source should probably just stay a >> simple string. > > > I would rephrase that to source should probably retain the > possibility of using made-up strings. > > You mention one example yourself, and there have been others in a > recent thread on BioSQL [1], for why the option to have predictable, > structured values with attached semantics could be very useful. > > -hilmar > > [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From saikari78 at gmail.com Wed Aug 19 09:30:07 2009 From: saikari78 at gmail.com (saikari keitele) Date: Wed, 19 Aug 2009 14:30:07 +0100 Subject: [Bioperl-l] Pipeline for generating phylogenetic trees from list of species names Message-ID: Hi, Does anyone know of a simple pipeline for generating a phylogenetic tree from a list of species with bioperl? I've had a look at http://www.bioperl.org/wiki/HOWTO:PhylogeneticAnalysisPipeline#Distance_Distance_in_PHYLIP_.2B_NJ_Tree_in_PHYLIP but it isn't explicit for the crucial steps (at least given my level of knowledge). For each species, should I extract the longest sequence available for every protein and align it with the same protein sequences of the other species in the list? Would anyone have an example pipeline of the different steps to perform? Thank you very much. Saikari From ymc at yahoo.com Tue Aug 18 22:50:57 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Tue, 18 Aug 2009 19:50:57 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <829996.94283.qm@web30404.mail.mud.yahoo.com> I think it is better to keep Bio-Tools-HMM within the Bio-Ext package and then spin this whole Bio-Ext package out to CPAN. I am ok with Robert's arrangement to move the related pm files under Bio/Tools/ to the new Bio-Ext package. There aren't that many modules in Bio-Ext. Plus, based on Chris and Robert's comments, modules other than my dpAlign and HMM appear to be abandoned. Moving HMM out only makes users less likely to try it out. If need be, I can also be a co-maintainer of this spun-off Bio-Ext package. I have a PAUSE account already due to my other CPAN contributions. So there is no need to create a new one. My PAUSE account is UMVUE.
Yee Man --- On Tue, 8/18/09, Chris Fields wrote: > From: Chris Fields > Subject: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "BioPerl List" > Date: Tuesday, August 18, 2009, 8:10 AM > Yee Man, Robert, > > All tests are passing; there was a small change in the > expected floating point, but no warning now. > > Re: passing this on to CPAN, I think it needs a distinct > version from BioPerl (something that should probably happen > with any spinoffs). I foresee two options (and a > possible conflict): > > 1) Use the same versioning scheme, starting with 1.6.1. > 2) Use a simpler scheme a'la Bio::Graphics, which I > suggest. Tripartite versions are a PITA, we'll only > need to keep that in core. > > Conflict: Bio::Tools::HMM is currently part of the 1.6 > branch (in 1.6.0). If this stays in 1.6.1 then we have > two versions of the module floating out there. > > I think we should go ahead and remove Bio::Tools::HMM from > 1.6.1, and I could attempt to push the initial Bio-Ext-HMM > release after core 1.6.1 is out. After that, I could > then add Yee Man as PAUSE co-maintainer for those modules > (which means Yee Man needs to sign up for a PAUSE > account). Any objections to that? > > chris > > On Aug 17, 2009, at 7:24 PM, Yee Man Chan wrote: > > > I get it now. So it is now spinned off. Anyway, I > updated the HMM.pm in Bio-Tools-HMM with the latest version. > I think it should work. > > > > Yee Man > > > > --- On Mon, 8/17/09, Robert Buels > wrote: > > > >> From: Robert Buels > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista? > >> To: "Yee Man Chan" > >> Cc: "Chris Fields" , > "BioPerl List" > >> Date: Monday, August 17, 2009, 4:00 PM > >> Yee Man Chan wrote: > >>> I noticed that Bio/Tools/HMM.pm was removed > from the > >> trunk. So I added it back in. I think you > shouldn't get the > >> warnings with this version.
> >> > >> Please read my email above with instructions for > checkout > >> out the new Bio-Tools-HMM component, where > Bio::Tools::HMM > >> has been moved. Please do not add the > Bio::Tools::HMM > >> module back into bioperl-live. > >> > >> I think you might be confused about the functions > of 'svn > >> add', 'svn commit', etc, because I don't see any > actual > >> addition of the module in the commit logs. > Please read > >> through the SVN manual at http://svnbook.red-bean.com/ if you need > >> clarification. > >> > >> Rob > >> > >> > > > > > > > > From ymc at yahoo.com Wed Aug 19 00:24:05 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Tue, 18 Aug 2009 21:24:05 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B7C01.5060502@cornell.edu> Message-ID: <190221.61009.qm@web30408.mail.mud.yahoo.com> Is it going to be an arrangement similar to bioconductor? If so, I suppose then it makes sense. But you might want to develop scripts to automatically download and install new modules to make it user friendly. What do you mean by Bio-Ext is going away? I notice quite many people using dpAlign. So if Bio-Ext is going away, then at least dpAlign should become another spin off. Yee Man --- On Tue, 8/18/09, Robert Buels wrote: > From: Robert Buels > Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Chris Fields" , "BioPerl List" > Date: Tuesday, August 18, 2009, 9:13 PM > Yee Man Chan wrote: > > I think it is better to keep Bio-Tools-HMM within the > Bio-Ext package and then spin this whole Bio-Ext package out > to CPAN. I am ok with Robert's arrangement to move the > related pm files under Bio/Tools/ to the new Bio-Ext > package. > > The long-term development plan is to factor *ALL* of > Bioperl into individual distributions similar to > Bio-Tools-HMM.
It is actually much easier to maintain > and release code in this "broken up" way. > > This means that the Bio-Ext package is going to go away, so > it doesn't make sense to keep Bio-Tools-HMM in it. > Chris, other core devs, do you agree with this? > > > I have a PAUSE already due to my other CPAN > contributions. So there is no need to create a new one. My > PAUSE account is UMVUE. > Oh good, the next step would just be to coordinate when to > do the release in concert with Bioperl 1.6.1, right? > > Rob > > From ymc at yahoo.com Wed Aug 19 00:49:18 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Tue, 18 Aug 2009 21:49:18 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B819D.9070309@cornell.edu> Message-ID: <184595.94226.qm@web30407.mail.mud.yahoo.com> Good. That makes sense then. Please update me when all is set. Yee Man --- On Tue, 8/18/09, Robert Buels wrote: > From: Robert Buels > Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Chris Fields" , "BioPerl List" > Date: Tuesday, August 18, 2009, 9:37 PM > Yee Man Chan wrote: > > Is it going to be an arrangement similar to > bioconductor? If so, I suppose then it makes sense. But you > might want to develop scripts to automatically download and > install new modules to make it user friendly. > Yes, we are probably going to make a Task::BioPerl or > something similar. > > > What do you mean by Bio-Ext is going away? I notice > quite many people using dpAlign. So if Bio-Ext is going > away, then at least dpAlign should become another spin off. > By going away, I meant that everything in there is going to > be spinned off. Except modules that are no longer > maintainable, if there are any in there.
> > Rob > > From ymc at yahoo.com Wed Aug 19 05:01:39 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Wed, 19 Aug 2009 02:01:39 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <6ADF16A9-3D14-45F3-B972-98134B0A0DB1@illinois.edu> Message-ID: <884845.92813.qm@web30408.mail.mud.yahoo.com> I tried that sample script that reportedly caused the dpAlign "bug" but I can't reproduced it. All I get is a warning from LocatableSeq. ------------------------------------------- [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "-Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl --------------------- WARNING --------------------- MSG: In sequence ABC|9944760 residue count gives end value 101. Overriding value [104] with value 101 for Bio::LocatableSeq::end(). TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA --------------------------------------------------- Getting score for ABC|9944760 -> ABC|9986984 = 300 Getting score for ABC|9986984 -> ABC|9944760 = 303 ------------------------------------------ Does the test script crash in your machine? Yee Man --- On Tue, 8/18/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "Yee Man Chan" , "BioPerl List" > Date: Tuesday, August 18, 2009, 10:28 PM > On Aug 18, 2009, at 11:37 PM, Robert > Buels wrote: > > > Yee Man Chan wrote: > >> Is it going to be an arrangement similar to > bioconductor? If so, I suppose then it makes sense. But you > might want to develop scripts to automatically download and > install new modules to make it user friendly. > > Yes, we are probably going to make a Task::BioPerl or > something similar. > > > >> What do you mean by Bio-Ext is going away? I > notice quite many people using dpAlign. 
So if Bio-Ext is > going away, then at least dpAlign should become another spin > off. > > By going away, I meant that everything in there is > going to be spinned off. Except modules that are no > longer maintainable, if there are any in there. > > > > Rob > > dpAlign could become another spinoff, yes, if it's used > (and works fine). The problematic code dealt with pSW, > alignment statistics, and staden io_lib support (the latter > which is fairly bit rotted now): > > http://bugzilla.open-bio.org/show_bug.cgi?id=2668 > http://bugzilla.open-bio.org/show_bug.cgi?id=1857 > http://bugzilla.open-bio.org/show_bug.cgi?id=2069 > http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > http://bugzilla.open-bio.org/show_bug.cgi?id=2329 > > dpAlign has it's own bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2384 > > chris > From cjfields at illinois.edu Wed Aug 19 10:49:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 09:49:15 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <884845.92813.qm@web30408.mail.mud.yahoo.com> References: <884845.92813.qm@web30408.mail.mud.yahoo.com> Message-ID: I'll have a look. It's probably something that hasn't been updated to deal with LocatableSeq's pathological end point checking. chris On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote: > > I tried that sample script that reportedly caused the dpAlign "bug" > but I can't reproduced it. All I get is a warning from LocatableSeq. > ------------------------------------------- > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "- > Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl > > --------------------- WARNING --------------------- > MSG: In sequence ABC|9944760 residue count gives end value 101. > Overriding value [104] with value 101 for Bio::LocatableSeq::end().
> TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT > -GGG-CCGGCCC-AA > --------------------------------------------------- > Getting score for ABC|9944760 -> ABC|9986984 > = 300 > Getting score for ABC|9986984 -> ABC|9944760 > = 303 > ------------------------------------------ > > Does the test script crash in your machine? > > Yee Man > > --- On Tue, 8/18/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] >> Problems with Bioperl-ext package on WinVista? >> To: "Robert Buels" >> Cc: "Yee Man Chan" , "BioPerl List" > > >> Date: Tuesday, August 18, 2009, 10:28 PM >> On Aug 18, 2009, at 11:37 PM, Robert >> Buels wrote: >> >>> Yee Man Chan wrote: >>>> Is it going to be an arrangement similar to >> bioconductor? If so, I suppose then it makes sense. But you >> might want to develop scripts to automatically download and >> install new modules to make it user friendly. >>> Yes, we are probably going to make a Task::BioPerl or >> something similar. >>> >>>> What do you mean by Bio-Ext is going away? I >> notice quite many people using dpAlign. So if Bio-Ext is >> going away, then at least dpAlign should become another spin >> off. >>> By going away, I meant that everything in there is >> going to be spinned off. Except modules that are no >> longer maintainable, if there are any in there. >>> >>> Rob >> >> dpAlign could become another spinoff, yes, if it's used >> (and works fine). 
The problematic code dealt with pSW, >> alignment statistics, and staden io_lib support (the latter >> which is fairly bit rotted now): >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668 >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329 >> >> dpAlign has it's own bug: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384 >> >> chris >> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Aug 19 18:19:25 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 19 Aug 2009 18:19:25 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net> Message-ID: <4907C3F4-C503-4019-BBDA-153ED777276C@gmx.net> Putting it into the ninth column is the equivalent of storing it in the {seqfeature,bioentry}_qualifier_value tables in BioSQL. -hilmar On Aug 19, 2009, at 9:15 AM, Scott Cain wrote: > Hilmar, > > The examples in that thread ought to go in the ninth column; using > the Dbxref tag for references back to GenBank for example. The > provenience stuff should go in the ninth column as well, though I > don't know exactly how would be best. > > Scott > > > > On Aug 18, 2009, at 4:01 PM, Hilmar Lapp wrote: > >> >> On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: >> >>> Additionally, some applications (SynBrowse comes to mind) overload >>> the source value and require them to conform to a certain syntax.
>>> >>> So, what I'm trying to say is, source should probably just stay a >>> simple string. >> >> >> I would rephrase that to source should probably retain the >> possibility of using made-up strings. >> >> You mention one example yourself, and there have been others in a >> recent thread on BioSQL [1], for why the option to have >> predictable, structured values with attached semantics could be >> very useful. >> >> -hilmar >> >> [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jason at bioperl.org Wed Aug 19 20:55:22 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 19 Aug 2009 20:55:22 -0400 Subject: [Bioperl-l] Hi In-Reply-To: <3bc6bb240908191147j1c707206r4bd290addd2cd2f@mail.gmail.com> References: <3bc6bb240908191147j1c707206r4bd290addd2cd2f@mail.gmail.com> Message-ID: Please ask on the mailing list for these things. I am not really sure what you mean by "subtract all taxonomy"; I suspect you mean extract all IDs. I think you should take a look at the example at http://bioperl.org/wiki/Module:Bio::DB::Taxonomy which is basically what you want to do, except with node ID 7742 in place of 33090. -jason On Aug 19, 2009, at 2:47 PM,
JingtaoLiu(TSU) wrote: > Hi Sir, > > Thank you for reading this. > I am working for BioChem Dept Texastate university. > I encounter a problem. > I need subtract all taxonomy IDs from vertebrates(taxon id is 7742) > how I can get all the leaf node of these? > > I referenced Bio::DB::Taxonomy, > but i have no clue about it. > Very appreciate for your help. > > Jingtao Liu -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From yannick.wurm at unil.ch Wed Aug 19 15:25:11 2009 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Wed, 19 Aug 2009 21:25:11 +0200 Subject: [Bioperl-l] Programmer job in Lausanne Switzerland Message-ID: <1D1F031E-29F1-4AE4-A225-D9B434ACE070@unil.ch> Dear list, my apologies if this is inappropriate for the list, but I thought it would be a good way to reach the kind of people we're looking for. We have a job opening for assembly and annotation of ant genomes in Lausanne Switzerland. http://www.isb-sib.ch/about-sib/jobs/details/91-sib-bioinformatician-at-sib--unil.html http://fourmidable.unil.ch/BioinformaticsEngineerLausanneAnts.pdf Kind regards, Yannick http://yannick.poulet.org From sidd.basu at gmail.com Thu Aug 20 06:03:07 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 20 Aug 2009 05:03:07 -0500 Subject: [Bioperl-l] Re: code reuse with moose In-Reply-To: References: <20090812022753.GA815@Macintosh-74.local> <20090818110102.GA27010@seinfeld> Message-ID: <20090820100304.GA1884@seinfeld> On Tue, 18 Aug 2009, Chris Fields wrote: > > On Aug 18, 2009, at 6:01 AM, Siddhartha Basu wrote: > > > Putting it in the bioperl list, makes more sense here, > > > > On Wed, 12 Aug 2009, Chris Fields wrote: > > > >> (BTW, this is re: the reimplementation of major chunks of BioPerl > >> using > >> Moose, Biome: http://github.com/cjfields/biome/tree/) > >> > >> Locations should use a Role (specifically, Biome::Role::Range), so > >> start/end/strand should be attributes, not methods. 
With attributes > >> the > >> best way to do this is probably with a builder, and lazily (start > >> requires end, and vice versa). Factor out the common code as Tomas > >> indicates. BTW, the $self->throw() is akin to BioPerl's $self- > >> >throw() > >> exception handling; it simply catches any exceptions and passes them > >> to > >> the metaclass exception handling. > >> > >> I've been thinking about making the Range role abstract for this very > >> reason (or defining very basic attributes); something like: > >> > >> ---------------------------- > >> > >> package Bio::Role::Range; > >> > >> requires qw(_build_start _build_end _build_strand); > >> > >> # also require other methods which need to be defined in > >> implementation > >> > >> has 'start' => ( > >> isa => 'Int', > >> is => 'rw', > >> builder => '_build_start', > >> lazy => 1 > >> ); > >> > >> # same for end, strand (except strand has a different isa via > >> MooseX::Types) > >> .... > >> > >> package Bio::Location::Foo; > >> > >> with 'Bio::Role::Range'; > >> > >> sub _build_start { > >> # for location-specific start > >> } > >> > >> sub _build_end { > >> # for location-specific end > >> } > >> > >> sub _build_strand { > >> # for location-specific strand > >> } > >> > >> sub _common_build_method { > >> # factor out common code here, call from other builders > >> } > >> > >> ---------------------------- > > > > This plan makes things much clearer. Currently the > > BioMe::Role::Location has a 'requires' keyword and rest of the > > location modules consume that role to have its own implementation. At > > this point on BioMe::Location::Atomic has attribute based 'start' and > > 'end' implememtation. I got a bit confused because in current bioperl > > 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic' and when > > i am trying to follow that path in BioMe it has to override that > > method. > > So, my question is do all the location modules really needs to > > inherits > > from each other. 
I am totally aware about the origianl design ideas > > but > > it would be better to have a flatten hierarchy if possible. > > Flattening with roles is always a good idea, yes. I wouldn't worry as > much about the way it was originally implemented as the general API (and > ways in which we can simplify it). Thanks for clarifying that. > > > One more thing, what about putting the 'start', 'end' and the other > > common base attributes in BioMe::Role::Location instead of > > BioMe::Role::Range. I am not sure which would be correct from bioperl > > stand of view, just throwing out an idea. > > That's a possibility. To me Locations are just Ranges with different > behavior (hence the below comment...) > > >> Also, I think the Coordinate-related stuff should be simplified down > >> to a > >> trait or an attribute; they bring in way too much overhead in > >> bioperl w/o > >> much added value. > > > > You mean instead of having 'builder' method, having a specialized > > traits handling those. That sounds like even better. > > > > -siddhartha > > Yes, that's essentially it. Location behavior could be changed by > having CoordinatePolicy as a trait. Similarly, fuzziness for start/end > could also be thought of as a trait. In essence, you could probably role > most behavior into attribute traits (which, in Moose, are just roles that > are composed into the attribute meta class, Moose::Meta::Attribute). I > had started up a Biome::Meta::Attribute class in case we were to go down > this path, then we could start registering specific traits within that > namespace. > > Just to note, it might be easier to try the simplest approach first and > get tests passing, then layer in traits to see how they act > performance-wise. My guess is they will speed things up, but you never > know. Locations will be a performance bottleneck as they are used in > generic Features. That's seemed to be a saner approach. Will play around with the builder approach and get the tests passing at least. 
thanks, -siddhartha > > chris From ymc at yahoo.com Wed Aug 19 23:01:28 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Wed, 19 Aug 2009 20:01:28 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <191324.76414.qm@web30403.mail.mud.yahoo.com> I noticed that $qalseq is a LocatableSeq with gaps. I don't think my program was written to support a LocatableSeq with gaps. If I remove the gaps, the scores agree with each other, which should be the desired outcome. --------------------- WARNING --------------------- MSG: In sequence ABC|9986984 residue count gives end value 104. Overriding value [101] with value 104 for Bio::LocatableSeq::end(). TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTTCGGGTCCGGCCCGAA --------------------------------------------------- Getting score for ABC|9944760 -> ABC|9986984 = 291 Getting score for ABC|9986984 -> ABC|9944760 = 291 Do you think I should check for this LocatableSeq type and give an error, or should I remove the gaps if this is a LocatableSeq? Yee Man --- On Wed, 8/19/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "BioPerl List" > Date: Wednesday, August 19, 2009, 7:49 AM > I'll have a look. It's probably > something that hasn't been updated to deal with > LocatableSeq's pathological end point checking. > > chris > > On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote: > > > > > I tried that sample script that reportedly caused the > dpAlign "bug" but I can't reproduced it. All I get is a > warning from LocatableSeq.
> > ------------------------------------------- > > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl > "-Iblib/lib" "-Iblib/arch" > "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl > > > > --------------------- WARNING --------------------- > > MSG: In sequence ABC|9944760 residue count gives end > value 101. > > Overriding value [104] with value 101 for > Bio::LocatableSeq::end(). > > > TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA > > --------------------------------------------------- > > Getting score for ABC|9944760 -> ABC|9986984 > > = 300 > > Getting score for ABC|9986984 -> ABC|9944760 > > = 303 > > ------------------------------------------ > > > > Does the test script crash in your machine? > > > > Yee Man > > > > --- On Tue, 8/18/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was > Re: [Bioperl-l] Problems with Bioperl-ext package on > WinVista? > >> To: "Robert Buels" > >> Cc: "Yee Man Chan" , > "BioPerl List" > >> Date: Tuesday, August 18, 2009, 10:28 PM > >> On Aug 18, 2009, at 11:37 PM, Robert > >> Buels wrote: > >> > >>> Yee Man Chan wrote: > >>>> Is it going to be an arrangement similar > to > >> bioconductor? If so, I suppose then it makes > sense. But you > >> might want to develop scripts to automatically > download and > >> install new modules to make it user friendly. > >>> Yes, we are probably going to make a > Task::BioPerl or > >> something similar. > >>> > >>>> What do you mean by Bio-Ext is going away? > I > >> notice quite many people using dpAlign. So if > Bio-Ext is > >> going away, then at least dpAlign should become > another spin > >> off. > >>> By going away, I meant that everything in > there is > >> going to be spinned off. Except modules that > are no > >> longer maintainable, if there are any in there. > >>> > >>> Rob > >> > >> dpAlign could become another spinoff, yes, if it's > used > >> (and works fine).
The problematic code dealt > with pSW, > >> alignment statistics, and staden io_lib support > (the latter > >> which is fairly bit rotted now): > >> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329 > >> > >> dpAlign has it's own bug: > >> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384 > >> > >> chris > >> > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bernd.jagla at gmail.com Thu Aug 20 04:46:52 2009 From: bernd.jagla at gmail.com (Bernd Jagla) Date: Thu, 20 Aug 2009 10:46:52 +0200 Subject: [Bioperl-l] SCF installation Message-ID: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> Hi, I am trying to install SCF (a prerequisite to samtools). I installed libread and the compilation seems to be working, only test is failing: zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL Checking if your kit is complete... 
Looks good Writing Makefile for Bio::SCF zoppel:Bio-SCF-1.01 bernd$ make cp SCF.pm blib/lib/Bio/SCF.pm cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c Please specify prototyping behavior for SCF.xs (see perlxs manual) /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c Running Mkbootstrap for Bio::SCF () chmod 644 SCF.bs rm -f blib/arch/auto/Bio/SCF/SCF.bundle LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ -lread -lz \ chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs chmod 644 blib/arch/auto/Bio/SCF/SCF.bs Manifying blib/man3/Bio::SCF.3pm zoppel:Bio-SCF-1.01 bernd$ make test PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) Failed 18/18 subtests Test Summary Report ------------------- t/scf.t (Wstat: 512 Tests: 0 Failed: 0) Non-zero exit status: 2 Parse errors: Bad plan. You planned 18 tests but ran 0. Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr 0.01 csys = 0.11 CPU) Result: FAIL Failed 1/1 test programs. 0/0 subtests failed. make: *** [test_dynamic] Error 2 Any idea what might be going wrong? 
Please note that some files in the directory are empty: ls -ltr -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER -rw-r--r-- 1 bernd staff 532 17 mai 2006 README -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm drwxr-xr-x 3 bernd staff 102 17 mai 2006 t drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST drwxr-xr-x 4 bernd staff 136 20 aoû 10:12 .. -rw-r--r-- 1 bernd staff 27915 20 aoû 10:13 Makefile.old -rw-r--r-- 1 bernd staff 27915 20 aoû 10:16 Makefile -rw-r--r-- 1 bernd staff 0 20 aoû 10:17 pm_to_blib drwxr-xr-x 8 bernd staff 272 20 aoû 10:17 blib -rw-r--r-- 1 bernd staff 0 20 aoû 10:17 SCF.bs -rw-r--r-- 1 bernd staff 14580 20 aoû 10:18 SCF.o -rw-r--r-- 1 bernd staff 15125 20 aoû 10:18 SCF.c drwxr-xr-x 21 bernd staff 714 20 aoû 10:18 . Thanks, Bernd From cain.cshl at gmail.com Thu Aug 20 10:30:33 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 20 Aug 2009 10:30:33 -0400 Subject: [Bioperl-l] SCF installation In-Reply-To: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> Message-ID: Hi Bernd, Bio::SCF isn't technically part of BioPerl, but I have installed it before so I'll take a shot: do you have the Staden io-lib installed? It is a prereq for Bio::SCF. If you did install it, is it in a normal library path, and did you run ldconfig (if appropriate for your system) after installing it? io-lib can be obtained here: http://staden.sourceforge.net/ If you do have all of those things in place, what version of io-lib are you using? I wonder if there is an incompatibility between Bio::SCF and your version.
The INSTALL doc for Bio::SCF indicates that you should have version 0.9, but io-lib is now at 1.11.5. That jump to a whole number may have broken an api call that Bio::SCF depends on. Scott On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote: > Hi, > > > > I am trying to install SCF (a prerequisite to samtools). > > I installed libread and the compilation seems to be working, only > test is > failing: > > > > zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL > > Checking if your kit is complete... > > Looks good > > Writing Makefile for Bio::SCF > > > > zoppel:Bio-SCF-1.01 bernd$ make > > cp SCF.pm blib/lib/Bio/SCF.pm > > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > > /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp - > typemap > /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv > SCF.xsc > SCF.c > > Please specify prototyping behavior for SCF.xs (see perlxs manual) > > /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include > -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include > -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" > "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN > SCF.c > > Running Mkbootstrap for Bio::SCF () > > chmod 644 SCF.bs > > rm -f blib/arch/auto/Bio/SCF/SCF.bundle > > LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 > /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup > -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ > > -lread -lz \ > > > > chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle > > cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs > > chmod 644 blib/arch/auto/Bio/SCF/SCF.bs > > Manifying blib/man3/Bio::SCF.3pm > > > > > > zoppel:Bio-SCF-1.01 bernd$ make test > > PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > > t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) > > t/scf.t .. 
Dubious, test returned 2 (wstat 512, 0x200) > > Failed 18/18 subtests > > > > Test Summary Report > > ------------------- > > t/scf.t (Wstat: 512 Tests: 0 Failed: 0) > > Non-zero exit status: 2 > > Parse errors: Bad plan. You planned 18 tests but ran 0. > > Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 > cusr 0.01 > csys = 0.11 CPU) > > Result: FAIL > > Failed 1/1 test programs. 0/0 subtests failed. > > make: *** [test_dynamic] Error 2 > > > > > > > > > > Any idea what might be going wrong? > > > > Please not that in the directory there are some file empty: > > > > ls -ltr > > -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf > > -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER > > -rw-r--r-- 1 bernd staff 532 17 mai 2006 README > > -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL > > -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL > > -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs > > -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm > > drwxr-xr-x 3 bernd staff 102 17 mai 2006 t > > drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg > > drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF > > -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml > > -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST > > drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. > > -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old > > -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile > > -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib > > drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib > > -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs > > -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o > > -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c > > drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . > > > > > > Thanks, > > > > Bernd > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From dan.bolser at gmail.com Thu Aug 20 11:00:41 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 20 Aug 2009 16:00:41 +0100 Subject: [Bioperl-l] Creating a MSA from a set of pairwise alignments with a common reference sequence? Message-ID: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com> Hi, Quick version: How do I get a column of Bio::SimpleAlign using ungapped 'reference' sequence coordinates? Longer version: I have a set of pairwise alignments that I would like to process into a 'multiple sequence alignment' (MSA). All the alignments are short sequence 'contigs' aligned to a 'reference' sequence, so one sequence in all the pairwise alignments is constant (making the resulting MSA unambiguous). I came up with the following pseudo-code to create a MSA (Bio::SimpleAlign) from the set of pairwise alignments... initialise: Create an 'empty' Bio::SimpleAlign from the REFERENCE sequence. for each pairwise alignment: Create a Bio::LocatableSeq from the given fragment of the REFERENCE sequence (using ungapped REFERENCE coordinates). for each gap in the REFERENCE sequence: Take the position of the gap (in ungapped REFERENCE coordinates) and look up the corresponding column of the MSA (in ungapped REFERENCE coordinates). for each sequence in the column: Check if there is a gap-character at this position. if any sequence has a non gap-character at this position: Stick a gap in the MSA just before this position. Create a Bio::LocatableSeq from the CONTIG sequence (using ungapped REFERENCE coordinates) and add it to the Bio::SimpleAlign. done. I would very much appreciate, 1) feedback on the correctness of the above algorithm (it could be horribly wrong), and 2) advice on how to get a column of the alignment using ungapped REFERENCE coordinates? Sorry if this is a solved problem (where is it solved?). 
If not, and if I can get it working, I'll try to write a generic function to merge two MSAs when they have a reference sequence in common. For your reference, the pairwise alignments come from the show-aligns command in the MUMmer sequence alignment package, and have the following format: my.reference.fasta my.contigs.multi.fasta ============================================================ -- Alignments between REFERENCE and CONTIG00012 -- BEGIN alignment [ +1 29237 - 45714 | +1 1 - 16441 ] 29237 aataacctctttaag.taatatttttctctggtcccaacttgcgccaat 1 aataa.ctctttaagataatatttttctctggtcccgacttgggccaat ^ ^ ^ ^ 29286 ggaaaaaaatcacttattcgataa.ataataagataaatatattttcta 49 ggaaaaaaatcactatttcgataagataataagata.atatattttcaa ^^ ^ ^ ^ 29335 aagacccctacataaatatatggtcccattaatattataaattaataat 97 aagacccctatataaatatatggtctcattaatattataaattaataat ^ ^ ... For further reference: This thread: http://bioperl.org/pipermail/bioperl-l/2009-July/030643.html http://www.bioperl.org/wiki/Align_Refactor http://www.bioperl.org/wiki/Alignment_object All the best, Dan. From lincoln.stein at gmail.com Thu Aug 20 12:07:16 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 20 Aug 2009 12:07:16 -0400 Subject: [Bioperl-l] SCF installation In-Reply-To: References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> Message-ID: <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com> It is all a bit confusing. On the download page for Staden, there is a release 1.12, but the home page hasn't been updated and still reads 1.11. If you download and install Staden 1.12, you'll get a library named libstaden-read rather than libread; Bio::SCF hasn't been updated for the name change, and so you will have to open up the Makefile.PL and change "-lread" to "-lstaden-read" in order for it to compile. This being said, your log indicates that Bio::SCF compiled and linked just fine, but the test failed, so it may be more of a problem than just getting the staden library installed. 
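The "-lread" to "-lstaden-read" change can be made by hand in an editor; as a minimal sketch, the same substitution in Perl looks like the following. The sample LIBS line is hypothetical (the real Makefile.PL may be laid out differently), and patch_libs is just a name chosen for this example. The pattern only touches "-lread" when it stands as a whole linker flag, so something like "-lreadline" is left alone.

```perl
use strict;
use warnings;

# Replace the old library flag with the new one wherever it appears as a
# whole linker argument; "-lreadline" and similar flags are not touched.
sub patch_libs {
    my ($text) = @_;
    (my $patched = $text) =~ s/(?<![\w-])-lread(?![\w-])/-lstaden-read/g;
    return $patched;
}

# Hypothetical Makefile.PL fragment; the real file's contents may differ.
my $before = "LIBS => ['-lread -lz'],\n";
print patch_libs($before);
```

To apply it for real, read Makefile.PL into a string, pass it through patch_libs, write the result back, and then rerun perl Makefile.PL and make.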
Lincoln On Thu, Aug 20, 2009 at 10:30 AM, Scott Cain wrote: > Hi Bernd, > > Bio::SCF isn't technically part of BioPerl, but I have installed it before > so I'll take a shot: do you have the Staden io-lib installed? It is a > prereq for Bio::SCF. If you did install it, is it in a normal library path, > and did you run ldconfig (if appropriate for your system) after installing > it? > > io-lib can be obtained here: > > http://staden.sourceforge.net/ > > If you do have all of those things in place, what version of io-lib are you > using? I wonder if there is an incompatibility between Bio::SCF and your > version. The INSTALL doc for Bio::SCF indicates that you should have > version 0.9, but io-lib is now at 1.11.5. That jump to a whole number may > have broken an api call that Bio::SCF depends on. > > Scott > > > On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote: > > Hi, >> >> >> >> I am trying to install SCF (a prerequisite to samtools). >> >> I installed libread and the compilation seems to be working, only test is >> failing: >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL >> >> Checking if your kit is complete... 
>> >> Looks good >> >> Writing Makefile for Bio::SCF >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ make >> >> cp SCF.pm blib/lib/Bio/SCF.pm >> >> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm >> >> /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap >> /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv >> SCF.xsc >> SCF.c >> >> Please specify prototyping behavior for SCF.xs (see perlxs manual) >> >> /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include >> -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include >> -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" >> "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c >> >> Running Mkbootstrap for Bio::SCF () >> >> chmod 644 SCF.bs >> >> rm -f blib/arch/auto/Bio/SCF/SCF.bundle >> >> LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 >> /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup >> -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ >> >> -lread -lz \ >> >> >> >> chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle >> >> cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs >> >> chmod 644 blib/arch/auto/Bio/SCF/SCF.bs >> >> Manifying blib/man3/Bio::SCF.3pm >> >> >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ make test >> >> PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" >> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t >> >> t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) >> >> t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) >> >> Failed 18/18 subtests >> >> >> >> Test Summary Report >> >> ------------------- >> >> t/scf.t (Wstat: 512 Tests: 0 Failed: 0) >> >> Non-zero exit status: 2 >> >> Parse errors: Bad plan. You planned 18 tests but ran 0. >> >> Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr >> 0.01 >> csys = 0.11 CPU) >> >> Result: FAIL >> >> Failed 1/1 test programs. 0/0 subtests failed. 
>> >> make: *** [test_dynamic] Error 2 >> >> >> >> >> >> >> >> >> >> Any idea what might be going wrong? >> >> >> >> Please not that in the directory there are some file empty: >> >> >> >> ls -ltr >> >> -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf >> >> -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER >> >> -rw-r--r-- 1 bernd staff 532 17 mai 2006 README >> >> -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL >> >> -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL >> >> -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs >> >> -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm >> >> drwxr-xr-x 3 bernd staff 102 17 mai 2006 t >> >> drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg >> >> drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF >> >> -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml >> >> -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST >> >> drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. >> >> -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old >> >> -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile >> >> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib >> >> drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib >> >> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs >> >> -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o >> >> -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c >> >> drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . >> >> >> >> >> >> Thanks, >> >> >> >> Bernd >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From j_martin at lbl.gov Thu Aug 20 12:41:16 2009 From: j_martin at lbl.gov (Joel Martin) Date: Thu, 20 Aug 2009 09:41:16 -0700 Subject: [Bioperl-l] SCF installation In-Reply-To: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> Message-ID: <20090820164115.GA10681@eniac.jgi-psf.org> Hello, Bio::SCF isn't a pre-requisite of samtools or Bio::Samtools, and neither is actually related to Bioperl. samtools has a pretty active mailing list at sourceforge, you might try asking there. http://sourceforge.net/mailarchive/forum.php?forum_name=samtools-help I use samtools all the time w/o either of those modules. Joel On Thu, Aug 20, 2009 at 10:46:52AM +0200, Bernd Jagla wrote: > Hi, > > > > I am trying to install SCF (a prerequisite to samtools). > > I installed libread and the compilation seems to be working, only test is > failing: > > > > zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL > > Checking if your kit is complete... 
> > Looks good > > Writing Makefile for Bio::SCF > > > > zoppel:Bio-SCF-1.01 bernd$ make > > cp SCF.pm blib/lib/Bio/SCF.pm > > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm > > /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap > /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc > SCF.c > > Please specify prototyping behavior for SCF.xs (see perlxs manual) > > /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include > -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include > -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" > "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c > > Running Mkbootstrap for Bio::SCF () > > chmod 644 SCF.bs > > rm -f blib/arch/auto/Bio/SCF/SCF.bundle > > LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 > /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup > -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ > > -lread -lz \ > > > > chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle > > cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs > > chmod 644 blib/arch/auto/Bio/SCF/SCF.bs > > Manifying blib/man3/Bio::SCF.3pm > > > > > > zoppel:Bio-SCF-1.01 bernd$ make test > > PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > > t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) > > t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) > > Failed 18/18 subtests > > > > Test Summary Report > > ------------------- > > t/scf.t (Wstat: 512 Tests: 0 Failed: 0) > > Non-zero exit status: 2 > > Parse errors: Bad plan. You planned 18 tests but ran 0. > > Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr 0.01 > csys = 0.11 CPU) > > Result: FAIL > > Failed 1/1 test programs. 0/0 subtests failed. > > make: *** [test_dynamic] Error 2 > > > > > > > > > > Any idea what might be going wrong? 
> > > > Please not that in the directory there are some file empty: > > > > ls -ltr > > -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf > > -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER > > -rw-r--r-- 1 bernd staff 532 17 mai 2006 README > > -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL > > -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL > > -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs > > -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm > > drwxr-xr-x 3 bernd staff 102 17 mai 2006 t > > drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg > > drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF > > -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml > > -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST > > drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. > > -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old > > -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile > > -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib > > drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib > > -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs > > -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o > > -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c > > drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . > > > > > > Thanks, > > > > Bernd > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Thu Aug 20 12:42:23 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Thu, 20 Aug 2009 17:42:23 +0100 Subject: [Bioperl-l] Creating a MSA from a set of pairwise alignments with a common reference sequence? In-Reply-To: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com> References: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com> Message-ID: <4A8D7CEF.4080002@gmail.com> Hi Dan, I think you want the Bio::LocatableSeq method "column_from_residue_number". 
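
For readers following the thread, the lookup in question (from an ungapped residue number to an alignment column) can be sketched in plain Perl. This is only an illustration of the idea, not BioPerl's implementation: `column_for_residue` is a hypothetical helper name, it assumes residues are numbered from 1 with no start offset, and it treats both "-" and "." as gap characters. The example row is the start of the REFERENCE line from the show-aligns output quoted below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the 1-based alignment
# column that holds the Nth non-gap residue of a gapped sequence string.
sub column_for_residue {
    my ($gapped, $residue_number) = @_;
    my $seen = 0;
    for my $col (1 .. length $gapped) {
        my $char = substr($gapped, $col - 1, 1);
        next if $char eq '-' || $char eq '.';   # skip gap characters
        $seen++;
        return $col if $seen == $residue_number;
    }
    return;   # residue number lies beyond the end of the sequence
}

# Start of the gapped REFERENCE row; the gap sits in column 16.
my $reference_row = 'aataacctctttaag.taatat';

print column_for_residue($reference_row, 15), "\n";   # residue before the gap
print column_for_residue($reference_row, 16), "\n";   # residue after the gap
```

Residues up to the gap keep their own number as the column, and every residue after it is shifted right by one column. With real BioPerl objects the method named above should be preferred, since it also honours each sequence's start coordinate.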
You might also try combining your pairwise alignments using the profile alignment option in ClustalW. Cheers. Roy. Dan Bolser wrote: > Hi, > > Quick version: How do I get a column of Bio::SimpleAlign using > ungapped 'reference' sequence coordinates? > > > > Longer version: > > I have a set of pairwise alignments that I would like to process into > a 'multiple sequence alignment' (MSA). All the alignments are short > sequence 'contigs' aligned to a 'reference' sequence, so one sequence > in all the pairwise alignments is constant (making the resulting MSA > unambiguous). > > I came up with the following pseudo-code to create a MSA > (Bio::SimpleAlign) from the set of pairwise alignments... > > initialise: > Create an 'empty' Bio::SimpleAlign from the REFERENCE sequence. > > for each pairwise alignment: > Create a Bio::LocatableSeq from the given fragment of the > REFERENCE sequence (using ungapped REFERENCE coordinates). > > for each gap in the REFERENCE sequence: > Take the position of the gap (in ungapped REFERENCE > coordinates) and look up the corresponding column of the MSA > (in ungapped REFERENCE coordinates). > > for each sequence in the column: > Check if there is a gap-character at this position. > > if any sequence has a non gap-character at this position: > Stick a gap in the MSA just before this position. > > Create a Bio::LocatableSeq from the CONTIG sequence (using > ungapped REFERENCE coordinates) and add it to the > Bio::SimpleAlign. > > done. > > > I would very much appreciate, 1) feedback on the correctness of the > above algorithm (it could be horribly wrong), and 2) advice on how to > get a column of the alignment using ungapped REFERENCE coordinates? > > > Sorry if this is a solved problem (where is it solved?). If not, and > if I can get it working, I'll try to write a generic function to merge > two MSAs when they have a reference sequence in common. 
> > > For your reference, the pairwise alignments come from the show-aligns > command in the MUMmer sequence alignment package, and have the > following format: > > my.reference.fasta my.contigs.multi.fasta > > ============================================================ > -- Alignments between REFERENCE and CONTIG00012 > > -- BEGIN alignment [ +1 29237 - 45714 | +1 1 - 16441 ] > > > 29237 aataacctctttaag.taatatttttctctggtcccaacttgcgccaat > 1 aataa.ctctttaagataatatttttctctggtcccgacttgggccaat > ^ ^ ^ ^ > > 29286 ggaaaaaaatcacttattcgataa.ataataagataaatatattttcta > 49 ggaaaaaaatcactatttcgataagataataagata.atatattttcaa > ^^ ^ ^ ^ > > 29335 aagacccctacataaatatatggtcccattaatattataaattaataat > 97 aagacccctatataaatatatggtctcattaatattataaattaataat > ^ ^ > > ... > > > For further reference: > > This thread: > http://bioperl.org/pipermail/bioperl-l/2009-July/030643.html > > http://www.bioperl.org/wiki/Align_Refactor > > http://www.bioperl.org/wiki/Alignment_object > > > > All the best, > Dan. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lsbrath at gmail.com Thu Aug 20 16:31:20 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 20 Aug 2009 16:31:20 -0400 Subject: [Bioperl-l] genbank to fasta conversion Message-ID: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com> Hello, I have previously converted multiple genbank files to fasta. For some reason I am having trouble with this simple script. 
#!/usr/bin/perl -w use strict; use Bio::SeqIO; open (my $inFile, "C:/Documents and Settings/mydir/Desktop/TARGETING.gb"); open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/TARGET.fa"); my $in = Bio::SeqIO->new('-file' => "$inFile" , '-format' => 'GenBank'); my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' => 'Fasta'); print $out $_ while <$in>; I keep getting the error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Could not open GLOB(0x36a214): No such file or directory STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm:310 STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454 STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\ genbank.pm:202 STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351 STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377 ----------------------------------------------------------- I am probably missing something simple, but would appreciate any help. M From cjfields at illinois.edu Thu Aug 20 16:38:03 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 20 Aug 2009 15:38:03 -0500 Subject: [Bioperl-l] genbank to fasta conversion In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com> References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com> Message-ID: <7868B105-53AD-4C87-8B21-2E4D4A7781B5@illinois.edu> You are passing filehandles in, not file names. Switch the '-file' parameter to '-fh'. chris On Aug 20, 2009, at 3:31 PM, Mgavi Brathwaite wrote: > Hello, > > I have previously converted multiple genbank files to fasta. For > some reason > I am having trouble with this simple script. 
> #!/usr/bin/perl -w > use strict; > use Bio::SeqIO; > > open (my $inFile, "C:/Documents and Settings/mydir/Desktop/ > TARGETING.gb"); > open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/ > TARGET.fa"); > my $in = Bio::SeqIO->new('-file' => "$inFile" , > '-format' => 'GenBank'); > my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' => > 'Fasta'); > print $out $_ while <$in>; > > I keep getting the error: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Could not open GLOB(0x36a214): No such file or directory > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm: > 310 > STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454 > STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\ > genbank.pm:202 > STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351 > STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377 > ----------------------------------------------------------- > > I am probably missing something simple, but would appreciate any help. > > M > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Aug 20 16:43:06 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 20 Aug 2009 13:43:06 -0700 Subject: [Bioperl-l] genbank to fasta conversion In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com> References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com> Message-ID: <4A8DB55A.6060605@cornell.edu> The error is that you are opening a filehandle called $outfile, and then you are stringifying it (resulting in a string containing "GLOB(..)", and telling Bio::SeqIO write to a file named "GLOB(...)", which it can't open. 
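
The stringification described above is easy to reproduce with core Perl alone. The following is a minimal sketch rather than part of the original script: it opens a handle on an in-memory string (an assumption made so that no real file is needed) and shows what the handle turns into when it is interpolated into a double-quoted string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open a read handle on an in-memory string so the sketch needs no real file.
my $data = ">seq1\nACGT\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";

# Interpolating the handle yields its stringified reference, not a file name.
my $as_string = "$fh";
print "Stringified handle: $as_string\n";

# The handle itself is still perfectly usable for reading.
my $first_line = <$fh>;
chomp $first_line;
print "First line read:    $first_line\n";

close $fh;
```

The first print shows a string of the form GLOB(0x...), which is the "file name" that ends up in the exception message when "$inFile" is given to '-file'. Passing the handle itself through '-fh', or the plain path through '-file', avoids the problem.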
You probably want to use the -fh arguments for your two uses of Bio::SeqIO, either that, or remove your open() calls and pass the filenames to the SeqIO objects directly, like: my $in = Bio::SeqIO->new ('-file' => "C:/Documents and Settings/mydir/Desktop/TARGETING.gb", '-format' => 'GenBank', ); my $out = Bio::SeqIO->new ('-file' => ">C:/Documents and Settings/mydir/Desktop/TARGET.fa", '-format' => 'fasta', ); Rob Mgavi Brathwaite wrote: > Hello, > > I have previously converted multiple genbank files to fasta. For some reason > I am having trouble with this simple script. > #!/usr/bin/perl -w > use strict; > use Bio::SeqIO; > > open (my $inFile, "C:/Documents and Settings/mydir/Desktop/TARGETING.gb"); > open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/TARGET.fa"); > my $in = Bio::SeqIO->new('-file' => "$inFile" , > '-format' => 'GenBank'); > my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' => 'Fasta'); > print $out $_ while <$in>; > > I keep getting the error: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Could not open GLOB(0x36a214): No such file or directory > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm:310 > STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454 > STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\ > genbank.pm:202 > STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351 > STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377 > ----------------------------------------------------------- > > I am probably missing something simple, but would appreciate any help. 
>
> M
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From sharpton at berkeley.edu Thu Aug 20 16:40:34 2009
From: sharpton at berkeley.edu (Thomas Sharpton)
Date: Thu, 20 Aug 2009 13:40:34 -0700
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID:

This is a problem I think I can solve, so I'm chiming in for once.

Looks to me like you're trying to pass a file handle to the -file
setting in your SeqIO object. One of the excellent things about using
SeqIO is that you don't need to worry about file handles; it's all
taken care of under the hood. Try the following adaptation of your
script:

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $inFile  = "C:/Documents and Settings/mydir/Desktop/TARGETING.gb";
my $outfile = "C:/Documents and Settings/mydir/Desktop/TARGET.fa";

#OPEN A SEQUENCE FILE OF INTEREST ($inFile) AND CREATE A SEQUENCE STREAM ($in)
my $in = Bio::SeqIO->new(-file => "$inFile" , '-format' => 'GenBank');

#OPEN AN OUTPUT FILE OF INTEREST ($outfile) AND CREATE AN OUTPUT SEQUENCE STREAM ($out)
#NOTICE HOW WE SET -file FOR OUTPUT WITH THE > SYMBOL HERE:
my $out = Bio::SeqIO->new(-file => ">$outfile" ,'-format' => 'Fasta');

#NOW LET'S DO THE CONVERSION AND DUMP THE OUTPUT
#INSTEAD OF DOING THIS
#print $out $_ while <$in>;
#TRY THIS
while(my $seq = $in->next_seq() ){
    $out->write_seq($seq)
}

The above is pretty much what you'll find here:

http://www.bioperl.org/wiki/HOWTO:SeqIO

which you should definitely look over to better understand what's
happening with the SeqIO object.

Good luck!
Tom On Aug 20, 2009, at 1:31 PM, Mgavi Brathwaite wrote: > Hello, > > I have previously converted multiple genbank files to fasta. For > some reason > I am having trouble with this simple script. > #!/usr/bin/perl -w > use strict; > use Bio::SeqIO; > > open (my $inFile, "C:/Documents and Settings/mydir/Desktop/ > TARGETING.gb"); > open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/ > TARGET.fa"); > my $in = Bio::SeqIO->new('-file' => "$inFile" , > '-format' => 'GenBank'); > my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' => > 'Fasta'); > print $out $_ while <$in>; > > I keep getting the error: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Could not open GLOB(0x36a214): No such file or directory > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm: > 310 > STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454 > STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\ > genbank.pm:202 > STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351 > STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377 > ----------------------------------------------------------- > > I am probably missing something simple, but would appreciate any help. > > M > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ghai.rohit at gmail.com Fri Aug 21 07:34:49 2009 From: ghai.rohit at gmail.com (Rohit Ghai) Date: Fri, 21 Aug 2009 13:34:49 +0200 Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotide database Message-ID: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com> Hello all I would like to download the wgs sequences of the unfinished genomes from ncbi. 
(genomes in progress) from http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

Here's an example accession

NZ_ACVD00000000

and here's the link to the accession at GenBank

http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000

This record lists the accessions that belong to it in the following
line of the GenBank output:

WGS NZ_ACVD01000001-NZ_ACVD01000139

NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession numbers
covered by this record. Here's a link:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC]

The BioPerl-related question is...

Since these are unassembled genomes, there are several contigs for each
one, and they are all available in this record. Is it possible to
download a range without trying to recreate each accession number?

On the other hand, it is possible to download each one individually.
This would mean making the following

NZ_ACVD01000001
NZ_ACVD01000002
NZ_ACVD01000003
.
.
.
NZ_ACVD01000139

from NZ_ACVD01000001-NZ_ACVD01000139

I can recreate these numbers and download each one separately. However,
sometimes I get a timeout exception and the whole thing stops.

The code (copied shamelessly from the BioPerl website, works great to
get single accessions):

my $id = "NZ_ACVD00000000";
my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -id      => $id,
                                       -rettype => 'gbwithparts');
$factory->get_Response(-file => 'fullcontig.gb');

I did try to catch the exceptions from get_Response, but it's not
working as expected... maybe someone can point out what I'm doing wrong
here. For some reason, the code never seems to reach any print
statement in the catch construct...

$ele = "somecontig id";
try {
    print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n";
    $factory->get_Response(-file => "$genbank_file");
}
catch Bio::Root::Exception with {
    my $err = shift;
    if (! defined $err) {
        print "MAY HAVE DOWNLOADED $ele..\n";
    }
    else {
        print "PROBABLE TIMEOUT ERROR\n";
        print "$err\n";
    }
};

Or is it possible to somehow increase the timeout for the
get_Response method?

Thanks in advance!

regards
Rohit

From bernd.jagla at gmail.com Fri Aug 21 05:30:27 2009
From: bernd.jagla at gmail.com (Bernd Jagla)
Date: Fri, 21 Aug 2009 11:30:27 +0200
Subject: [Bioperl-l] SCF installation
In-Reply-To: <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID:

Hi,

I have installed io_lib-1.9.0. This produces libread.a. I am working on
Mac OS X 10.5.7. I just recompiled io-lib and didn't see any error
message. I don't really know how to test that it is working.

I am trying to install Bio-SCF-1.01.

It seems that the test.scf file cannot be read. Is there another way,
using some other tools, to see if that is working?

(Sorry for misrepresenting samtools. I was actually trying to install
Bio-Graphics, which was asking for Bio::SCF).

Thanks,

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
Sent: Thursday, August 20, 2009 6:07 PM
To: scott at scottcain.net
Cc: bioperl-l at lists.open-bio.org; Bernd Jagla
Subject: Re: [Bioperl-l] SCF installation

It is all a bit confusing. On the download page for Staden, there is a
release 1.12, but the home page hasn't been updated and still reads 1.11.
If you download and install Staden 1.12, you'll get a library named
libstaden-read rather than libread; Bio::SCF hasn't been updated for the
name change, and so you will have to open up the Makefile.PL and change
"-lread" to "-lstaden-read" in order for it to compile.
This being said, your log indicates that Bio::SCF compiled and linked just fine, but the test failed, so it may be more of a problem than just getting the staden library installed. Lincoln On Thu, Aug 20, 2009 at 10:30 AM, Scott Cain wrote: > Hi Bernd, > > Bio::SCF isn't technically part of BioPerl, but I have installed it before > so I'll take a shot: do you have the Staden io-lib installed? It is a > prereq for Bio::SCF. If you did install it, is it in a normal library path, > and did you run ldconfig (if appropriate for your system) after installing > it? > > io-lib can be obtained here: > > http://staden.sourceforge.net/ > > If you do have all of those things in place, what version of io-lib are you > using? I wonder if there is an incompatibility between Bio::SCF and your > version. The INSTALL doc for Bio::SCF indicates that you should have > version 0.9, but io-lib is now at 1.11.5. That jump to a whole number may > have broken an api call that Bio::SCF depends on. > > Scott > > > On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote: > > Hi, >> >> >> >> I am trying to install SCF (a prerequisite to samtools). >> >> I installed libread and the compilation seems to be working, only test is >> failing: >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL >> >> Checking if your kit is complete... 
>> >> Looks good >> >> Writing Makefile for Bio::SCF >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ make >> >> cp SCF.pm blib/lib/Bio/SCF.pm >> >> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm >> >> /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap >> /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv >> SCF.xsc >> SCF.c >> >> Please specify prototyping behavior for SCF.xs (see perlxs manual) >> >> /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include >> -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include >> -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" >> "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c >> >> Running Mkbootstrap for Bio::SCF () >> >> chmod 644 SCF.bs >> >> rm -f blib/arch/auto/Bio/SCF/SCF.bundle >> >> LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 >> /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup >> -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \ >> >> -lread -lz \ >> >> >> >> chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle >> >> cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs >> >> chmod 644 blib/arch/auto/Bio/SCF/SCF.bs >> >> Manifying blib/man3/Bio::SCF.3pm >> >> >> >> >> >> zoppel:Bio-SCF-1.01 bernd$ make test >> >> PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" >> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t >> >> t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) >> >> t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) >> >> Failed 18/18 subtests >> >> >> >> Test Summary Report >> >> ------------------- >> >> t/scf.t (Wstat: 512 Tests: 0 Failed: 0) >> >> Non-zero exit status: 2 >> >> Parse errors: Bad plan. You planned 18 tests but ran 0. >> >> Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr >> 0.01 >> csys = 0.11 CPU) >> >> Result: FAIL >> >> Failed 1/1 test programs. 0/0 subtests failed. 
>> >> make: *** [test_dynamic] Error 2 >> >> >> >> >> >> >> >> >> >> Any idea what might be going wrong? >> >> >> >> Please not that in the directory there are some file empty: >> >> >> >> ls -ltr >> >> -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf >> >> -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER >> >> -rw-r--r-- 1 bernd staff 532 17 mai 2006 README >> >> -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL >> >> -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL >> >> -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs >> >> -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm >> >> drwxr-xr-x 3 bernd staff 102 17 mai 2006 t >> >> drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg >> >> drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF >> >> -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml >> >> -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST >> >> drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. >> >> -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old >> >> -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile >> >> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib >> >> drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib >> >> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs >> >> -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o >> >> -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c >> >> drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . >> >> >> >> >> >> Thanks, >> >> >> >> Bernd >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Fri Aug 21 09:05:25 2009 From: scott at scottcain.net (Scott Cain) Date: Fri, 21 Aug 2009 09:05:25 -0400 Subject: [Bioperl-l] SCF installation In-Reply-To: References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com> Message-ID: Hi Bernd, Just so you know, you don't need Bio::SCF for Bio::Graphics either, unless you want to display ABI trace glyphs. It is a suggested ("recommends" in Module::Build parlance) module. Scott On Aug 21, 2009, at 5:30 AM, Bernd Jagla wrote: > Hi, > > I have installed io_lib-1.9.0. This produces libread.a. I am working > on a > Mac OSX 10.5.7. I just recompiled io-lib and didn't see any error > message. I > don't really know how to test that it is working. > > I am trying to install Bio-SCF-1.01. > > It seems that the test.scf file cannot be read. Is there another way > using > some other tools to see if that is working? > > (Sorry for misrepresenting samtools. I was actually trying to install > Bio-Graphics, which was asking for Bio::SCF). > > Thanks, > > Bernd > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lincoln > Stein > Sent: Thursday, August 20, 2009 6:07 PM > To: scott at scottcain.net > Cc: bioperl-l at lists.open-bio.org; Bernd Jagla > Subject: Re: [Bioperl-l] SCF installation > > It is all a bit confusing. On the download page for Staden, there is a > release 1.12, but the home page hasn't been updated and still reads > 1.11. 
If > you download and install Staden 1.12, you'll get a library named > libstaden-read rather than libread; Bio::SCF hasn't been updated for > the > name change, and so you will have to open up the Makefile.PL and > change > "-lread" to "-lstaden-read" in order for it to compile. > > This being said, your log indicates that Bio::SCF compiled and > linked just > fine, but the test failed, so it may be more of a problem than just > getting > the staden library installed. > > Lincoln > > On Thu, Aug 20, 2009 at 10:30 AM, Scott Cain > wrote: > >> Hi Bernd, >> >> Bio::SCF isn't technically part of BioPerl, but I have installed it >> before >> so I'll take a shot: do you have the Staden io-lib installed? It >> is a >> prereq for Bio::SCF. If you did install it, is it in a normal >> library > path, >> and did you run ldconfig (if appropriate for your system) after >> installing >> it? >> >> io-lib can be obtained here: >> >> http://staden.sourceforge.net/ >> >> If you do have all of those things in place, what version of io-lib >> are > you >> using? I wonder if there is an incompatibility between Bio::SCF >> and your >> version. The INSTALL doc for Bio::SCF indicates that you should have >> version 0.9, but io-lib is now at 1.11.5. That jump to a whole >> number may >> have broken an api call that Bio::SCF depends on. >> >> Scott >> >> >> On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote: >> >> Hi, >>> >>> >>> >>> I am trying to install SCF (a prerequisite to samtools). >>> >>> I installed libread and the compilation seems to be working, only >>> test is >>> failing: >>> >>> >>> >>> zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL >>> >>> Checking if your kit is complete... 
>>> >>> Looks good >>> >>> Writing Makefile for Bio::SCF >>> >>> >>> >>> zoppel:Bio-SCF-1.01 bernd$ make >>> >>> cp SCF.pm blib/lib/Bio/SCF.pm >>> >>> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm >>> >>> /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp - >>> typemap >>> /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv >>> SCF.xsc >>> SCF.c >>> >>> Please specify prototyping behavior for SCF.xs (see perlxs manual) >>> >>> /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include >>> -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include >>> -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" >>> "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN >>> SCF.c >>> >>> Running Mkbootstrap for Bio::SCF () >>> >>> chmod 644 SCF.bs >>> >>> rm -f blib/arch/auto/Bio/SCF/SCF.bundle >>> >>> LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 >>> /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup >>> -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/ >>> SCF.bundle \ >>> >>> -lread -lz \ >>> >>> >>> >>> chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle >>> >>> cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs >>> >>> chmod 644 blib/arch/auto/Bio/SCF/SCF.bs >>> >>> Manifying blib/man3/Bio::SCF.3pm >>> >>> >>> >>> >>> >>> zoppel:Bio-SCF-1.01 bernd$ make test >>> >>> PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" >>> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t >>> >>> t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) >>> >>> t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) >>> >>> Failed 18/18 subtests >>> >>> >>> >>> Test Summary Report >>> >>> ------------------- >>> >>> t/scf.t (Wstat: 512 Tests: 0 Failed: 0) >>> >>> Non-zero exit status: 2 >>> >>> Parse errors: Bad plan. You planned 18 tests but ran 0. 
>>> >>> Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 >>> cusr >>> 0.01 >>> csys = 0.11 CPU) >>> >>> Result: FAIL >>> >>> Failed 1/1 test programs. 0/0 subtests failed. >>> >>> make: *** [test_dynamic] Error 2 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> Any idea what might be going wrong? >>> >>> >>> >>> Please not that in the directory there are some file empty: >>> >>> >>> >>> ls -ltr >>> >>> -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf >>> >>> -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER >>> >>> -rw-r--r-- 1 bernd staff 532 17 mai 2006 README >>> >>> -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL >>> >>> -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL >>> >>> -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs >>> >>> -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm >>> >>> drwxr-xr-x 3 bernd staff 102 17 mai 2006 t >>> >>> drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg >>> >>> drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF >>> >>> -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml >>> >>> -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST >>> >>> drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. >>> >>> -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old >>> >>> -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile >>> >>> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib >>> >>> drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib >>> >>> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs >>> >>> -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o >>> >>> -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c >>> >>> drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . >>> >>> >>> >>> >>> >>> Thanks, >>> >>> >>> >>> Bernd >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> ----------------------------------------------------------------------- >> Scott Cain, Ph. D. 
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From maj at fortinbras.us Fri Aug 21 08:50:08 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 21 Aug 2009 08:50:08 -0400 Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase In-Reply-To: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com> References: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com> Message-ID: <71B4268E5B524F719D24088483568870@NewLife> Hi Rohit- Re: timeout, you could try $factory->ua->timeout($number_greater_than_180_sec) before issuing the request. cheers MAJ ----- Original Message ----- From: "Rohit Ghai" To: Sent: Friday, August 21, 2009 7:34 AM Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase > Hello all > > I would like to download the wgs sequences of the unfinished genomes from > ncbi. 
> (genomes in progress) from http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi > > here's an example accession > > NZ_ACVD00000000 > > and here's the link to the accession at genbank > > http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000 > > This record contains the accessions that belong to this record in the > following line in the genbank output > > WGS NZ_ACVD01000001-NZ_ACVD01000139 > > The NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession > numbers that are > > are specified by this range. > > here's a link > > http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC] > > > The bioperl related question is... > > Since these are unassembled genomes, there are several contigs for each one, > and they all available in this record. > > Is it possible to download a range without trying to recreate each accession > number? > > on the other hand, it is possible to download each individually , this would > mean making the following > > NZ_ACVD01000001 > NZ_ACVD01000002 > NZ_ACVD01000003 > . > . > . > NZ_ACVD01000139 > > from NZ_ACVD01000001-NZ_ACVD01000139 > > > I can recreate these numbers and download each one separately. However, > sometimes I get a timeout exception > and the whole thing stops. > > the code ( copied shamelessly from the bioperl website, works great to get > single accessions) > > my $id = "NZ_ACVD00000000"; > my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => > 'nucleotide', > -id => > $id, > -rettype > => 'gbwithparts'); > > $factory->get_Response(-file => 'fullcontig.gb'); > > > I did try and catch the exceptions from the get_Response..but its not > working as expected... maybe someone can point out what I'm doing wrong > here. For some reason, the code never seems to go any print statement in the > catch construct... 
> > $ele = "somecontig id"; > > try { > print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n"; > $factory->get_Response(-file => "$genbank_file"); > > } catch Bio::Root::Exception with { > my $err = shift; > if (! defined $err) { > print "MAY HAVE DOWNLOADED $ele..\n"; > } else { > print "PROBABLE TIMEOUT ERROR\n"; > print "$err\n"; > } > }; > > > Or is it possible to somehow increase the timeout time for the get_Response > method? > > thanks in advance! > > > regards > > Rohit > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bernd.jagla at pasteur.fr Fri Aug 21 09:30:38 2009 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Fri, 21 Aug 2009 15:30:38 +0200 Subject: [Bioperl-l] SCF installation In-Reply-To: References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina><6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com> Message-ID: <0D219C72BC5F432BA5CDBBCFCE94AA02@zillumina> Thanks, I was confused by the error message of Bio::Graphics. Now I tried make, make test and was able to install... Thanks, Let's forget about the rest then since I don't believe I will need that... Bernd -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Scott Cain Sent: Friday, August 21, 2009 3:05 PM To: Bernd Jagla Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] SCF installation Hi Bernd, Just so you know, you don't need Bio::SCF for Bio::Graphics either, unless you want to display ABI trace glyphs. It is a suggested ("recommends" in Module::Build parlance) module. Scott On Aug 21, 2009, at 5:30 AM, Bernd Jagla wrote: > Hi, > > I have installed io_lib-1.9.0. This produces libread.a. I am working > on a > Mac OSX 10.5.7. I just recompiled io-lib and didn't see any error > message. I > don't really know how to test that it is working. 
> > I am trying to install Bio-SCF-1.01. > > It seems that the test.scf file cannot be read. Is there another way > using > some other tools to see if that is working? > > (Sorry for misrepresenting samtools. I was actually trying to install > Bio-Graphics, which was asking for Bio::SCF). > > Thanks, > > Bernd > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lincoln > Stein > Sent: Thursday, August 20, 2009 6:07 PM > To: scott at scottcain.net > Cc: bioperl-l at lists.open-bio.org; Bernd Jagla > Subject: Re: [Bioperl-l] SCF installation > > It is all a bit confusing. On the download page for Staden, there is a > release 1.12, but the home page hasn't been updated and still reads > 1.11. If > you download and install Staden 1.12, you'll get a library named > libstaden-read rather than libread; Bio::SCF hasn't been updated for > the > name change, and so you will have to open up the Makefile.PL and > change > "-lread" to "-lstaden-read" in order for it to compile. > > This being said, your log indicates that Bio::SCF compiled and > linked just > fine, but the test failed, so it may be more of a problem than just > getting > the staden library installed. > > Lincoln > > On Thu, Aug 20, 2009 at 10:30 AM, Scott Cain > wrote: > >> Hi Bernd, >> >> Bio::SCF isn't technically part of BioPerl, but I have installed it >> before >> so I'll take a shot: do you have the Staden io-lib installed? It >> is a >> prereq for Bio::SCF. If you did install it, is it in a normal >> library > path, >> and did you run ldconfig (if appropriate for your system) after >> installing >> it? >> >> io-lib can be obtained here: >> >> http://staden.sourceforge.net/ >> >> If you do have all of those things in place, what version of io-lib >> are > you >> using? I wonder if there is an incompatibility between Bio::SCF >> and your >> version. 
The INSTALL doc for Bio::SCF indicates that you should have >> version 0.9, but io-lib is now at 1.11.5. That jump to a whole >> number may >> have broken an api call that Bio::SCF depends on. >> >> Scott >> >> >> On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote: >> >> Hi, >>> >>> >>> >>> I am trying to install SCF (a prerequisite to samtools). >>> >>> I installed libread and the compilation seems to be working, only >>> test is >>> failing: >>> >>> >>> >>> zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL >>> >>> Checking if your kit is complete... >>> >>> Looks good >>> >>> Writing Makefile for Bio::SCF >>> >>> >>> >>> zoppel:Bio-SCF-1.01 bernd$ make >>> >>> cp SCF.pm blib/lib/Bio/SCF.pm >>> >>> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm >>> >>> /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp - >>> typemap >>> /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv >>> SCF.xsc >>> SCF.c >>> >>> Please specify prototyping behavior for SCF.xs (see perlxs manual) >>> >>> /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include >>> -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include >>> -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" >>> "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN >>> SCF.c >>> >>> Running Mkbootstrap for Bio::SCF () >>> >>> chmod 644 SCF.bs >>> >>> rm -f blib/arch/auto/Bio/SCF/SCF.bundle >>> >>> LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 >>> /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup >>> -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/ >>> SCF.bundle \ >>> >>> -lread -lz \ >>> >>> >>> >>> chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle >>> >>> cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs >>> >>> chmod 644 blib/arch/auto/Bio/SCF/SCF.bs >>> >>> Manifying blib/man3/Bio::SCF.3pm >>> >>> >>> >>> >>> >>> zoppel:Bio-SCF-1.01 bernd$ make test >>> >>> PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" >>> "test_harness(0, 
'blib/lib', 'blib/arch')" t/*.t >>> >>> t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf) >>> >>> t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200) >>> >>> Failed 18/18 subtests >>> >>> >>> >>> Test Summary Report >>> >>> ------------------- >>> >>> t/scf.t (Wstat: 512 Tests: 0 Failed: 0) >>> >>> Non-zero exit status: 2 >>> >>> Parse errors: Bad plan. You planned 18 tests but ran 0. >>> >>> Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 >>> cusr >>> 0.01 >>> csys = 0.11 CPU) >>> >>> Result: FAIL >>> >>> Failed 1/1 test programs. 0/0 subtests failed. >>> >>> make: *** [test_dynamic] Error 2 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> Any idea what might be going wrong? >>> >>> >>> >>> Please not that in the directory there are some file empty: >>> >>> >>> >>> ls -ltr >>> >>> -rw-r--r-- 1 bernd staff 167468 23 sep 1999 test.scf >>> >>> -rw-r--r-- 1 bernd staff 1131 31 jan 2006 DISCLAIMER >>> >>> -rw-r--r-- 1 bernd staff 532 17 mai 2006 README >>> >>> -rw-r--r-- 1 bernd staff 525 17 mai 2006 INSTALL >>> >>> -rw-r--r-- 1 bernd staff 396 17 mai 2006 Makefile.PL >>> >>> -rw-r--r-- 1 bernd staff 9308 17 mai 2006 SCF.xs >>> >>> -rw-r--r-- 1 bernd staff 12438 17 mai 2006 SCF.pm >>> >>> drwxr-xr-x 3 bernd staff 102 17 mai 2006 t >>> >>> drwxr-xr-x 6 bernd staff 204 17 mai 2006 eg >>> >>> drwxr-xr-x 3 bernd staff 102 17 mai 2006 SCF >>> >>> -rw-r--r-- 1 bernd staff 290 17 mai 2006 META.yml >>> >>> -rw-r--r-- 1 bernd staff 255 17 mai 2006 MANIFEST >>> >>> drwxr-xr-x 4 bernd staff 136 20 ao 10:12 .. >>> >>> -rw-r--r-- 1 bernd staff 27915 20 ao 10:13 Makefile.old >>> >>> -rw-r--r-- 1 bernd staff 27915 20 ao 10:16 Makefile >>> >>> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 pm_to_blib >>> >>> drwxr-xr-x 8 bernd staff 272 20 ao 10:17 blib >>> >>> -rw-r--r-- 1 bernd staff 0 20 ao 10:17 SCF.bs >>> >>> -rw-r--r-- 1 bernd staff 14580 20 ao 10:18 SCF.o >>> >>> -rw-r--r-- 1 bernd staff 15125 20 ao 10:18 SCF.c >>> >>> drwxr-xr-x 21 bernd staff 714 20 ao 10:18 . 
>>> >>> >>> >>> >>> >>> Thanks, >>> >>> >>> >>> Bernd >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> ----------------------------------------------------------------------- >> Scott Cain, Ph. D. scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Lincoln D. Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From ghai.rohit at gmail.com Fri Aug 21 09:40:02 2009 From: ghai.rohit at gmail.com (Rohit Ghai) Date: Fri, 21 Aug 2009 15:40:02 +0200 Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase In-Reply-To: <71B4268E5B524F719D24088483568870@NewLife> References: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com> <71B4268E5B524F719D24088483568870@NewLife> Message-ID: <94c73820908210640h3b5854fbxe19c259c66cf9ee4@mail.gmail.com> Thanks! I have made the change... no error yet.. so keeping my fingers crossed cheers Rohit On Fri, Aug 21, 2009 at 2:50 PM, Mark A. 
Jensen wrote: > Hi Rohit- > Re: timeout, you could try > $factory->ua->timeout($number_greater_than_180_sec) > before issuing the request. > cheers MAJ > ----- Original Message ----- From: "Rohit Ghai" > To: > Sent: Friday, August 21, 2009 7:34 AM > Subject: [Bioperl-l] downloading multiple contigs from ncbi > nucleotidedatabase > > > Hello all >> >> I would like to download the wgs sequences of the unfinished genomes from >> ncbi. >> (genomes in progress) from http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi >> >> here's an example accession >> >> NZ_ACVD00000000 >> >> and here's the link to the accession at genbank >> >> http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000 >> >> This record contains the accessions that belong to this record in the >> following line in the genbank output >> >> WGS NZ_ACVD01000001-NZ_ACVD01000139 >> >> The NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession >> numbers that are >> >> are specified by this range. >> >> here's a link >> >> >> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC] >> >> >> The bioperl related question is... >> >> Since these are unassembled genomes, there are several contigs for each >> one, >> and they all available in this record. >> >> Is it possible to download a range without trying to recreate each >> accession >> number? >> >> on the other hand, it is possible to download each individually , this >> would >> mean making the following >> >> NZ_ACVD01000001 >> NZ_ACVD01000002 >> NZ_ACVD01000003 >> . >> . >> . >> NZ_ACVD01000139 >> >> from NZ_ACVD01000001-NZ_ACVD01000139 >> >> >> I can recreate these numbers and download each one separately. However, >> sometimes I get a timeout exception >> and the whole thing stops. 
>> >> the code ( copied shamelessly from the bioperl website, works great to get >> single accessions) >> >> my $id = "NZ_ACVD00000000"; >> my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', >> -db => >> 'nucleotide', >> -id => >> $id, >> -rettype >> => 'gbwithparts'); >> >> $factory->get_Response(-file => 'fullcontig.gb'); >> >> >> I did try and catch the exceptions from the get_Response..but its not >> working as expected... maybe someone can point out what I'm doing wrong >> here. For some reason, the code never seems to go any print statement in >> the >> catch construct... >> >> $ele = "somecontig id"; >> >> try { >> print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n"; >> $factory->get_Response(-file => "$genbank_file"); >> >> } catch Bio::Root::Exception with { >> my $err = shift; >> if (! defined $err) { >> print "MAY HAVE DOWNLOADED $ele..\n"; >> } else { >> print "PROBABLE TIMEOUT ERROR\n"; >> print "$err\n"; >> } >> }; >> >> >> Or is it possible to somehow increase the timeout time for the >> get_Response >> method? >> >> thanks in advance! >> >> >> regards >> >> Rohit >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > From rmb32 at cornell.edu Fri Aug 21 15:39:31 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 21 Aug 2009 12:39:31 -0700 Subject: [Bioperl-l] added a perltidy profile file Message-ID: <4A8EF7F3.0@cornell.edu> This one is copied from the parrot project. I added it in maintenance/perltidy.conf. Have a look, tweak as you see fit. The idea with perltidy profile files is to use them to enforce coding style rules. So this perltidy profile file would be the place to codify the BioPerl coding standards, such as indentation, use of cuddled elses, etc. So here is one, let's customize it for our needs. 
The way I usually run perltidy is with -b to modify a file in-place, and with the '-pro=' option to specify a profile file. Example: perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Fri Aug 21 17:03:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 Aug 2009 16:03:07 -0500 Subject: [Bioperl-l] bioperl capability In-Reply-To: <25037707.post@talk.nabble.com> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> <25037707.post@talk.nabble.com> Message-ID: On Aug 18, 2009, at 11:39 PM, deequan wrote: > > Howdy there, > > Yes, quite right. I apologize for the double posting. > Moreover, I > appreciate your assistance in trying to sort out what can and cannot > be done > with bioperl. 
To address the problem previously stated, I put > together a > remarkably misbehaving script that has the following parts: > > #Some parsing: > $q_start = $hsp->query->start; > $q_end = $hsp->query->end; > $h_start = $hsp->hit->start; > $h_end = $hsp->hit->end; > $length = $hsp->query->seqlength(); > $id = $hit->accession; > > print OUT "$id\t"; > my $seq; > if($h_start<$h_end){ > > #the bit per your recommendation > my $begin = $h_start-$q_start+1; > my $cease = ($length - $q_end) + $h_end; > my $strand = 1; > my $factory = Bio::DB::GenBank->new(-format=> 'genbank', > -seq_start =>$begin, > -seq_stop =>$cease, > -strand => $strand, #1 = plus, 2 = minus > ); > $seq = $factory->get_Seq_by_acc($id); > }else{#else assume backward, code not shown} > [ > #and some stuff to retrieve the sequence > > my $len = $seq->length(); > my $string = $seq->subseq(1, $len); > print OUT "length = $len\t"; > print OUT "seq = $string\n"; ] Not sure what you are doing with the above sequence. The abve > In your previous reply, you said the code accessing the seq object > created > by get_Seq_by_acc would have to pass that obj (here $seq) to a seqIO > for > basic IO purposes. # create an output seq stream somewhere my $out = Bio::SeqIO->new(-file => '>sequences.gb', -format => 'genbank'); .... # take seq object ($seq), write to the stream $out->write_seq($seq); > Not seeing exactly how to go about that, I tried some > other functions in combination that seemed as though they should work > (length() and subseq()). 
Unfortunately, the program does not even > run to > that point, as the script throws an exception: > > ------------- EXCEPTION ------------- > MSG: acc CP000948 does not exist > STACK Bio::DB::WebDBSeqI::get_Seq_by_acc > C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:182 > STACK toplevel test.pl:36 > ------------------------------------- > > > Oddly, the record corresponding to this accession number can be > found here: > http://www.ncbi.nlm.nih.gov/nuccore/169887498 That's probably something to do with NCBI unfortunately; I'll have to look into it. The best alternative is if you have BLAST reports that include the GI (or UID). That's the most reliable number (using that in coordination with get_Seq_by_id), but it's not on by default, you have to indicate its inclusion. More recent versions of Bio::SearchIO::blast parse out the GI from the descriptor if it's present. > Perhaps you'd be willing to offer another hint. Thank you for your > assistance thus far. And on behalf of all posters, thank you for > sharing > your knowledge. 'Preciate. > > David Q. No problem. chris From dan.bolser at gmail.com Fri Aug 21 17:55:37 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 21 Aug 2009 22:55:37 +0100 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: <4A8EF7F3.0@cornell.edu> References: <4A8EF7F3.0@cornell.edu> Message-ID: <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Cheers Rob, Whatever objections may arise from style x or style y, I think it's a great idea to at least have one style or another recognized as being 'standard'. I know TMTOWTDI, but on a project like this, with so many contributors and users, it's essential to at least have a recommendation. I'll try to use this on any contribs. As you pointed out [1], it's probably best to provide two patches for any change involving a formatting clean up: one to change the format to the standard and one to commit the actual code changes. All the best, Dan.
[1] irc://irc.freenode.net/#bioperl 2009/8/21 Robert Buels : > This one is copied from the parrot project. I added it in > maintenance/perltidy.conf. > Have a look, tweak as you see fit. > > The idea with perltidy profile files is to use them to enforce coding style > rules. So this perltidy profile file would be the place to codify the > BioPerl coding standards, such as indentation, use of cuddled elses, etc. > > So here is one, let's customize it for our needs. The way I usually run > perltidy is with -b to modify a file in-place, and with the '-pro=' option > to specify a profile file. > > Example: > perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Fri Aug 21 23:12:55 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 21 Aug 2009 23:12:55 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <86486D3736614E6A81AF9521B5BB796A@NewLife> Thanks to all (six, seven including Rob and his perltidy) who responded to this thread. (Lurkers, you are not volunteering by responding, honest.) I'm preparing a wiki page (of course) with the major points, some further comments, and an action plan for your consideration. Watch this space. cheers, MAJ ----- Original Message ----- From: "Mark A.
Jensen" To: "BioPerl List" Cc: "Chris Fields" Sent: Friday, August 14, 2009 10:32 PM Subject: [Bioperl-l] on BP documentation > Hi All -- > > Off-list, an old colleague of mine had this insightful, if damning, > comment: > >>I guess that from my perspective, after doing this stuff for >>about 10 years, I personally would prefer to see a "summer of >>documentation" for the bio* languages (or at least bioperl, as that is >>the only one I ever look at). From my own experiences, and from those >>of many colleagues, the documentation for bioperl has gone from >>mediocre to quite poor in the last few years. I largely think the >>wikification of the docs are to blame for this. Even SeqIO is hard >>to figure out now--it took me an hour the other day to figure out that >>"desc" returns the full Fasta header, and I had to get that from the >>module code + trial-and-error, instead of the online docs. There is >>far too much inside baseball going on in the documentation scheme. > >>So I worry more about the constant adding of features at the expense >>of documenting what is already there. This is just my 2 cents, and it >>is disappointing to see a downward trend for bioperl in this regard. > > I would be really interested in all responses from the list users. I must > agree > that BP docs are rather a rat's nest and of varying quality, but taken in > toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount > of useful and sophisticated information available. I think there are > approaches we can take to reorganize and standardize the accession > of it to make it more useful and inviting. I disagree with my pal about the > wikification, but I wager that the power of the wiki could be leveraged > to greater advantage (right, Dan?). > > I think that what we all as developers love is to code, and detest is to > document. 
Since BP is all-volunteer, and volunteers tend to do what > they like -- the beauty of open source, btw -- documentation reorg > and cleanup probably must devolve to the Core. I am willing to lead > such an effort, which will take some time, and more time the fewer > volunteers there are. First let's hear some thoughts, and 'let it all hang > out', > as they said in my mom's era. > > cheers > Mark > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sat Aug 22 00:11:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 Aug 2009 23:11:42 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <86486D3736614E6A81AF9521B5BB796A@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <86486D3736614E6A81AF9521B5BB796A@NewLife> Message-ID: <594EBBA3-5043-4DDF-9157-65195747266D@illinois.edu> Mark, One suggestion that I agree with: we need to add API-specific module documentation to the site somehow (not just links to CPAN/PDOC). There are a few ways to do so; a quick way may be to install something like the Mediawiki SecureHTML extension and create a protected template (this would be for pdoc, cpan, or both). Another one is to write up a pod2wiki converter and create API-specific pages, then have a bot automate the pages. A POD extension also exists, but we would still need to embed code. I much prefer the extensions to anything else. chris On Aug 21, 2009, at 10:12 PM, Mark A. Jensen wrote: > Thanks to all (six, seven including Rob and his perltidy) who > responded to this thread. (Lurkers, you are not volunteering > by responding, honest.) I'm preparing a wiki page (of course) > with the major points, some further comments, and an action > plan for your consideration. Watch this space. > cheers, > MAJ > ----- Original Message ----- From: "Mark A.
Jensen" > > To: "BioPerl List" > Cc: "Chris Fields" > Sent: Friday, August 14, 2009 10:32 PM > Subject: [Bioperl-l] on BP documentation > > >> Hi All -- >> >> Off-list, an old colleague of mine had this insightful, if damning, >> comment: >> >>> I guess that from my perspective, after doing this stuff for >>> about 10 years, I personally would prefer to see a "summer of >>> documentation" for the bio* languages (or at least bioperl, as >>> that is >>> the only one I ever look at). From my own experiences, and from >>> those >>> of many colleagues, the documentation for bioperl has gone from >>> mediocre to quite poor in the last few years. I largely think the >>> wikification of the docs are to blame for this. Even SeqIO is hard >>> to figure out now--it took me an hour the other day to figure out >>> that >>> "desc" returns the full Fasta header, and I had to get that from the >>> module code + trial-and-error, instead of the online docs. There is >>> far too much inside baseball going on in the documentation scheme. >> >>> So I worry more about the constant adding of features at the expense >>> of documenting what is already there. This is just my 2 cents, >>> and it >>> is disappointing to see a downward trend for bioperl in this regard. >> >> I would be really interested in all responses from the list users. >> I must agree >> that BP docs are rather a rat's nest and of varying quality, but >> taken in >> toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount >> of useful and sophisticated information available. I think there are >> approaches we can take to reorganize and standardize the accession >> of it to make it more useful and inviting. I disagree with my pal >> about the >> wikification, but I wager that the power of the wiki could be >> leveraged >> to greater advantage (right, Dan?). >> >> I think that what we all as developers love is to code, and detest >> is to >> document. 
Since BP is all-volunteer, and volunteers tend to do what >> they like -- the beauty of open source, btw -- documentation reorg >> and cleanup probably must devolve to the Core. I am willing to lead >> such an effort, which will take some time, and more time the fewer >> volunteers there are. First let's hear some thoughts, and 'let it >> all hang out', >> as they said in my mom's era. >> >> cheers >> Mark >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.osimo at gmail.com Sat Aug 22 10:55:06 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Sat, 22 Aug 2009 16:55:06 +0200 Subject: [Bioperl-l] Getting genomic coordinates for a list of SNPs Message-ID: <2ac05d0f0908220755y59b029f2u82eede5b29836a1d@mail.gmail.com> Dear list, I'm searching for a script like this http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates to get the genomic position of a SNP, not a Gene. Does it exist? Thanks a lot Emanuele From cjfields at illinois.edu Sat Aug 22 16:17:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 22 Aug 2009 15:17:46 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> Message-ID: <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> Anand, You should always post emails to the bioperl-l mailing list, never to individual developers (you'll get an answer much faster). Keep responses on the list as well. Though I use bioperl-db some, I'm probably not the best person to ask. Does anyone know what's going on with this? Does this have to do with the Species/Taxon refactoring? chris Begin forwarded message: > From: "Anand C. 
Patel" > Date: August 22, 2009 2:57:42 PM CDT > To: cjfields at illinois.edu > Subject: problem with bioperl (where's the Mus?) > > Dr. Fields, > > I'm struggling with what seems to be a strange quirk in Bioperl +/- > Bioperl-db/BioSQL. > > I've successfully loaded in genbank sequences into a biosql database. > > When I try to write a genbank sequence back out, a curious thing > happens -- the Genus is missing from the SOURCE and ORGANISM areas. > > Despite reporting: > primary tag: source > tag: chromosome > value: 3 > > tag: db_xref > value: taxon:10090 > > tag: map > value: 3 74.5 cM > > tag: mol_type > value: mRNA > > tag: organism > value: Mus musculus > The sequence when printed out via SeqIO looks like this: > LOCUS NM_017474 2935 bp dna linear ROD > 13-AUG-2009 > DEFINITION Mus musculus chloride channel calcium activated 3 > (Clca3), mRNA. > ACCESSION NM_017474 XM_978159 > VERSION NM_017474.2 GI:255918210 > KEYWORDS . > SOURCE musculus > ORGANISM musculus > Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; > Bilateria; > Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; > Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; > Tetrapoda; > Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; > Glires; > Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. > Confession -- I have a final project due Monday wherein I boldly > elected to interface Bioperl, MySQL, Perl, and CGI. > (I'm an MD getting my MS in Bioinformatics.) > After many misadventures, I'm getting to the point where I could > actually complete the objectives, but this is bug is rather > problematic. > Thanks, > Anand > Anand C. Patel, MD > Assistant Professor of Pediatrics > Division of Allergy/Pulmonary Medicine > Department of Pediatrics > Washington University School of Medicine > 660 South Euclid Ave, Campus Box 8052 > St. 
Louis, MO 63110 > acpatel at wustl.edu > acpatel at gmail.com > acpatel at jhu.edu > From hlapp at gmx.net Sat Aug 22 17:36:42 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 17:36:42 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> Message-ID: <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> That's a pretty strange bug. Anand, which version of BioPerl and Bioperl-db are you running? Note that the genus *is* actually there in the lineage (and hence does get retrieved from the database). Apparently the Species object fails to pull it out correctly, though? Anand - I suspect there have been some warnings printed to the terminal - can you post these, and otherwise confirm that there haven't been any? -hilmar On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: > Anand, > > You should always post emails to the bioperl-l mailing list, never > to individual developers (you'll get an answer much faster). Keep > responses on the list as well. > > Though I use bioperl-db some, I'm probably not the best person to > ask. Does anyone know what's going on with this? Does this have to > do with the Species/Taxon refactoring? > > chris > > Begin forwarded message: > >> From: "Anand C. Patel" >> Date: August 22, 2009 2:57:42 PM CDT >> To: cjfields at illinois.edu >> Subject: problem with bioperl (where's the Mus?) >> >> Dr. Fields, >> >> I'm struggling with what seems to be a strange quirk in Bioperl +/- >> Bioperl-db/BioSQL. >> >> I've successfully loaded in genbank sequences into a biosql database. >> >> When I try to write a genbank sequence back out, a curious thing >> happens -- the Genus is missing from the SOURCE and ORGANISM areas. 
>> >> Despite reporting: >> primary tag: source >> tag: chromosome >> value: 3 >> >> tag: db_xref >> value: taxon:10090 >> >> tag: map >> value: 3 74.5 cM >> >> tag: mol_type >> value: mRNA >> >> tag: organism >> value: Mus musculus >> The sequence when printed out via SeqIO looks like this: >> LOCUS NM_017474 2935 bp dna linear ROD >> 13-AUG-2009 >> DEFINITION Mus musculus chloride channel calcium activated 3 >> (Clca3), mRNA. >> ACCESSION NM_017474 XM_978159 >> VERSION NM_017474.2 GI:255918210 >> KEYWORDS . >> SOURCE musculus >> ORGANISM musculus >> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >> Bilateria; >> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >> Tetrapoda; >> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >> Glires; >> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >> Confession -- I have a final project due Monday wherein I boldly >> elected to interface Bioperl, MySQL, Perl, and CGI. >> (I'm an MD getting my MS in Bioinformatics.) >> After many misadventures, I'm getting to the point where I could >> actually complete the objectives, but this is bug is rather >> problematic. >> Thanks, >> Anand >> Anand C. Patel, MD >> Assistant Professor of Pediatrics >> Division of Allergy/Pulmonary Medicine >> Department of Pediatrics >> Washington University School of Medicine >> 660 South Euclid Ave, Campus Box 8052 >> St. 
Louis, MO 63110 >> acpatel at wustl.edu >> acpatel at gmail.com >> acpatel at jhu.edu >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 22 17:42:32 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 17:42:32 -0400 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: Consistent coding style is in principle a good thing. It's also worth to keep in mind one of the old BioPerl principles - don't change working code purely to change style. In my interpretation of the rule, however, this has always applied to code writing style, and not code formatting style. I'm assuming the goal here is only to make the formatting consistent. -hilmar On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote: > Cheers Rob, > > Whatever objectons may arise from style x or style y, I think it's a > great idea to at least have one style or another recognized as being > 'standard'. I know TMTOWTDI, but on a project like this, with so many > contributors and users, it's essential to at least have a > recommendation. I'll try to use this on any contribs. > > As you pointed out [1], its probably best to provide two patches for > any change involving a formating clean up: one to change the fomat to > the standard and one to commit the actual code changes. > > > All the best, > Dan. > > [1] irc://irc.freenode.net/#bioperl > > > 2009/8/21 Robert Buels : >> This one is copied from the parrot project. I added it in >> maintenance/perltidy.conf. >> Have a look, tweak as you see fit. 
>> >> The idea with perltidy profile files is to use them to enforce >> coding style >> rules. So this perltidy profile file would be the place to codify >> the >> BioPerl coding standards, such as indentation, use of cuddled >> elses, etc. >> >> So here is one, let's customize it for our needs. The way I >> usually run >> perltidy is with -b to modify a file in-place, and with the '-pro=' >> option >> to specify a profile file. >> >> Example: >> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >> >> Rob >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 22 19:21:48 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 19:21:48 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > [...] > I think I know what's broken. Using load_seqdatabases.pl, I'd put a > set of sequences from genbank into a biosql db in mysql. 
> > I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl > script from biosql. Did you load the NCBI taxonomy first, or afterwards? > > When I searched for house (as in house mouse), I found that the name > of the type of taxon class was "genbank common name". > > When I searched for musculus, it does appear as a type of > "scientific name". It is the names of class 'scientific name' that Bioperl-db will load onto the lineage array. > [...] > I'm not just getting warnings. I'm getting errors. Tons of them. > It's a wonder it's working at all. I'm not sure what you're referring to, but what you pasted into your email were neither errors nor warnings but a debugging log (and what it prints looks like it's working fine). You triggered that by setting -verbose to a value greater than 0. If you don't want debugging output, then you can just leave off that argument (no debugging output is the default). > > I started with the getentry.cgi script in the cgi-bin folder, and > stripped most of it away. I see - which reminds me that I need to look at that script; I'm afraid it hasn't been updated for a long time (that doesn't mean though that it can't work - the core API has been stable for years). > > Code: > #!/usr/bin/perl > > [...] > if( $@ || !defined $seq) { > print "Got fetch exception of...\n
$@\n
"; > exit(0); > } Wouldn't you want to put that right after the eval() clause? -hilmar > > >> >> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >> >>> Anand, >>> >>> You should always post emails to the bioperl-l mailing list, never >>> to individual developers (you'll get an answer much faster). Keep >>> responses on the list as well. >>> >>> Though I use bioperl-db some, I'm probably not the best person to >>> ask. Does anyone know what's going on with this? Does this have >>> to do with the Species/Taxon refactoring? >>> >>> chris >>> >>> Begin forwarded message: >>> >>>> From: "Anand C. Patel" >>>> Date: August 22, 2009 2:57:42 PM CDT >>>> To: cjfields at illinois.edu >>>> Subject: problem with bioperl (where's the Mus?) >>>> >>>> Dr. Fields, >>>> >>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>> +/- Bioperl-db/BioSQL. >>>> >>>> I've successfully loaded in genbank sequences into a biosql >>>> database. >>>> >>>> When I try to write a genbank sequence back out, a curious thing >>>> happens -- the Genus is missing from the SOURCE and ORGANISM areas. >>>> >>>> Despite reporting: >>>> primary tag: source >>>> tag: chromosome >>>> value: 3 >>>> >>>> tag: db_xref >>>> value: taxon:10090 >>>> >>>> tag: map >>>> value: 3 74.5 cM >>>> >>>> tag: mol_type >>>> value: mRNA >>>> >>>> tag: organism >>>> value: Mus musculus >>>> The sequence when printed out via SeqIO looks like this: >>>> LOCUS NM_017474 2935 bp dna linear >>>> ROD 13-AUG-2009 >>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>> (Clca3), mRNA. >>>> ACCESSION NM_017474 XM_978159 >>>> VERSION NM_017474.2 GI:255918210 >>>> KEYWORDS . 
>>>> SOURCE musculus >>>> ORGANISM musculus >>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>> Bilateria; >>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>> Tetrapoda; >>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>> Glires; >>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>> Confession -- I have a final project due Monday wherein I boldly >>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>> (I'm an MD getting my MS in Bioinformatics.) >>>> After many misadventures, I'm getting to the point where I could >>>> actually complete the objectives, but this is bug is rather >>>> problematic. >>>> Thanks, >>>> Anand >>>> Anand C. Patel, MD >>>> Assistant Professor of Pediatrics >>>> Division of Allergy/Pulmonary Medicine >>>> Department of Pediatrics >>>> Washington University School of Medicine >>>> 660 South Euclid Ave, Campus Box 8052 >>>> St. Louis, MO 63110 >>>> acpatel at wustl.edu >>>> acpatel at gmail.com >>>> acpatel at jhu.edu >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 23 10:38:48 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Aug 2009 10:38:48 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Message-ID: On Aug 22, 2009, at 9:13 PM, Anand C. Patel wrote: > Turns out that using the default namespace bioperl doesn't change > anything. No it shouldn't, so long as you are consistent about it. (And if you're not, all that should happen is that you don't find your sequences any more.) > > Common name -- still "genbank common name" in name_class in the > taxon_name table for "house mouse", which I think the module is > looking for as "common name". If you are loading the NCBI taxonomy first, this is coming from NCBI, not one of the scripts or BioPerl, and hence we have no control over it. Are you saying that there is no designated name of class 'common name' for Mus musculus in the NCBI taxonomy dump? Also, the common name being present or not should have no bearing on the lineage array, where the actual problem is, so I don't understand right now how this would be connected to the problem you are seeing. > > It's not behaving differently despite reloading the sequences. > > I've created a horrible munge that fixes it for cosmetic purposes: > my $species = $seq->species; > my $justspecies = $species->scientific_name(); > my $binspecies = $species->binomial(); > > my $gbstring2 = $gbstring; > > $gbstring2 =~ s/$binspecies/$justspecies/g; > $gbstring2 =~ s/$justspecies/$binspecies/g; I don't understand what you are trying to achieve here - it seems like you are making a substitution and then reverting it? Also, $species- >scientific_name() and $species->binomial() should be identical for Mus musculus - are you finding different values being returned? So in essence, I wouldn't expect your above code snippet to have any effect, for both of these reasons. 
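A minimal sketch of the point above, in plain Perl with no BioPerl dependency. The two name strings are hypothetical stand-ins for whatever $species->scientific_name() and $species->binomial() return, and the sample text is abbreviated from the GenBank output quoted earlier in the thread; \Q...\E is added only so that the names are matched literally. In both cases shown, the pair of substitutions should leave the text as it was.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the same two substitutions as the quoted snippet.
# $justspecies and $binspecies are hypothetical stand-ins for the
# values returned by scientific_name() and binomial().
sub munge {
    my ($text, $justspecies, $binspecies) = @_;
    $text =~ s/\Q$binspecies\E/$justspecies/g;   # first substitution
    $text =~ s/\Q$justspecies\E/$binspecies/g;   # second substitution, reversed
    return $text;
}

# Abbreviated sample of the GenBank text from the thread.
my $gbstring = "SOURCE      musculus\n"
             . "DEFINITION  Mus musculus chloride channel\n";

# Case 1: both accessors return the same string, so each
# substitution replaces a match with itself.
my $same = munge($gbstring, 'Mus musculus', 'Mus musculus');

# Case 2 (assumed values): the accessors differ, and for this text
# the second substitution undoes what the first one did.
my $diff = munge($gbstring, 'Mus musculus', 'musculus');

print "case 1: ", ($same eq $gbstring ? "unchanged" : "changed"), "\n";
print "case 2: ", ($diff eq $gbstring ? "unchanged" : "changed"), "\n";
```

If the genus really has to be patched into the output as a stopgap, a single substitution anchored to the SOURCE and ORGANISM lines would be easier to reason about than a global replace followed by its inverse.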
How do you find $gbstring2 to be different from $gbstring at the end of this block of code? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 23 10:42:58 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Aug 2009 10:42:58 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> Message-ID: <119BC08A-6D3A-4D03-B0D5-7619EDE682AE@gmx.net> On Aug 22, 2009, at 8:13 PM, Anand C. Patel wrote: > Do I need to load ontology before loading sequences? You don't. Especially if you load genbank sequences as they come. Loading ontologies that are used for sequence annotation is useful as it will get your features (or sequences) linked to fully populated (description, synonyms, relationships, etc) terms rather than skeleton term records created on the fly. However, in GenBank format ontology terms are part of the feature table, and require a post-processing (using, e.g., a SeqProcessor class) step to be identified and turned into Bio::Annotation::OntologyTerm objects. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jorismeys at gmail.com Sun Aug 23 11:08:47 2009 From: jorismeys at gmail.com (joris meys) Date: Sun, 23 Aug 2009 17:08:47 +0200 Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree Message-ID: Hi, I'm currently exploring the phylogenetic parts of Bio Perl, but I can't seem to find a quick solution to the following problem: Say you have a tree obtained by a certain method. From this tree, you want to have the evolutionary distances between species, defined as the sum of the branch lengths between any 2 species. There is as far as I know no function for doing that. But is there a possibility to get a list of some sort of "shortest paths" from one species to another, allowing that matrix to be calculated easily? From the phylip package, I get the following data if I run the neighbor or fitch program. From there I can easily get an algorithm to calculate the distances I need. But I also need to do that for maximum likelihood trees and the like. Is there a way to get this information in Bio Perl?
From    to      dist
node1   sp1     xxxxx
node2   sp3     xxxxxx
node1   node2   xxxxx
node1   sp2     xxxxx
Kind regards Joris From heikki.lehvaslaiho at gmail.com Mon Aug 24 01:59:22 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Mon, 24 Aug 2009 08:59:22 +0300 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: The de facto coding style standard for BioPerl has been emacs using cperl mode and the bioperl.list file. As long as this configuration does not change the conventions used, I see this as a great way of helping to format code from other editors. -Heikki 2009/8/23 Hilmar Lapp : > Consistent coding style is in principle a good thing.
> > It's also worth to keep in mind one of the old BioPerl principles - don't > change working code purely to change style. In my interpretation of the > rule, however, this has always applied to code writing style, and not code > formatting style. I'm assuming the goal here is only to make the formatting > consistent. > > -hilmar > > On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote: > >> Cheers Rob, >> >> Whatever objectons may arise from style x or style y, I think it's a >> great idea to at least have one style or another recognized as being >> 'standard'. I know TMTOWTDI, but on a project like this, with so many >> contributors and users, it's essential to at least have a >> recommendation. I'll try to use this on any contribs. >> >> As you pointed out [1], its probably best to provide two patches for >> any change involving a formating clean up: one to change the fomat to >> the standard and one to commit the actual code changes. >> >> >> All the best, >> Dan. >> >> [1] irc://irc.freenode.net/#bioperl >> >> >> 2009/8/21 Robert Buels : >>> >>> This one is copied from the parrot project. I added it in >>> maintenance/perltidy.conf. >>> Have a look, tweak as you see fit. >>> >>> The idea with perltidy profile files is to use them to enforce coding >>> style >>> rules. So this perltidy profile file would be the place to codify the >>> BioPerl coding standards, such as indentation, use of cuddled elses, etc. >>> >>> So here is one, let's customize it for our needs. The way I usually run >>> perltidy is with -b to modify a file in-place, and with the '-pro=' >>> option >>> to specify a profile file.
>>> >>> Example: >>> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >>> >>> Rob >>> >>> -- >>> Robert Buels >>> Bioinformatics Analyst, Sol Genomics Network >>> Boyce Thompson Institute for Plant Research >>> Tower Rd >>> Ithaca, NY 14853 >>> Tel: 503-889-8539 >>> rmb32 at cornell.edu >>> http://www.sgn.cornell.edu >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Building #2, Office #4216 Computational Bioscience Research Centre (CBRC) 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia From geoeco at rambler.ru Mon Aug 24 05:20:13 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Mon, 24 Aug 2009 13:20:13 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file Message-ID: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Dear all, I am trying to extract the species taxonomy from the ORGANISM line. In fact I only need the first line under the ORGANISM tag (i.e. genus + species). I thought that it would be possible to do this with the SeqBuilder object by stating $builder->add_wanted_slot('display_id','species'); The problem, however, is that I get an empty file as a result. What might be wrong with the script (see below)?
Thanks a lot in advance for any ideas, ------------------------------------------- #!/usr/bin/perl use strict; use Bio::SeqIO; use Bio::Seq::SeqBuilder; my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; my $infile = shift or die $usage; my $infileformat = 'Genbank' ; my $outfile = shift or die $usage; my $outfileformat = 'raw'; my $i = 0; my $seq_in = Bio::SeqIO->new('-file' => "<$infile", '-format' => $infileformat); my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", '-format' => $outfileformat); my $builder = $seq_in->sequence_builder(); $builder->want_none(); $builder->add_wanted_slot('display_id','species'); while(my $seq = $seq_in->next_seq()) { $seq_out->write_seq($seq); } exit; ---------------------------------------------------- Anna From maj at fortinbras.us Mon Aug 24 07:30:27 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 24 Aug 2009 07:30:27 -0400 Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree In-Reply-To: References: Message-ID: Hi Joris, AFAIK, there is only one path between any two nodes in a typical phylogenetic tree, the one passing through the most recent common ancestor of the nodes. The distance() method in Bio::Tree::TreeFunctionsI will give you what I think you want: use Bio::TreeIO; use Bio::Tree::TreeFunctionsI; $t = Bio::TreeIO->new(-file=>'t/data/urease.tre.nexus', -format=>'nexus')->next_tree; $n1 = $t->find_node('Anidulans'); $n2 = $t->find_node('Ncrassa'); $dist = $t->distance(-nodes => [$n1, $n2] ); print $dist; Use the Bio::TreeIO package to read in the tree in your favorite format; it will handle many. cheers, MAJ ----- Original Message ----- From: "joris meys" To: Sent: Sunday, August 23, 2009 11:08 AM Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree > Hi, > > I'm currently exploring the phylogenetic parts of Bio Perl, but I > can't seem to find a quick solution to following problem : > Say you have a tree obtained by a certain method. 
From this tree, you > want to have the evolutionary distances between species, defined as > the sum of the branch lengths between any 2 species. There is as far > as I know no function for doing that. But is there a possibility to > get a list of some sort of "shortest paths" from one species to > another, allowing to easily calculate that matrix? > >>From the phylip package, I get following data if I run the neighbor or > fitch program. From there I can easily get an algorithm to calculate > the distances I need. But I also need to do that for maximum > likelihood trees and the like. Is there a way to get this information > in Bio Perl? >>From to dist > node1 sp1 xxxxx > node2 sp3 xxxxxx > node1 node2 xxxxx > node 1 sp2 xxxxx > > Kind regards > Joris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From dan.bolser at gmail.com Mon Aug 24 08:26:13 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 13:26:13 +0100 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: <2c8757af0908240526j1cb0a455x53f7f3dccaceda86@mail.gmail.com> 2009/8/24 Heikki Lehvaslaiho : > De facto coding style standard for BioPerl has been emacs using cperl > mode and bioperl.list file. As long as this configuration does not > change the conventions used, I see this as great way in helping to > format code from other editors. 'bioperl.list' file? I guess you made a typo and you mean bioperl.lisp http://www.bioperl.org/wiki/Emacs_template > 2009/8/23 Hilmar Lapp : >> Consistent coding style is in principle a good thing. >> >> It's also worth to keep in mind one of the old BioPerl principles - don't >> change working code purely to change style. 
In my interpretation of the >> rule, however, this has always applied to code writing style, and not code >> formatting style. I'm assuming the goal here is only to make the formatting >> consistent. I have changed coding style in the past. IIRC this was in the Quality.pm file. I made the changes because two different styles were being used to do (roughly) the same thing at different points in the script. The two styles were being used interchangeably (at random?). As a noob, the use of two different styles was very confusing, because I didn't know if the difference was significant or what the significance of the difference might be. I resolved the issue by writing a set of additional tests and then slowly harmonizing the coding style while confirming that the tests were still running OK. In this case I think it was reasonable to try to have a consistent style at least within the module. Or should I have left the style as it was? Cheers, Dan. From dan.bolser at gmail.com Mon Aug 24 08:50:46 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 13:50:46 +0100 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> Message-ID: <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> I just ran into the same problem described here. Here is my code to demonstrate what I expected: #!/usr/bin/perl -w use strict; use Bio::SimpleAlign; use Bio::LocatableSeq; use Bio::AlignIO; my $CLUDGE = 0; ## REF tacattaaagacccg ## SEQ1 taca.taaa...... 
## SEQ2 .....taaaga.ccg my $aln = Bio::SimpleAlign->new(); $aln->gap_char('.'); my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' ); my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' ); $aln->add_seq( $r ); $aln->add_seq( $s1 ); $aln->add_seq( $s2 ); if($CLUDGE){ foreach(($r, $s1, $s2)){ $_->seq( '.' x ($_->start - 1) . $_->seq ) } } ## Prepare an 'output stream' for the alignment: my $aliWriter = Bio::AlignIO-> new( -fh => \*STDOUT, -format => 'clustalw', ); warn "\nOUTPUT:\n"; $aliWriter->write_aln($aln); I was calling the "fill in the gaps yourself" step a CLUDGE because I had expected the alignment object to take care of this for me. Is there any reason that it couldn't do this 'CLUDGE' automatically? It seems strange that it insists on being passed locatable sequence objects, but then largely ignore the given location. Would it not be possible to have this happen when the sequences are written out from the alignment? I think it should still be possible to index the column number via the (gapless) sequence number... or did I get confused? There are two levels of confusion here (on my part), 1) the concepts behind the objects and 2) the implementation details. Thanks for any hints on how to understand or potentially how to fix these problems. Cheers, Dan. 2009/7/22 Mark A. Jensen : > Hi Paolo, > I think I see what you want to do, however, it doesn't quite work > this way. 
I'm supposing you want to specify something like > > s1/3-6 attc > s2/7-10 gaag > > and obtain output like > > s1 --attc---- > s2 ------gaag > > But (and this is why LocatableSeqs are "locatable"), the alignment described > by the former data is always going to be > > s1 attc > s2 gaag > > so that I can query the alignment *column* number 1 and obtain > the residue coordinates of the original sequences in that column: > > $loc = $aln->get_seq_by_pos(1)->location_from_column(1); # 3 > > or vice-versa > > $col = $aln->column_from_residue_number( 's1', 3); # 1 > > As far as I know, you have to fill in the gaps yourself; a good > exercise, since you already have all the information you need, in having set > up the start and end coordinates (which are really > the column coordinates in this model). > If this wasn't what you had in mind, I apologize. > cheers, Mark > > > ----- Original Message ----- From: "Paolo Pavan" > To: > Sent: Thursday, July 16, 2009 6:17 AM > Subject: [Bioperl-l] Bio::SimpleAlign constructor? > > >> Hi, >> I have a brief question: I would like to know if there is a method to >> obtain a valid formatted and flush Bio::SimpleAlign object (i.e. >> properly filled with gaps on the right and on the left side of each >> sequence) given a bounch of Bio::LocatableSeq objects in which I have >> specified the -start and -end properties. >> Can anyone help me? 
Thank you very much, >> >> Paolo >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ghai.rohit at gmail.com Mon Aug 24 08:53:03 2009 From: ghai.rohit at gmail.com (Rohit Ghai) Date: Mon, 24 Aug 2009 14:53:03 +0200 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Message-ID: <94c73820908240553m72540519pd86bf78e29041462@mail.gmail.com> hi I think you forgot to add the "seq" in the builder.. thats why the file is empty. Also, the species name, though being parsed, is nowhere in the output. Here's a version using fasta output that you can probably customize further. This also takes the full name of the organism and adds to the description line in the output. use strict; use Bio::SeqIO; use Bio::Seq::SeqBuilder; my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; my $infile = shift or die $usage; my $infileformat = 'Genbank' ; my $outfile = shift or die $usage; my $outfileformat = 'fasta'; my $i = 0; my $seq_in = Bio::SeqIO->new('-file' => "<$infile", '-format' => $infileformat); my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", '-format' => $outfileformat); my $builder = $seq_in->sequence_builder(); $builder->want_none(); $builder->add_wanted_slot('display_id','species','seq','description'); while(my $seq = $seq_in->next_seq()) { my $desc = $seq->description(); my $species_string = $seq->species()->binomial('FULL'); $desc = $desc . 
" [$species_string]"; $seq->description($desc); $seq_out->write_seq($seq); } exit; On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova wrote: > > Dear all, > > I am trying to extract species taxonomy from ORGANISM line. In fact I only > need a first line under ORGANISM tag (e.i. genus + species). I though that > it would be possible to do with the SeqBuilder object by stating > > $builder->add_wanted_slot('display_id','species'); > > the problem is, however, that I've got an empty file as a result. > What might be wrong with the script (see below)? > Thanks a lot in advance for any ideas, > > ------------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'raw'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > $builder->add_wanted_slot('display_id','species'); > > while(my $seq = $seq_in->next_seq()) { > $seq_out->write_seq($seq); > } > > exit; > > ---------------------------------------------------- > > Anna > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Aug 24 08:55:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 07:55:56 -0500 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Message-ID: <6B4871D9-5DB0-4762-A613-3561B40CE099@illinois.edu> Anna, It's stored in the 
Bio::Species object. I have to say, though, I think you're using a stick of dynamite for a scalpel here; if you only need ORGANISM parse it out directly (it's much faster). Or am I missing something? chris On Aug 24, 2009, at 4:20 AM, Anna Kostikova wrote: > Dear all, > > I am trying to extract species taxonomy from ORGANISM line. In fact > I only need a first line under ORGANISM tag (e.i. genus + species). > I though that it would be possible to do with the SeqBuilder object > by stating > > $builder->add_wanted_slot('display_id','species'); > > the problem is, however, that I've got an empty file as a result. > What might be wrong with the script (see below)? > Thanks a lot in advance for any ideas, > > ------------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'raw'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > $builder->add_wanted_slot('display_id','species'); > > while(my $seq = $seq_in->next_seq()) { > $seq_out->write_seq($seq); > } > > exit; > > ---------------------------------------------------- > > Anna > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 08:56:02 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 07:56:02 -0500 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: 
<1E5347D2-A60F-49CB-8F3B-C5E06342417E@illinois.edu>

Heikki,

perltidy has become the most common way to standardize perl coding style (in a non-text-editor-dependent way). A number of projects have started using it as a means for checking and cleaning up modules prior to release. I think Perl Best Practices reinforced that.

chris

On Aug 24, 2009, at 12:59 AM, Heikki Lehvaslaiho wrote:

> De facto coding style standard for BioPerl has been emacs using cperl
> mode and bioperl.list file. As long as this configuration does not
> change the conventions used, I see this as a great way of helping to
> format code from other editors.
>
>
> -Heikki
>
> 2009/8/23 Hilmar Lapp :
>> Consistent coding style is in principle a good thing.
>>
>> It's also worth keeping in mind one of the old BioPerl principles -
>> don't change working code purely to change style. In my interpretation
>> of the rule, however, this has always applied to code writing style,
>> and not code formatting style. I'm assuming the goal here is only to
>> make the formatting consistent.
>>
>> -hilmar
>>
>> On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote:
>>
>>> Cheers Rob,
>>>
>>> Whatever objections may arise from style x or style y, I think it's a
>>> great idea to at least have one style or another recognized as being
>>> 'standard'. I know TMTOWTDI, but on a project like this, with so many
>>> contributors and users, it's essential to at least have a
>>> recommendation. I'll try to use this on any contribs.
>>>
>>> As you pointed out [1], it's probably best to provide two patches for
>>> any change involving a formatting clean up: one to change the format
>>> to the standard and one to commit the actual code changes.
>>>
>>>
>>> All the best,
>>> Dan.
>>>
>>> [1] irc://irc.freenode.net/#bioperl
>>>
>>>
>>> 2009/8/21 Robert Buels :
>>>>
>>>> This one is copied from the parrot project. I added it in
>>>> maintenance/perltidy.conf.
>>>> Have a look, tweak as you see fit.
>>>> >>>> The idea with perltidy profile files is to use them to enforce >>>> coding >>>> style >>>> rules. So this perltidy profile file would be the place to >>>> codify the >>>> BioPerl coding standards, such as indentation, use of cuddled >>>> elses, etc. >>>> >>>> So here is one, let's customize it for our needs. The way I >>>> usually run >>>> perltidy is with -b to modify a file in-place, and with the '-pro=' >>>> option >>>> to specify a profile file. >>>> >>>> Example: >>>> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >>>> >>>> Rob >>>> >>>> -- >>>> Robert Buels >>>> Bioinformatics Analyst, Sol Genomics Network >>>> Boyce Thompson Institute for Plant Research >>>> Tower Rd >>>> Ithaca, NY 14853 >>>> Tel: 503-889-8539 >>>> rmb32 at cornell.edu >>>> http://www.sgn.cornell.edu >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > Building #2, Office #4216 > Computational Bioscience Research Centre (CBRC) > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > 
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon Aug 24 09:36:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 24 Aug 2009 08:36:32 -0500
Subject: [Bioperl-l] Bio::SimpleAlign constructor?
In-Reply-To: <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com>
References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com>
 <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife>
 <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com>
Message-ID:

Dan, all,

Bio::SimpleAlign doesn't align anything for you. It makes no assumptions about the data being added, beyond possibly checking for the seqs to be flush prior to analyses. Here's the reason why:

The object doesn't 'know' the seqs map across from one to the other as below:

> ...
> ## REF  tacattaaagacccg
> ## SEQ1 taca.taaa......
> ## SEQ2 .....taaaga.ccg
>
> my $aln = Bio::SimpleAlign->new();
>
> $aln->gap_char('.');
>
> my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' );
> my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' );
> my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' );
>
> $aln->add_seq( $r );
> $aln->add_seq( $s1 );
> $aln->add_seq( $s2 );

Above, you are making the assumption that SimpleAlign 'knows' where to match the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does NOT indicate that (the LocatableSeq docs, and their usage, should indicate that). Think about HSP alignments in a BLAST report; the start/end/strand coordinates are where the sequence in the alignment maps to the original query or hit sequence. They don't indicate where the hit maps to the query (the alignment itself does that in a column-wise fashion).

I'm not sure, maybe it needs to be more explicit in the documentation, but SimpleAlign does not align the sequences for you (and it shouldn't be expected to). There are much better (faster, more accurate) ways to do that.
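For anyone who only wants the padded display, the distinction above can be made concrete with a minimal sketch in plain Perl. It assumes you keep track of the alignment column at which each gapped string begins yourself; the %col_start hash and pad_to_columns() are hypothetical names for illustration, not BioPerl API, and the column offsets are deliberately kept separate from LocatableSeq start():

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not part of BioPerl): pad gapped strings so they are
# flush, given the 1-based alignment column where each string begins.
sub pad_to_columns {
    my ($seqs, $col_start, $gap) = @_;
    $gap = '.' unless defined $gap;

    # Total alignment width = furthest column reached by any sequence.
    my $width = max(
        map { $col_start->{$_} - 1 + length($seqs->{$_}) } keys %$seqs
    );

    my %padded;
    for my $id (keys %$seqs) {
        my $left  = $gap x ($col_start->{$id} - 1);
        my $right = $gap x ($width - length($left) - length($seqs->{$id}));
        $padded{$id} = $left . $seqs->{$id} . $right;
    }
    return \%padded;
}

# Strings taken from the example in this thread; the column offsets are an
# assumption supplied by the caller, not something SimpleAlign knows.
my %seqs      = (r => 'tacattaaagacccg', s1 => 'taca.taaa', s2 => 'taaaga.ccg');
my %col_start = (r => 1, s1 => 1, s2 => 6);

my $padded = pad_to_columns(\%seqs, \%col_start, '.');
printf "%-3s %s\n", $_, $padded->{$_} for sort keys %$padded;
```

The padded strings could then be handed to Bio::LocatableSeq and Bio::SimpleAlign as usual; the sketch only covers the padding step.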
> if($CLUDGE){
>     foreach(($r, $s1, $s2)){
>         $_->seq( '.' x ($_->start - 1) . $_->seq )
>     }
> }
>
> ## Prepare an 'output stream' for the alignment:
> my $aliWriter = Bio::AlignIO->
>     new( -fh => \*STDOUT,
>          -format => 'clustalw',
>        );
>
> warn "\nOUTPUT:\n";
> $aliWriter->write_aln($aln);

...

> I was calling the "fill in the gaps yourself" step a CLUDGE because I
> had expected the alignment object to take care of this for me. Is
> there any reason that it couldn't do this 'CLUDGE' automatically? It
> seems strange that it insists on being passed locatable sequence
> objects, but then largely ignores the given location.
>
> Would it not be possible to have this happen when the sequences are
> written out from the alignment? I think it should still be possible to
> index the column number via the (gapless) sequence number... or did I
> get confused? There are two levels of confusion here (on my part), 1)
> the concepts behind the objects and 2) the implementation details.

Mentioned above (no assumptions on how locatableseqs map to one another). WYSIWYG. There is nothing precluding you from writing up code to do that, though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl alignment implementation (there are, believe it or not, pure perl implementations of Smith-Waterman and Needleman-Wunsch).

> Thanks for any hints on how to understand or potentially how to fix
> these problems.
>
> Cheers,
> Dan.

Not that SimpleAlign and LocatableSeqs don't have their share of problems. However, I don't think you can expect this behavior to change with the refactors.

chris

From hlapp at gmx.net Mon Aug 24 09:44:43 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 24 Aug 2009 09:44:43 -0400
Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To: <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net> Message-ID: On Aug 23, 2009, at 1:25 PM, Anand C. Patel wrote: > The other piece of potentially useful information is below -- output > from > SELECT * FROM `biosql`.`taxon_name` WHERE `taxon_id` = 138; > (taxon_id 138 maps to ncbi_taxon_id 10090) > > taxon_id name name_class > 138 LK3 transgenic mice includes > 138 Mus muscaris misnomer > 138 Mus musculus scientific name > 138 Mus sp. 129SV includes > 138 house mouse genbank common name > 138 mice C57BL/6xCBA/CaJ hybrid misspelling > 138 mouse common name > 138 nude mice includes > 138 transgenic mice includes > > The source from the genbank entry NM_017474 is: > SOURCE Mus musculus (house mouse) > > Which is why I think the issue is that the name_class is "genbank > common name" rather than common name. Note that apparently NCBI has decided that the common name is 'mouse', not 'house mouse'. Why what they report in the genbank record is different from what they decided to be the common name is beyond me. Note also that the common name in parentheses is optional. If it's missing the record is still in valid format. > What does strike me as odd though is that not even "mouse" shows up > -- common_name is empty. Indeed, that's odd. Can you file this as a bug report and assign to the bioperl-db queue? 
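To make the two name classes concrete, here is a minimal sketch in plain Perl of looking a name up by name_class with a fallback order. It uses the taxon_name rows quoted above; pick_name() and the precedence lists are hypothetical and only illustrate possible policies, not what Bio::Species or bioperl-db currently do:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rows for taxon_id 138 as quoted in this message: [name, name_class].
my @taxon_names = (
    ['LK3 transgenic mice',         'includes'],
    ['Mus muscaris',                'misnomer'],
    ['Mus musculus',                'scientific name'],
    ['Mus sp. 129SV',               'includes'],
    ['house mouse',                 'genbank common name'],
    ['mice C57BL/6xCBA/CaJ hybrid', 'misspelling'],
    ['mouse',                       'common name'],
    ['nude mice',                   'includes'],
    ['transgenic mice',             'includes'],
);

# Hypothetical policy helper: return the first name whose class appears in
# the precedence list, trying the classes in the order given.
sub pick_name {
    my ($rows, @precedence) = @_;
    for my $class (@precedence) {
        for my $row (@$rows) {
            return $row->[0] if $row->[1] eq $class;
        }
    }
    return;    # no row matched any requested class
}

# Policy A: prefer 'common name', fall back to 'genbank common name'.
my $common = pick_name(\@taxon_names, 'common name', 'genbank common name');

# Policy B: let 'genbank common name' override 'common name'.
my $genbank_first = pick_name(\@taxon_names, 'genbank common name', 'common name');

print "common name first:         $common\n";
print "genbank common name first: $genbank_first\n";
```

With these rows the two policies give different answers, which is why the choice matters for round-tripping GenBank records.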
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon Aug 24 09:50:17 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 09:50:17 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> Message-ID: <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> On Aug 23, 2009, at 1:17 PM, Anand C. Patel wrote: > [...] > Code snippet: > my $species = $seq->species; > print "common name = ",$species->common_name, "\n"; > print "scientific name = ",$species->scientific_name, "\n"; > print "species = ",$species->species, "\n"; > print "genus = ",$species->genus, "\n"; > print "sub_species = ",$species->sub_species, "\n"; > print "binomial = ",$species->binomial, "\n"; > print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; > > Output: > common name = > scientific name = musculus > species = musculus > genus = Mus > sub_species = > binomial = Mus musculus > ncbi_taxid = 10090 This points to a problem in Bio::Species::scientific_name(), given that binomial() is correct. Could you file this as a bug report? > The common name is missing, despite having loaded it from NCBI > taxonomy using the provided script. > It is ONLY present as this "genbank common name". > [...] > I could go through and replace all of the instances of "genbank > common name" with "common name" and see if this fixes it. I think we need to first discuss how we want to treat the 'common name' versus 'genbank common name' classes in BioPerl. 
So question for everyone: do we need to have both available (in which case we need to add an accessor in Bio::Species), or only 'common name', or should 'genbank common name' override 'common name' if both are present and have different values?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From biopython at maubp.freeserve.co.uk Mon Aug 24 10:18:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Aug 2009 15:18:20 +0100
Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS
In-Reply-To:
References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com>
 <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com>
 <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com>
 <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu>
 <320fb6e00907270451i3d40b4ffq607360cfcb6f6282@mail.gmail.com>
Message-ID: <320fb6e00908240718q194afe78j4a05b31aeb33e313@mail.gmail.com>

On Mon, Jul 27, 2009 at 2:06 PM, Chris Fields wrote:
>
> I added this (and the others) to our ticket tracking this. Looks like
> solexa conversion either way is borked, which is very likely an issue
> with conversion.

Hi Chris,

I've been digging into the current SVN code for BioPerl's FASTQ support - I realised you are doing the Solexa to PHRED mapping twice when parsing "fastq-solexa" files. Using "qual" output (which shows the PHRED scores in plain text) makes it very clear something is wrong:

$ cat solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+slxa_0001_1_0001_01
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;

That is Solexa scores from 40 (h) down to -5 (;), which should map onto PHRED scores from 40 down to 1 (according to our prior discussions).
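The expected mapping can be checked independently with a few lines of Perl. This is a minimal sketch, assuming the usual relation Q_phred = 10 * log10(10^(Q_solexa/10) + 1) and an ASCII offset of 64 for fastq-solexa; solexa_to_phred() is a hypothetical helper written for this illustration, not the code in fastq.pm:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's fastq.pm): convert one Solexa score to
# a PHRED score, returned as a float so that no precision is lost.
sub solexa_to_phred {
    my ($sq) = @_;
    return 10 * log(10 ** ($sq / 10) + 1) / log(10);
}

# Quality string from solexa_faked.fastq; fastq-solexa uses ASCII offset 64.
my $qual = 'hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;';

my @solexa = map { ord($_) - 64 } split //, $qual;
my @phred  = map { solexa_to_phred($_) } @solexa;

# Apply the mapping exactly once and round only for display.
my @rounded = map { sprintf '%.0f', $_ } @phred;
print join(' ', @rounded), "\n";
```

Keeping @phred as floats and rounding only at output time is the design choice that avoids collapsing neighbouring low-quality Solexa scores onto the same value when writing Solexa output again.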
$ ./bioperl_solexa2qual.pl < solexa_faked.fastq
>slxa_0001_1_0001_01
40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 10 10 9 8 7 6 6 5 5 5 5 4 4 4 4

For reference,

$ python biopython_solexa2qual.py < solexa_faked.fastq
>slxa_0001_1_0001_01
40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 10 9 8 7 6 5 5 4 4 3 3 2 2 1 1

I can "fix" this in fastq.pm by commenting out one of the log mappings, for example see the patch I've just uploaded to Bug 2857:
http://bugzilla.open-bio.org/show_bug.cgi?id=2857

That brings me to another problem, consider the following (with the double conversion fixed):

$ ./bioperl_solexa2solexa.pl < solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+slxa_0001_1_0001_01
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJJHGFEDDBB@@>><<

If you compare that to the original, you'll notice a loss of detail in the poor quality reads. e.g. Solexa scores 9 (I) and 10 (J) have both been mapped onto 10 (J).

I believe this happens because BioPerl is converting the Solexa scores to PHRED scores on loading (which is fine - EMBOSS does this too), but you are also storing them as integers! In order to preserve these details, I think you'll have to hold the converted PHRED scores as floating point numbers (which I think is what EMBOSS does). This has the downside of taking more memory, and may also complicate file output (you may need to round things).

Regards,

Peter
(@Biopython)

From acpatel at gmail.com Sat Aug 22 18:44:20 2009
From: acpatel at gmail.com (Anand C. Patel)
Date: Sat, 22 Aug 2009 17:44:20 -0500
Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To: <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> Message-ID: <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> On Aug 22, 2009, at 4:36 PM, Hilmar Lapp wrote: > That's a pretty strange bug. Anand, which version of BioPerl and > Bioperl-db are you running? BioPerl is: https://launchpad.net/ubuntu/karmic/+source/bioperl/1.6.0-2ubuntu1 (1.6.0 loaded via apt-get into ubuntu karmic alpha 4) BioPerl-db is version 1.006 (1.6.0) loaded via CPAN. BioSQL is 1.0.1 I think I know what's broken. Using load_seqdatabases.pl, I'd put a set of sequences from genbank into a biosql db in mysql. I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl script from biosql. When I searched for house (as in house mouse), I found that the name of the type of taxon class was "genbank common name". When I searched for musculus, it does appear as a type of "scientific name". > Note that the genus *is* actually there in the lineage (and hence > does get retrieved from the database). Apparently the Species object > fails to pull it out correctly, though? > > Anand - I suspect there have been some warnings printed to the > terminal - can you post these, and otherwise confirm that there > haven't been any? > > -hilmar I'm not just getting warnings. I'm getting errors. Tons of them. It's a wonder it's working at all. I started with the getentry.cgi script in the cgi-bin folder, and stripped most of it away. 
Code:

#!/usr/bin/perl
use DBI;
use CGI::Carp qw( fatalsToBrowser );
use CGI qw/:standard/;
use Bio::DB::BioDB;
use Bio::Seq::RichSeq;
use Bio::SeqIO;
use IO::String;

my $q = new CGI; # create new CGI object
print $q->header; # create the HTTP header

my $value = "NM_017474";
my $host = "localhost";
my $dbname = "biosql";
my $driver = "mysql";
my $dbuser = "webuser";
my $dbpass = "wrjFfjjW9y243xvF";
my $biodbname = "genbank";

my $seq;
eval {
    my $db = Bio::DB::BioDB->new(-database => "biosql",
                                 -host => $host,
                                 -dbname => $dbname,
                                 -driver => $driver,
                                 -user => $dbuser,
                                 -pass => $dbpass,
                                 -verbose => 10,
                                 );
    my $seqadaptor = $db->get_object_adaptor('Bio::SeqI');
    $seq = Bio::Seq::RichSeq->new( -accession_number => $value,
                                   -namespace => $biodbname );
    $seq = $seqadaptor->find_by_unique_key($seq);
};

my $seqfh = IO::String->new($gbstring);
my $ioseq = Bio::SeqIO->new(-fh => $seqfh, -format => 'genbank');
$ioseq->write_seq($seq);

if( $@ || !defined $seq) {
    print "Got fetch exception of...\n
$@\n
"; exit(0); } print "BioSQL display of ". $seq->display_id ."\n"; print "\n"; print "
\n
".$gbstring."\n
\n
\n"; Errors (some but not all): test1.cgi: attempting to load adaptor class for Bio::SeqI test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor test1.cgi: attempting to load adaptor class for BioNamespace test1.cgi: \tattempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ? test1.cgi: BioNamespaceAdaptor: binding UK column 1 to "genbank" (namespace) test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor test1.cgi: preparing UK select statement: SELECT bioentry.bioentry_id, bioentry.name, bioentry.identifier, bioentry.accession, bioentry.description, bioentry.version, bioentry.division, bioentry.biodatabase_id, bioentry.taxon_id FROM bioentry WHERE biodatabase_id = ? AND accession = ? 
test1.cgi: SeqAdaptor: binding UK column 1 to "1" (bionamespace) test1.cgi: SeqAdaptor: binding UK column 2 to "NM_017474" (accession_number) test1.cgi: attempting to load adaptor class for Bio::PrimarySeq test1.cgi: \tattempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: preparing PK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE biodatabase_id = ? test1.cgi: BioNamespaceAdaptor: binding PK column to "1" test1.cgi: attempting to load adaptor class for Bio::Species test1.cgi: \tattempting to load module Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: preparing PK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND taxon_name.name_class = 'scientific name' AND taxon.taxon_id = ? test1.cgi: SpeciesAdaptor: binding PK column to "138" test1.cgi: prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value >= node.left_value AND taxon.left_value <= node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value test1.cgi: preparing SELECT COMMON_NAME: SELECT taxon_name.name FROM taxon_name WHERE taxon_name.taxon_id = ? 
AND taxon_name.name_class = 'common_name' test1.cgi: attempting to load adaptor class for Bio::Tree::Tree test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeAdaptor test1.cgi: attempting to load adaptor class for Bio::Root::Root test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootAdaptor test1.cgi: attempting to load adaptor class for Bio::Root::RootI test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootAdaptor test1.cgi: attempting to load adaptor class for Bio::Tree::TreeI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeAdaptor test1.cgi: attempting to load adaptor class for Bio::Tree::TreeFunctionsI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor test1.cgi: no adaptor found for class Bio::Tree::Tree test1.cgi: attempting to load adaptor class for Bio::DB::Taxonomy::list test1.cgi: \tattempting to load module Bio::DB::BioSQL::listAdaptor test1.cgi: attempting to load adaptor class for Bio::DB::Taxonomy test1.cgi: \tattempting to load module Bio::DB::BioSQL::TaxonomyAdaptor test1.cgi: no adaptor found for class Bio::DB::Taxonomy::list test1.cgi: attempting to load adaptor class for Biosequence test1.cgi: \tattempting to load module Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BiosequenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: preparing UK select statement: SELECT biosequence.bioentry_id, biosequence.version, biosequence.length, biosequence.alphabet, NULL, NULL, biosequence.bioentry_id FROM biosequence WHERE bioentry_id = ? 
test1.cgi: BiosequenceAdaptor: binding UK column 1 to "1" (primary_seq) test1.cgi: attempting to load adaptor class for Bio::AnnotationCollectionI test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor test1.cgi: attempting to load adaptor class for Bio::Annotation::TypeManager test1.cgi: \tattempting to load module Bio::DB::BioSQL::TypeManagerAdaptor test1.cgi: no adaptor found for class Bio::Annotation::TypeManager test1.cgi: attempting to load adaptor class for Bio::Annotation::Reference test1.cgi: \tattempting to load module Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.reference_id, t2.authors, t2.title, t2.location, t2.crc, bioentry_reference.start_pos, bioentry_reference.end_pos, bioentry_reference.rank, t2.dbxref_id FROM bioentry t1, reference t2, bioentry_reference WHERE t1.bioentry_id = bioentry_reference.bioentry_id AND t2.reference_id = bioentry_reference.reference_id AND t1.bioentry_id = ? 
test1.cgi: ReferenceAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: attempting to load adaptor class for Bio::Annotation::DBLink test1.cgi: \tattempting to load module Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: preparing PK select statement: SELECT dbxref.dbxref_id, dbxref.dbname, dbxref.accession, dbxref.version, NULL FROM dbxref WHERE dbxref_id = ? test1.cgi: DBLinkAdaptor: binding PK column to "1" test1.cgi: DBLinkAdaptor: binding PK column to "2" test1.cgi: DBLinkAdaptor: binding PK column to "3" test1.cgi: DBLinkAdaptor: binding PK column to "4" test1.cgi: DBLinkAdaptor: binding PK column to "5" test1.cgi: DBLinkAdaptor: binding PK column to "6" test1.cgi: DBLinkAdaptor: binding PK column to "7" test1.cgi: DBLinkAdaptor: binding PK column to "8" test1.cgi: DBLinkAdaptor: binding PK column to "9" test1.cgi: DBLinkAdaptor: binding PK column to "10" test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, bioentry_dbxref.rank FROM bioentry t1, dbxref t2, bioentry_dbxref WHERE t1.bioentry_id = bioentry_dbxref.bioentry_id AND t2.dbxref_id = bioentry_dbxref.dbxref_id AND t1.bioentry_id = ? 
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: attempting to load adaptor class for Bio::Annotation::SimpleValue test1.cgi: \tattempting to load module Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: attempting to load adaptor class for Bio::Ontology::Ontology test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::OntologyAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::OntologyAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::OntologyAdaptor test1.cgi: preparing UK select statement: SELECT ontology.ontology_id, ontology.name, ontology.definition FROM ontology WHERE name = ? test1.cgi: OntologyAdaptor: binding UK column 1 to "Annotation Tags" (name) test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::TermAdaptorDriver as driver peer for Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.name, bioentry_qualifier_value.value, bioentry_qualifier_value.rank, t2.ontology_id FROM bioentry t1, term t2, bioentry_qualifier_value WHERE t1.bioentry_id = bioentry_qualifier_value.bioentry_id AND t2.term_id = bioentry_qualifier_value.term_id AND (t1.bioentry_id = ? AND t2.ontology_id = ?) 
test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: attempting to load adaptor class for Bio::Annotation::OntologyTerm test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyTermAdaptor test1.cgi: attempting to load adaptor class for Bio::AnnotationI test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationAdaptor test1.cgi: attempting to load adaptor class for Bio::Ontology::TermI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TermIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TermAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::TermAdaptorDriver as driver peer for Bio::DB::BioSQL::TermAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.identifier, t2.name, t2.definition, t2.is_obsolete, bioentry_qualifier_value.rank, t2.ontology_id FROM bioentry t1, term t2, bioentry_qualifier_value WHERE t1.bioentry_id = bioentry_qualifier_value.bioentry_id AND t2.term_id = bioentry_qualifier_value.term_id AND (t1.bioentry_id = ? AND t2.ontology_id != ?) 
test1.cgi: TermAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq)
test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology)
test1.cgi: attempting to load adaptor class for Bio::Annotation::Comment
test1.cgi: \tattempting to load module Bio::DB::BioSQL::CommentAdaptor
test1.cgi: instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::CommentAdaptor
test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::CommentAdaptor
test1.cgi: preparing query: SELECT t1.comment_id, t1.comment_text, t1.rank, t1.bioentry_id FROM comment t1 WHERE t1.bioentry_id = ?
test1.cgi: Query FIND Bio::Annotation::Comment BY Bio::Seq::RichSeq: binding column 1 to "1"
test1.cgi: attempting to load adaptor class for Bio::SeqFeatureI
test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqFeatureAdaptor
test1.cgi: preparing query: SELECT t1.seqfeature_id, t1.display_name, t1.rank, t1.bioentry_id, t1.type_term_id, t1.source_term_id FROM seqfeature t1 WHERE t1.bioentry_id = ? ORDER BY t1.rank
test1.cgi: Query FIND FEATURE BY SEQ: binding column 1 to "1"
test1.cgi: preparing PK select statement: SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, NULL, term.ontology_id FROM term WHERE term_id = ?
test1.cgi: TermAdaptor: binding PK column to "245"
test1.cgi: attempting to load adaptor class for Bio::Ontology::OntologyI
test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyIAdaptor
test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyAdaptor
test1.cgi: preparing PK select statement: SELECT ontology.ontology_id, ontology.name, ontology.definition FROM ontology WHERE ontology_id = ?
test1.cgi: OntologyAdaptor: binding PK column to "32"
test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, term_dbxref.rank FROM term t1, dbxref t2, term_dbxref WHERE t1.term_id = term_dbxref.term_id AND t2.dbxref_id = term_dbxref.dbxref_id AND t1.term_id = ?
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "245" (FK to Bio::Ontology::Term)
test1.cgi: SELECT SYNONYMS: preparing: SELECT synonym FROM term_synonym WHERE term_id = ?
test1.cgi: SELECT SYNONYMS: executing with values (245) (FK to Bio::Ontology::Term)
test1.cgi: TermAdaptor: binding PK column to "246"
test1.cgi: OntologyAdaptor: binding PK column to "33"
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "246" (FK to Bio::Ontology::Term)
test1.cgi: SELECT SYNONYMS: executing with values (246) (FK to Bio::Ontology::Term)
test1.cgi: attempting to load adaptor class for Bio::LocationI
test1.cgi: \tattempting to load module Bio::DB::BioSQL::LocationIAdaptor
test1.cgi: \tattempting to load module Bio::DB::BioSQL::LocationAdaptor
test1.cgi: instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::LocationAdaptor
test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::LocationAdaptor
test1.cgi: preparing query: SELECT t1.location_id, t1.start_pos, t1.end_pos, t1.strand, t1.rank, t1.seqfeature_id, t1.dbxref_id FROM location t1 WHERE t1.seqfeature_id = ?
test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "1"
test1.cgi: attempting to load adaptor class for Bio::DB::Persistent::PersistentObjectFactory
test1.cgi: \tattempting to load module Bio::DB::BioSQL::PersistentObjectFactoryAdaptor
test1.cgi: attempting to load adaptor class for Bio::Factory::ObjectFactoryI
test1.cgi: \tattempting to load module Bio::DB::BioSQL::ObjectFactoryIAdaptor
test1.cgi: \tattempting to load module Bio::DB::BioSQL::ObjectFactoryAdaptor
test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory
test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, seqfeature_dbxref.rank FROM seqfeature t1, dbxref t2, seqfeature_dbxref WHERE t1.seqfeature_id = seqfeature_dbxref.seqfeature_id AND t2.dbxref_id = seqfeature_dbxref.dbxref_id AND t1.seqfeature_id = ?
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic)
test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.name, seqfeature_qualifier_value.value, seqfeature_qualifier_value.rank, t2.ontology_id FROM seqfeature t1, term t2, seqfeature_qualifier_value WHERE t1.seqfeature_id = seqfeature_qualifier_value.seqfeature_id AND t2.term_id = seqfeature_qualifier_value.term_id AND (t1.seqfeature_id = ? AND t2.ontology_id = ?)
test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic)
test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology)
test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.identifier, t2.name, t2.definition, t2.is_obsolete, seqfeature_qualifier_value.rank, t2.ontology_id FROM seqfeature t1, term t2, seqfeature_qualifier_value WHERE t1.seqfeature_id = seqfeature_qualifier_value.seqfeature_id AND t2.term_id = seqfeature_qualifier_value.term_id AND (t1.seqfeature_id = ? AND t2.ontology_id != ?)
test1.cgi: TermAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic)
test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology)
test1.cgi: preparing query: SELECT t1.comment_id, t1.comment_text, t1.rank, t1.bioentry_id FROM comment t1 WHERE 1 = 1
test1.cgi: Query FIND Bio::Annotation::Comment BY Bio::SeqFeature::Generic: binding column 1 to "1"
test1.cgi: TermAdaptor: binding PK column to "260"
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "260" (FK to Bio::Ontology::Term)
test1.cgi: SELECT SYNONYMS: executing with values (260) (FK to Bio::Ontology::Term)
test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "2"
test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic)
test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic)
test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology)
test1.cgi: TermAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic)
test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology)
test1.cgi: TermAdaptor: binding PK column to "250"
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "250" (FK to Bio::Ontology::Term)
test1.cgi: SELECT SYNONYMS: executing with values (250) (FK to Bio::Ontology::Term)
test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "3"
test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "3" (FK to Bio::SeqFeature::Generic)
test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "3" (FK to Bio::SeqFeature::Generic)
test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology)
test1.cgi: TermAdaptor: binding ASSOC column 1 to "3" (FK to Bio::SeqFeature::Generic)
test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology)
test1.cgi: TermAdaptor: binding PK column to "264"
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "264" (FK to Bio::Ontology::Term)
test1.cgi: SELECT SYNONYMS: executing with values (264) (FK to Bio::Ontology::Term)
test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "4"
test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic)
test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic)
test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology)
test1.cgi: TermAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic)
test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology)
test1.cgi: preparing SELECT statement: SELECT seq FROM biosequence WHERE bioentry_id = ?

>
> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote:
>
>> Anand,
>>
>> You should always post emails to the bioperl-l mailing list, never
>> to individual developers (you'll get an answer much faster). Keep
>> responses on the list as well.
>>
>> Though I use bioperl-db some, I'm probably not the best person to
>> ask. Does anyone know what's going on with this? Does this have
>> to do with the Species/Taxon refactoring?
>>
>> chris
>>
>> Begin forwarded message:
>>
>>> From: "Anand C. Patel"
>>> Date: August 22, 2009 2:57:42 PM CDT
>>> To: cjfields at illinois.edu
>>> Subject: problem with bioperl (where's the Mus?)
>>>
>>> Dr. Fields,
>>>
>>> I'm struggling with what seems to be a strange quirk in Bioperl
>>> +/- Bioperl-db/BioSQL.
>>>
>>> I've successfully loaded in genbank sequences into a biosql
>>> database.
>>>
>>> When I try to write a genbank sequence back out, a curious thing
>>> happens -- the Genus is missing from the SOURCE and ORGANISM areas.
>>>
>>> Despite reporting:
>>> primary tag: source
>>> tag: chromosome
>>> value: 3
>>>
>>> tag: db_xref
>>> value: taxon:10090
>>>
>>> tag: map
>>> value: 3 74.5 cM
>>>
>>> tag: mol_type
>>> value: mRNA
>>>
>>> tag: organism
>>> value: Mus musculus
>>> The sequence when printed out via SeqIO looks like this:
>>> LOCUS NM_017474 2935 bp dna linear
>>> ROD 13-AUG-2009
>>> DEFINITION Mus musculus chloride channel calcium activated 3
>>> (Clca3), mRNA.
>>> ACCESSION NM_017474 XM_978159
>>> VERSION NM_017474.2 GI:255918210
>>> KEYWORDS .
>>> SOURCE musculus
>>> ORGANISM musculus
>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa;
>>> Bilateria;
>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata;
>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii;
>>> Tetrapoda;
>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires;
>>> Glires;
>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus.
>>> Confession -- I have a final project due Monday wherein I boldly
>>> elected to interface Bioperl, MySQL, Perl, and CGI.
>>> (I'm an MD getting my MS in Bioinformatics.)
>>> After many misadventures, I'm getting to the point where I could
>>> actually complete the objectives, but this bug is rather
>>> problematic.
>>> Thanks,
>>> Anand
>>> Anand C. Patel, MD
>>> Assistant Professor of Pediatrics
>>> Division of Allergy/Pulmonary Medicine
>>> Department of Pediatrics
>>> Washington University School of Medicine
>>> 660 South Euclid Ave, Campus Box 8052
>>> St. Louis, MO 63110
>>> acpatel at wustl.edu
>>> acpatel at gmail.com
>>> acpatel at jhu.edu
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>

From acpatel at gmail.com Sat Aug 22 20:04:35 2009
From: acpatel at gmail.com (Anand C. Patel)
Date: Sat, 22 Aug 2009 19:04:35 -0500
Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To:
References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com>
Message-ID:

On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote:

>
> On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote:
>
>> [...]
>> I think I know what's broken. Using load_seqdatabases.pl, I'd put
>> a set of sequences from genbank into a biosql db in mysql.
>>
>> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl
>> script from biosql.
>
> Did you load the NCBI taxonomy first, or afterwards?

First -- before the sequences.

In fact, I'm in the midst of reloading the taxonomy into a clean new
database.

I used namespace "genbank" instead of namespace "bioperl". Could that
be the problem?

>>
>> When I searched for house (as in house mouse), I found that the
>> name of the type of taxon class was "genbank common name".
>>
>> When I searched for musculus, it does appear as a type of
>> "scientific name".
>
> It is the 'scientific name' class names that Bioperl-db will put onto
> the lineage array.
>
>> [...]
>> I'm not just getting warnings. I'm getting errors. Tons of them.
>> It's a wonder it's working at all.
>
> I'm not sure what you're referring to, but what you pasted into your
> email were neither errors nor warnings but a debugging log (and what
> it prints looks like it's working fine). You triggered that by
> setting -verbose to a value greater than 0. If you don't want
> debugging output, then you can just leave off that argument (no
> debugging output is the default).

I did not know that! They were flagged "error", so I thought those
might be the problem.

>>
>> I started with the getentry.cgi script in the cgi-bin folder, and
>> stripped most of it away.
>
> I see - which reminds me that I need to look at that script; I'm
> afraid it hasn't been updated for a long time (that doesn't mean
> though that it can't work - the core API has been stable for years).
>

It works -- I just think I confused the system by not sticking with
the default namespace?

Thanks,
Anand

>>
>> Code:
>> #!/usr/bin/perl
>>
>> [...]
>> if( $@ || !defined $seq) {
>> print "Got fetch exception of...\n$@\n";
>> exit(0);
>> }
>
> Wouldn't you want to put that right after the eval() clause?
>
> -hilmar

From acpatel at gmail.com Sat Aug 22 20:13:37 2009
From: acpatel at gmail.com (Anand C. Patel)
Date: Sat, 22 Aug 2009 19:13:37 -0500
Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To:
References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com>
Message-ID: <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com>

Do I need to load ontology before loading sequences?

(I promise I've been reading the documentation for days, and could not
find a yea or nay on this)

Thanks,
Anand

From acpatel at usa.net Sat Aug 22 21:13:14 2009
From: acpatel at usa.net (Anand C. Patel)
Date: Sat, 22 Aug 2009 20:13:14 -0500
Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To:
References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com>
Message-ID: <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net>

Turns out that using the default namespace bioperl doesn't change
anything.

Common name -- still "genbank common name" in name_class in the
taxon_name table for "house mouse", which I think the module is
looking for as "common name".

It's not behaving differently despite reloading the sequences.

I've created a horrible munge that fixes it for cosmetic purposes:
my $species = $seq->species;
my $justspecies = $species->scientific_name();
my $binspecies = $species->binomial();

my $gbstring2 = $gbstring;

$gbstring2 =~ s/$binspecies/$justspecies/g;
$gbstring2 =~ s/$justspecies/$binspecies/g;

But this does not strike me as a long term solution.

Thanks,
Anand

From jkb at sanger.ac.uk Mon Aug 24 05:02:34 2009
From: jkb at sanger.ac.uk (James Bonfield)
Date: Mon, 24 Aug 2009 10:02:34 +0100
Subject: [Bioperl-l] SCF installation
Message-ID: <20090824090234.GB821@sanger.ac.uk>

Lincoln Stein wrote:
> It is all a bit confusing.
> On the download page for Staden, there is a release 1.12, but the
> home page hasn't been updated and still reads 1.11. If you download
> and install Staden 1.12, you'll get a library named libstaden-read
> rather than libread; Bio::SCF hasn't been updated for the name
> change, and so you will have to open up the Makefile.PL and change
> "-lread" to "-lstaden-read" in order for it to compile.

This post was pointed out to me by one of the Debian maintainers. I'm
mailing the list directly but am not a subscriber, so please keep me
listed in any replies.

The Staden Package home page recently underwent a revamp to use the
RSS feeds, automatically updating it. Unfortunately, within a couple of
weeks of doing that, SourceForge managed to break the file release RSS,
and so the site has stopped updating. The News section is still working
though, so I ought to add a news post about io_lib-1.12.1 and it'll at
least appear somewhere on the home page.

Regarding the library name change, this was requested by Debian and
also already implemented by Fedora. I agree with it too, as libread.so
is a truly appalling name, so the new name is here to stay.

There shouldn't be a great number of differences compared to the
1.11.x release set though, with the only incompatibility I can
immediately think of being the change from int to size_t in the Array
structs.

James

PS. There have been very few changes to SCF over the years, so it's
likely all working just fine. Most recent io_lib changes have been SRF
support, and a few associated tweaks to ZTR necessitated by SRF.

--
James Bonfield (jkb at sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
                                      | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer:         | Et Borogovorum mimzebant undique formae,
  https://sf.net/projects/staden/     | Momiferique omnes exgrabure Rathi.
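
The Makefile.PL change Lincoln describes in the quoted text above (replacing "-lread" with "-lstaden-read") could be scripted roughly as follows. This is only a sketch under stated assumptions: the thread does not show the real Makefile.PL, so the LIBS line below is a hypothetical stand-in, and the edit is made on a throwaway copy in a temporary directory rather than on a real Bio::SCF checkout.

```shell
# Work on a throwaway copy so nothing real is modified.
workdir=$(mktemp -d)
demo="$workdir/Makefile.PL"

# Hypothetical stand-in for the relevant line of Bio::SCF's Makefile.PL;
# the actual file contents are not shown in this thread.
printf "%s\n" "    'LIBS' => ['-lread -lz']," > "$demo"

# Swap the old library flag for the new one. The first expression handles
# '-lread' followed by a character that cannot extend the library name;
# the second handles '-lread' at the very end of a line.
sed -e 's/-lread\([^[:alnum:]_-]\)/-lstaden-read\1/g' \
    -e 's/-lread$/-lstaden-read/' "$demo" > "$demo.new"
mv "$demo.new" "$demo"

# Show the rewritten line.
cat "$demo"
```

Against a real checkout, the same two sed expressions would be pointed at the actual Makefile.PL, ideally after keeping a backup copy, and followed by the usual "perl Makefile.PL" and "make".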
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From acpatel at usa.net Sun Aug 23 13:17:08 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sun, 23 Aug 2009 12:17:08 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Message-ID: <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> On Aug 23, 2009, at 9:38 AM, Hilmar Lapp wrote: >> Common name -- still "genbank common name" in name_class in the >> taxon_name table for "house mouse", which I think the module is >> looking for as "common name". > > If you are loading the NCBI taxonomy first, this is coming from > NCBI, not one of the scripts or BioPerl, and hence we have no > control over it. Are you saying that there is no designated name of > class 'common name' for Mus musculus in the NCBI taxonomy dump? > > Also, the common name being present or not should have no bearing on > the lineage array, where the actual problem is, so I don't > understand right now how this would be connected to the problem you > are seeing. > >> >> It's not behaving differently despite reloading the sequences. 
>>
>> I've created a horrible munge that fixes it for cosmetic purposes:
>> my $species = $seq->species;
>> my $justspecies = $species->scientific_name();
>> my $binspecies = $species->binomial();
>>
>> my $gbstring2 = $gbstring;
>>
>> $gbstring2 =~ s/$binspecies/$justspecies/g;
>> $gbstring2 =~ s/$justspecies/$binspecies/g;
>
> I don't understand what you are trying to achieve here - it seems
> like you are making a substitution and then reverting it? Also,
> $species->scientific_name() and $species->binomial() should be
> identical for Mus musculus - are you finding different values being
> returned?
>
> So in essence, I wouldn't expect your above code snippet to have any
> effect, for both of these reasons. How do you find $gbstring2 to be
> different from $gbstring at the end of this block of code?
>
> -hilmar

I should have been clearer.

Code snippet:
my $species = $seq->species;
print "common name = ",$species->common_name, "\n";
print "scientific name = ",$species->scientific_name, "\n";
print "species = ",$species->species, "\n";
print "genus = ",$species->genus, "\n";
print "sub_species = ",$species->sub_species, "\n";
print "binomial = ",$species->binomial, "\n";
print "ncbi_taxid = ",$species->ncbi_taxid, "\n";

Output:
common name =
scientific name = musculus
species = musculus
genus = Mus
sub_species =
binomial = Mus musculus
ncbi_taxid = 10090

The common name is missing, despite having loaded it from NCBI
taxonomy using the provided script.
It is ONLY present as this "genbank common name".

So, what I get in $gbstring is:
LOCUS       NM_017474               2935 bp    dna     linear   ROD 13-AUG-2009
DEFINITION  Mus musculus chloride channel calcium activated 3 (Clca3), mRNA.
ACCESSION   NM_017474 XM_978159
VERSION     NM_017474.2  GI:255918210
KEYWORDS    .
SOURCE      musculus
  ORGANISM  musculus
            Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria;
            Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata;
            Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii;
            Tetrapoda; Amniota; Mammalia; Theria; Eutheria;
            Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea;
            Muridae; Murinae; Mus.

What I get in $gbstring2 is:
LOCUS       NM_017474               2935 bp    dna     linear   ROD 13-AUG-2009
DEFINITION  Mus musculus chloride channel calcium activated 3 (Clca3), mRNA.
ACCESSION   NM_017474 XM_978159
VERSION     NM_017474.2  GI:255918210
KEYWORDS    .
SOURCE      Mus musculus
  ORGANISM  Mus musculus
            Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria;
            Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata;
            Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii;
            Tetrapoda; Amniota; Mammalia; Theria; Eutheria;
            Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea;
            Muridae; Murinae; Mus.

Not perfect -- common name is still missing, but better.

I could go through and replace all of the instances of "genbank common
name" with "common name" and see if this fixes it.

Any other thoughts?

Thanks,
Anand

> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
>

From acpatel at usa.net Sun Aug 23 13:25:16 2009
From: acpatel at usa.net (Anand C. Patel)
Date: Sun, 23 Aug 2009 12:25:16 -0500
Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To:
References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com>
    <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu>
    <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net>
    <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com>
    <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net>
Message-ID: <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net>

The other piece of potentially useful information is below -- output
from

SELECT * FROM `biosql`.`taxon_name` WHERE `taxon_id` = 138;

(taxon_id 138 maps to ncbi_taxon_id 10090)

taxon_id  name                          name_class
138       LK3 transgenic mice           includes
138       Mus muscaris                  misnomer
138       Mus musculus                  scientific name
138       Mus sp. 129SV                 includes
138       house mouse                   genbank common name
138       mice C57BL/6xCBA/CaJ hybrid   misspelling
138       mouse                         common name
138       nude mice                     includes
138       transgenic mice               includes

The source from the genbank entry NM_017474 is:
SOURCE      Mus musculus (house mouse)

Which is why I think the issue is that the name_class is "genbank
common name" rather than common name.

What does strike me as odd though is that not even "mouse" shows up --
common_name is empty.

Thanks again,
Anand

From maj at fortinbras.us Mon Aug 24 10:37:45 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 24 Aug 2009 10:37:45 -0400
Subject: [Bioperl-l] The Documentation Project
Message-ID: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife>

Hi All,
I'm starting this journey of 1000 mi (1620 km) with the following step:
http://www.bioperl.org/wiki/The_Documentation_Project
Please visit and comment.
Thanks, Mark From hlapp at gmx.net Mon Aug 24 10:47:34 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 10:47:34 -0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> Message-ID: <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> Hi Anna, sequence formats all have some varying amount of information that must be present or otherwise the syntax is invalid. If what you need is a two-column table of display_id and species name, then I would simply write that, and not squeeze it into a standard sequence format. (Unless you actually do want the sequence too, in which case you need to add it as a wanted slot; even in that case though, writing a three- column table might serve you better.) -hilmar On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote: > > Dear all, > > I am trying to extract species taxonomy from ORGANISM line. In fact > I only need a first line under ORGANISM tag (e.i. genus + species). > I though that it would be possible to do with the SeqBuilder object > by stating > > $builder->add_wanted_slot('display_id','species'); > > the problem is, however, that I've got an empty file as a result. > What might be wrong with the script (see below)? 
> Thanks a lot in advance for any ideas,
>
> -------------------------------------------
>
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Seq::SeqBuilder;
>
> my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> my $infile = shift or die $usage;
> my $infileformat = 'Genbank' ;
> my $outfile = shift or die $usage;
> my $outfileformat = 'raw';
> my $i = 0;
>
> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                              '-format' => $infileformat);
>
> my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
>                               '-format' => $outfileformat);
>
> my $builder = $seq_in->sequence_builder();
>
> $builder->want_none();
> $builder->add_wanted_slot('display_id','species');
>
> while(my $seq = $seq_in->next_seq()) {
>     $seq_out->write_seq($seq);
> }
>
> exit;
>
> ----------------------------------------------------
>
> Anna
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at illinois.edu Mon Aug 24 12:50:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 24 Aug 2009 11:50:05 -0500
Subject: [Bioperl-l] The Documentation Project
In-Reply-To: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife>
References: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife>
Message-ID:

Mark,

We should probably keep some of this discussion on the list, primarily
as I've been running into conflicts with responses on the wiki page.
It's more amenable to discussion.

For anyone out there interested, you should speak up now; this is the
best opportunity to do so (we're considering lack of input assent).

I want to make a few key points on behalf of the devs. It's impossible
to consistently maintain two active copies of any documentation (wiki
vs docs in the distribution).
I have tried keeping up with this, helping with the 1.5.2 release, and
full-on with the 1.6.0 release, and it's an extreme headache.

From the maintenance point of view, this is what I would do:

1) Where possible always link to the official POD (either pdoc or
CPAN) from the distribution. Make the API documentation link very
prominent (I moved it to the docs section in the sidebar). Protect
wiki module pages (in line with the 'one official copy' rule), allow
writable discussion pages for additional, wiki-specific documentation
(which can be added to the official docs as needed).

2) ...or, have a search bar specifically for the module documentation
that links directly to the proper API/PDOC/CPAN page. Not sure how
feasible that is, particularly since we plan on splitting things up a
bit.

3) POD-ify any relevant documentation we intend on including in the
wiki that also comes with the distribution (similar to
Moose::Manual). I do not want to repeatedly edit a plain text
INSTALL/BUGS/DEPENDENCIES file to correspond with the wikified version
for every release (nor vice versa).

Long term: (this is my own personal style, YMMV) move all POD to the
end of the file. Add 'Status' tags to any method docs indicating
implementation status (virtual, stable, unstable, public, private,
etc). Move method POD to its own section within the main
documentation. Implement a coding style (as mentioned recently on
list using perltidy, but also using proper method names).

HOWTOs are also subject to API changes, but we haven't run into many
issues with those yet, and they're wiki-specific.

chris

On Aug 24, 2009, at 9:37 AM, Mark A. Jensen wrote:

> Hi All,
> I'm starting this journey of 1000 mi (1620 km) with the following
> step:
> http://www.bioperl.org/wiki/The_Documentation_Project
> Please visit and comment.
> Thanks, > Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 13:37:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 12:37:39 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92CADD.10901@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> Message-ID: On Aug 24, 2009, at 12:16 PM, Sendu Bala wrote: > Hilmar Lapp wrote: >>> >>> ... >> This points to a problem in Bio::Species::scientific_name(), given >> that binomial() is correct. Could you file this as a bug report? > > What code creates the Bio::Species object here? I suspect this code > isn't aware of changes in Bio::Species since BioPerl 1.5.2. I think it's bioperl-db-related. You've previously pointed out the incongruity bioperl-db has with Bio::Species in a bug report (I indicated that in a separate post to this thread). >>> The common name is missing, despite having loaded it from NCBI >>> taxonomy using the provided script. >>> It is ONLY present as this "genbank common name". >>> [...] >>> I could go through and replace all of the instances of "genbank >>> common name" with "common name" and see if this fixes it. >> I think we need to first discuss how we want to treat the 'common >> name' versus 'genbank common name' classes in BioPerl. 
>> So question for everyone: do we need to have both available (in >> which case we need to add an accessor in Bio::Species), or only >> 'common name', or should 'genbank common name' override 'common >> name' if both are present and have different values. > > Bio::Species (via Bio::Taxon) has the common_names() method, for > which common_name() is an alias that in scalar context returns the > first of possibly many common names, one of which may be the genbank > common name. > > See: > http://www.bioperl.org/wiki/Core_1.5.2_new_features#Implementation_changes Yes, but that method stored names in an array and removes the context, presumed or not. If there are two or more, which names correspond to common_name, which to genbank_common_name (and which should we prefer)? chris From bix at sendu.me.uk Mon Aug 24 13:16:13 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 18:16:13 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> Message-ID: <4A92CADD.10901@sendu.me.uk> Hilmar Lapp wrote: > > On Aug 23, 2009, at 1:17 PM, Anand C. Patel wrote: > >> [...] 
>> Code snippet: >> my $species = $seq->species; >> print "common name = ",$species->common_name, "\n"; >> print "scientific name = ",$species->scientific_name, "\n"; >> print "species = ",$species->species, "\n"; >> print "genus = ",$species->genus, "\n"; >> print "sub_species = ",$species->sub_species, "\n"; >> print "binomial = ",$species->binomial, "\n"; >> print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; >> >> Output: >> common name = >> scientific name = musculus >> species = musculus >> genus = Mus >> sub_species = >> binomial = Mus musculus >> ncbi_taxid = 10090 > > This points to a problem in Bio::Species::scientific_name(), given that > binomial() is correct. Could you file this as a bug report? What code creates the Bio::Species object here? I suspect this code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >> The common name is missing, despite having loaded it from NCBI >> taxonomy using the provided script. >> It is ONLY present as this "genbank common name". >> [...] >> I could go through and replace all of the instances of "genbank common >> name" with "common name" and see if this fixes it. > I think we need to first discuss how we want to treat the 'common name' > versus 'genbank common name' classes in BioPerl. > > So question for everyone: do we need to have both available (in which > case we need to add an accessor in Bio::Species), or only 'common name', > or should 'genbank common name' override 'common name' if both are > present and have different values. Bio::Species (via Bio::Taxon) has the common_names() method, for which common_name() is an alias that in scalar context returns the first of possibly many common names, one of which may be the genbank common name. 
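
To make the name-class question concrete, here is a minimal sketch in
plain Perl (no BioPerl calls) of choosing between the two classes
explicitly instead of taking whichever name happens to come first. The
helper pick_name and the preference order are assumptions for
illustration only, not something Bio::Species or Bio::Taxon provides;
the rows are the ones from the taxon_name query posted earlier in this
thread.

```perl
use strict;
use warnings;

# Rows as they appear in the BioSQL taxon_name table for taxon_id 138
# (values copied from the query output posted in this thread).
my @taxon_names = (
    [ 'Mus musculus', 'scientific name' ],
    [ 'house mouse',  'genbank common name' ],
    [ 'mouse',        'common name' ],
);

# Hypothetical helper, NOT part of Bio::Species or Bio::Taxon: return the
# first name stored under the first class in @$preference that has any.
sub pick_name {
    my ($rows, $preference) = @_;

    # Group the names by their name_class so the class is not lost.
    my %by_class;
    push @{ $by_class{ $_->[1] } }, $_->[0] for @$rows;

    for my $class (@$preference) {
        return $by_class{$class}[0] if $by_class{$class};
    }
    return;    # no name in any of the requested classes
}

my $common = pick_name(\@taxon_names,
                       [ 'common name', 'genbank common name' ]);
print "common name = $common\n";    # prints "common name = mouse"
```

Swapping the two entries in the preference list gives "house mouse"
instead, which is the choice being discussed here.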
See: http://www.bioperl.org/wiki/Core_1.5.2_new_features#Implementation_changes From hlapp at gmx.net Mon Aug 24 13:54:13 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 13:54:13 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92CADD.10901@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> Message-ID: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >> This points to a problem in Bio::Species::scientific_name(), given >> that binomial() is correct. Could you file this as a bug report? > > What code creates the Bio::Species object here? I suspect this code > isn't aware of changes in Bio::Species since BioPerl 1.5.2. I see. Any pointer to what would tell me what I need to change or is everything in the Bio::Species POD? BTW what the Bioperl-db code does is instantiate the blank object and then populate it through its accessors (mostly the classification() array). If what it has been doing in the past is now considered incorrect, at least it doesn't raise any warning that would alert one to that ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From robert.bradbury at gmail.com Mon Aug 24 14:38:08 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 24 Aug 2009 14:38:08 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> Message-ID: As a really "off-the-wall" suggestion, you might see if somehow the "name" being pulled is the SwissProt name rather than the species name. I run into this when I'm fetching FASTA sequences from SwissProt in that the sequence identifier names are non-standard for some of the early "standard" species, e.g. "HUMAN", # Homo sapiens "MOUSE", # Mus musculus "RAT", # Rattus norvegicus "BOVIN", # Bos taurus "HORSE", # Equus caballus "PIG", # Sus scrofa "RABIT", # Oryctolagus cuniculus "SHEEP", # Ovis aries "YEAST", # Saccharomyces cerevisiae (Baker's yeast) etc. Eventually they largely adopted the 3+2 letter species derived name, but the early "standard" names are anomalies. You might run a test on a newly sequenced species (Gorilla, Opossum, Armadillo, Dog, etc.) to see if you get a "standard" species name. Robert Bradbury From dan.bolser at gmail.com Mon Aug 24 15:13:26 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 20:13:26 +0100 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> Message-ID: <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> Thanks for these clarifications Chris. 
Basically I'm looking for an object that will easily let me edit a multiple sequence alignment, including: adding sequences (with given alignments), opening gaps, extracting columns (with linked sequences), transferring features, etc. etc. For example, I may want to analyse a set of short reads aligned against the human genome. Somehow it felt natural to represent the position of the aligned read as a Bio::LocatableSeq (with the alignment details being captured by a sequence string (including gaps) representing the read and the reference sequence - basically because that is what the aligner gives me). Now, you're saying Bio::LocatableSeq is not suitable for that purpose, which is fine. But the question is, how should I be doing this? Adding megabases of gaps to thousands of short reads feels wrong... is there a 'correct' way to do this currently in BioPerl? I think the source of my confusion was that SimpleAlign takes Bio::LocatableSeq as input, and I thought that was 'the way' to represent sequences in the MSA. I'll keep hacking at what I need to get done and I'll post the code. I'm just wondering how much 'alignment editing' could be usefully done by a suitable object within BP? Thanks again for your help, Dan. 2009/8/24 Chris Fields : > Dan, all, > > Bio::SimpleAlign doesn't align anything for you. It makes no assumptions > about the data being added, beyond possibly checking for the seqs to be > flush prior to analyses. > > Here's the reason why: > > The object doesn't 'know' the seqs map across from one to the other as > below: > >> ... >> ## REF tacattaaagacccg >> ## SEQ1 taca.taaa...... 
>> ## SEQ2 .....taaaga.ccg >> >> my $aln = Bio::SimpleAlign->new(); >> >> $aln->gap_char('.'); >> >> my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); >> my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' >> ); >> my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' >> ); >> >> $aln->add_seq( $r ); >> $aln->add_seq( $s1 ); >> $aln->add_seq( $s2 ); > > Above, you are making the assumption that SimpleAlign 'knows' where to match > the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does > NOT indicate that (the LocatableSeq docs, and their usage, should indicate > that). > > Think about HSP alignments in a BLAST report; the start/end/strand > coordinates are where the sequence in the alignment maps to the original > query or hit sequence. They don't indicate where the hit maps to the query > (the alignment itself does that in a column-wise fashion). > > I'm not sure, maybe it needs to be more explicit in the documentation, but > SimpleAlign does not align the sequences for you (and it shouldn't be > expected to). There are much better (faster, more accurate) ways to do > that. > >> if($CLUDGE){ >> foreach(($r, $s1, $s2)){ >> $_->seq( '.' x ($_->start - 1) . $_->seq ) >> } >> } >> >> ## Prepare an 'output stream' for the alignment: >> my $aliWriter = Bio::AlignIO-> >> new( -fh => \*STDOUT, >> -format => 'clustalw', >> ); >> >> warn "\nOUTPUT:\n"; >> $aliWriter->write_aln($aln); > > ... > >> I was calling the "fill in the gaps yourself" step a CLUDGE because I >> had expected the alignment object to take care of this for me. Is >> there any reason that it couldn't do this 'CLUDGE' automatically? It >> seems strange that it insists on being passed locatable sequence >> objects, but then largely ignore the given location. >> >> Would it not be possible to have this happen when the sequences are >> written out from the alignment? 
I think it should still be possible to >> index the column number via the (gapless) sequence number... or did I >> get confused? There are two levels of confusion here (on my part), 1) >> the concepts behind the objects and 2) the implementation details. > > Mentioned above (no assumptions on how locatableseqs map to one another). > WYSIWYG. There is nothing precluding you from writing up code to do that, > though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for > post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl > alignment implementation (there are, believe it or not, pure perl > implementations of Smith-Waterman and Needleman-Wunsch. > >> Thanks for any hints on how to understand or potentially how to fix >> these problems. >> >> Cheers, >> Dan. > > > Not that SimpleAlign and LocatableSeqs don't have their share of problems. > However, I don't think you can expect this behavior to change with the > refactors. > > chris > From bix at sendu.me.uk Mon Aug 24 15:12:05 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 20:12:05 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> Message-ID: <4A92E605.5090706@sendu.me.uk> Hilmar Lapp wrote: > > On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: > >>> This points to a problem in Bio::Species::scientific_name(), given >>> that binomial() is correct. Could you file this as a bug report? >> >> What code creates the Bio::Species object here? 
I suspect this code >> isn't aware of changes in Bio::Species since BioPerl 1.5.2. > > I see. Any pointer to what would tell me what I need to change or is > everything in the Bio::Species POD? ... I won't guarantee the perfection of the POD ;) > BTW what the Bioperl-db code does is instantiate the blank object and > then populate it through its accessors (mostly the classification() > array). If what it has been doing in the past is now considered > incorrect, at least it doesn't raise any warning that would alert one to > that ... Yuh... If you point out the code that creates the Bio::Species I can look into it for you and suggest what needs changing and why it doesn't work (or if it's a bug in Bio::Species). I can't remember things clearly right now, though classification() I guess was supposed to be backwards compatible. From cjfields at illinois.edu Mon Aug 24 15:52:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 14:52:56 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92E605.5090706@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> Message-ID: <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: > Hilmar Lapp wrote: >> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>> This points to a problem in Bio::Species::scientific_name(), >>>> given that binomial() is correct. Could you file this as a bug >>>> report? >>> >>> What code creates the Bio::Species object here? 
I suspect this >>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >> I see. Any pointer to what would tell me what I need to change or >> is everything in the Bio::Species POD? > > ... I won't guarantee the perfection of the POD ;) > > >> BTW what the Bioperl-db code does is instantiate the blank object >> and then populate it through its accessors (mostly the >> classification() array). If what it has been doing in the past is >> now considered incorrect, at least it doesn't raise any warning >> that would alert one to that ... > > Yuh... If you point out the code that creates the Bio::Species I can > look into it for you and suggest what needs changing and why it > doesn't work (or if it's a bug in Bio::Species). I can't remember > things clearly right now, though classification() I guess was > supposed to be backwards compatible. Sendu, I think it's related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 Bio::DB::BioSQL::SpeciesAdaptor and Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in question i think. chris From bix at sendu.me.uk Mon Aug 24 16:01:29 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 21:01:29 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> Message-ID: <4A92F199.2030900@sendu.me.uk> Chris Fields wrote: > > On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: > >> Hilmar Lapp wrote: >>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>> This points to a problem in Bio::Species::scientific_name(), given >>>>> that binomial() is correct. Could you file this as a bug report? >>>> >>>> What code creates the Bio::Species object here? I suspect this code >>>> isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>> I see. Any pointer to what would tell me what I need to change or is >>> everything in the Bio::Species POD? >> >> ... I won't guarantee the perfection of the POD ;) >> >> >>> BTW what the Bioperl-db code does is instantiate the blank object and >>> then populate it through its accessors (mostly the classification() >>> array). If what it has been doing in the past is now considered >>> incorrect, at least it doesn't raise any warning that would alert one >>> to that ... >> >> Yuh... If you point out the code that creates the Bio::Species I can >> look into it for you and suggest what needs changing and why it >> doesn't work (or if it's a bug in Bio::Species). I can't remember >> things clearly right now, though classification() I guess was supposed >> to be backwards compatible. 
> > Sendu, I think it's related to this: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 > > Bio::DB::BioSQL::SpeciesAdaptor and > Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in > question i think. Ah, yes, well there you go then. So it is a classification() issue. Judging by what I said in that bug, looks like the db code needs to be changed to put the full scientific name in the first element it passes to classification. From cjfields at illinois.edu Mon Aug 24 16:27:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 15:27:23 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92F199.2030900@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> Message-ID: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: >>> Hilmar Lapp wrote: >>>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>>> This points to a problem in Bio::Species::scientific_name(), >>>>>> given that binomial() is correct. Could you file this as a bug >>>>>> report? >>>>> >>>>> What code creates the Bio::Species object here? I suspect this >>>>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>>> I see. Any pointer to what would tell me what I need to change or >>>> is everything in the Bio::Species POD? >>> >>> ... 
I won't guarantee the perfection of the POD ;) >>> >>> >>>> BTW what the Bioperl-db code does is instantiate the blank object >>>> and then populate it through its accessors (mostly the >>>> classification() array). If what it has been doing in the past is >>>> now considered incorrect, at least it doesn't raise any warning >>>> that would alert one to that ... >>> >>> Yuh... If you point out the code that creates the Bio::Species I >>> can look into it for you and suggest what needs changing and why >>> it doesn't work (or if it's a bug in Bio::Species). I can't >>> remember things clearly right now, though classification() I guess >>> was supposed to be backwards compatible. >> Sendu, I think it's related to this: >> http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 >> Bio::DB::BioSQL::SpeciesAdaptor and >> Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in >> question i think. > > Ah, yes, well there you go then. So it is a classification() issue. > Judging by what I said in that bug, looks like the db code needs to > be changed to put the full scientific name in the first element it > passes to classification. Yup. I believe the only blocking issue with implementing it was potential backwards-compat problems with databases loaded using old behavior and then being updated post-1.5.2 (new behavior). I would think this only affects sequence data loaded w/o taxonomy preloaded, but I'm not sure. I suggest, if you can fix it, go ahead make the necessary change. We can then post a big warning to BioSQL and here about the problem, something along the lines of 'bioperl-db in svn may be backwards incompatible with species information loaded in previous versions; it may eat your first born' or similar. It's an absolutely necessary fix, and may effectively kill a bunch of other db/species-related bugs. 
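
For anyone wondering what the change amounts to, here is a rough sketch
in plain Perl of the list manipulation, under the assumption that the
list handed to classification() is ordered species-first and that the
old loader stored only the epithet in the first slot. The helper name
with_full_scientific_name is made up for this example and is not
bioperl-db code; sub-species and other ranks below species are not
handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not bioperl-db code): take an old-style list whose
# first element is only the species epithet, followed by the genus and the
# higher ranks, and return a copy whose first element is the full binomial.
sub with_full_scientific_name {
    my @classification = @_;
    return @classification if @classification < 2;

    my ($epithet, $genus) = @classification[0, 1];

    # Leave the list alone if the first element already starts with the
    # genus, so running this twice does no harm.
    $classification[0] = "$genus $epithet"
        unless $epithet =~ /^\Q$genus\E\s/;

    return @classification;
}

# Old behavior, as seen in the output posted in this thread:
# species = musculus, genus = Mus.
my @old = ('musculus', 'Mus', 'Murinae', 'Muridae', 'Muroidea');
my @new = with_full_scientific_name(@old);
print join('; ', @new), "\n";
# prints "Mus musculus; Mus; Murinae; Muridae; Muroidea"
```

The check on the first element is there because of the backwards-compat
worry above: data loaded the new way should pass through unchanged.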
chris From Kevin.M.Brown at asu.edu Mon Aug 24 17:48:35 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 24 Aug 2009 14:48:35 -0700 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com><990CEF10B1AD4BD5BE9977FD62DB3437@NewLife><2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B4062D2655@EX02.asurite.ad.asu.edu> You can use Bio::SimpleAlign for those tasks, but you, the programmer, have to remember that you didn't front pad the sequence and so can't utilize certain functions blindly. I've used SimpleAlign with LocatableSeq objects and wrote a few custom methods that did things like creating slices from the simplealign for each locatableseq. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Bolser Sent: Monday, August 24, 2009 12:13 PM To: Chris Fields Cc: bioperl-l at lists.open-bio.org; Mark A. Jensen; Paolo Pavan Subject: Re: [Bioperl-l] Bio::SimpleAlign constructor? Thanks for these clarifications Chris. Basically I'm looking for an object that will easily let me edit a multiple sequence alignment, including: adding sequences (with given alignments), opening gaps, extracting columns (with linked sequences), transferring features, etc. etc. For example, I may want to analyse a set of short reads aligned against the human genome. Somehow it felt natural to represent the position of the aligned read as a Bio::LocatableSeq (with the alignment details being captured by a sequence string (including gaps) representing the read and the reference sequence - basically because that is what the aligner gives me). Now, you're saying Bio::LocatableSeq is not suitable for that purpose, which is fine. 
But the question is, how should I be doing this? Adding megabases of gaps to thousands of short reads feels wrong... is there a 'correct' way to do this currently in BioPerl? I think the source of my confusion was that SimpleAlign takes Bio::LocatableSeq as input, and I thought that was 'the way' to represent sequences in the MSA. I'll keep hacking at what I need to get done and I'll post the code. I'm just wondering how much 'alignment editing' could be usefully done by a suitable object within BP? Thanks again for your help, Dan. 2009/8/24 Chris Fields : > Dan, all, > > Bio::SimpleAlign doesn't align anything for you. It makes no assumptions > about the data being added, beyond possibly checking for the seqs to be > flush prior to analyses. > > Here's the reason why: > > The object doesn't 'know' the seqs map across from one to the other as > below: > >> ... >> ## REF tacattaaagacccg >> ## SEQ1 taca.taaa...... >> ## SEQ2 .....taaaga.ccg >> >> my $aln = Bio::SimpleAlign->new(); >> >> $aln->gap_char('.'); >> >> my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); >> my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' >> ); >> my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' >> ); >> >> $aln->add_seq( $r ); >> $aln->add_seq( $s1 ); >> $aln->add_seq( $s2 ); > > Above, you are making the assumption that SimpleAlign 'knows' where to match > the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does > NOT indicate that (the LocatableSeq docs, and their usage, should indicate > that). > > Think about HSP alignments in a BLAST report; the start/end/strand > coordinates are where the sequence in the alignment maps to the original > query or hit sequence. They don't indicate where the hit maps to the query > (the alignment itself does that in a column-wise fashion). 
>
> I'm not sure, maybe it needs to be more explicit in the documentation, but
> SimpleAlign does not align the sequences for you (and it shouldn't be
> expected to). There are much better (faster, more accurate) ways to do
> that.
>
>> if($CLUDGE){
>>   foreach(($r, $s1, $s2)){
>>     $_->seq( '.' x ($_->start - 1) . $_->seq )
>>   }
>> }
>>
>> ## Prepare an 'output stream' for the alignment:
>> my $aliWriter = Bio::AlignIO->
>>   new( -fh     => \*STDOUT,
>>        -format => 'clustalw',
>>      );
>>
>> warn "\nOUTPUT:\n";
>> $aliWriter->write_aln($aln);
>
> ...
>
>> I was calling the "fill in the gaps yourself" step a CLUDGE because I
>> had expected the alignment object to take care of this for me. Is
>> there any reason that it couldn't do this 'CLUDGE' automatically? It
>> seems strange that it insists on being passed locatable sequence
>> objects, but then largely ignores the given location.
>>
>> Would it not be possible to have this happen when the sequences are
>> written out from the alignment? I think it should still be possible to
>> index the column number via the (gapless) sequence number... or did I
>> get confused? There are two levels of confusion here (on my part), 1)
>> the concepts behind the objects and 2) the implementation details.
>
> Mentioned above (no assumptions on how locatableseqs map to one another).
> WYSIWYG. There is nothing precluding you from writing up code to do that,
> though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for
> post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl
> alignment implementation (there are, believe it or not, pure perl
> implementations of Smith-Waterman and Needleman-Wunsch).
>
>> Thanks for any hints on how to understand or potentially how to fix
>> these problems.
>>
>> Cheers,
>> Dan.
>
>
> Not that SimpleAlign and LocatableSeqs don't have their share of problems.
> However, I don't think you can expect this behavior to change with the
> refactors.
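
The "fill in the gaps yourself" step discussed above can be done outside SimpleAlign with a few lines of plain Perl. This is a rough sketch only: pad_to_column() is a made-up name, not a BioPerl method, and its $start argument is the alignment column where the read begins, which (as explained above) is not what LocatableSeq::start() means.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SimpleAlign): left-pad a gapped
# string with $gap so that it begins at alignment column $start (1-based),
# then right-pad it to $width columns so that all rows end up flush.
sub pad_to_column {
    my ($gapped, $start, $width, $gap) = @_;
    $gap = '.' unless defined $gap;
    my $row = ($gap x ($start - 1)) . $gapped;
    die "row is longer than the alignment width\n" if length($row) > $width;
    return $row . ($gap x ($width - length $row));
}

# The three rows from the example in this thread: id, gapped string, column.
my $ref  = 'tacattaaagacccg';
my @rows = (
    [ r  => $ref,         1 ],
    [ s1 => 'taca.taaa',  1 ],
    [ s2 => 'taaaga.ccg', 6 ],
);
for my $row (@rows) {
    my ($id, $seq, $start) = @$row;
    printf "%-4s %s\n", $id, pad_to_column($seq, $start, length $ref);
}
```

The padded strings could then be handed to Bio::LocatableSeq and add_seq() as in the script above. For megabase references this padding is obviously wasteful, so it only makes sense on a slice around the reads.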
> > chris > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Mon Aug 24 20:12:18 2009 From: hartzell at alerce.com (George Hartzell) Date: Mon, 24 Aug 2009 17:12:18 -0700 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl Message-ID: <19091.11362.190209.844074@already.dhcp.gene.com> There's a warning at Ensembl about the perl api code depending on an old version of bioperl (1.2.3) http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html Does anyone have current information about that dependency? My quick-n-dirty tests suggest that one can't build an app that uses both new Bioperl and the ensembl api without ensembl picking up the newer bioperl libraries (or your app getting the older ones). It's not clear what parts of the ensembl world depend on the older BioPerl. Anyone have any recipes to make it work? Any info on a possible modernization of the ensembl code? Thanks, g. From cjfields at illinois.edu Mon Aug 24 22:29:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 21:29:38 -0500 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <19091.11362.190209.844074@already.dhcp.gene.com> References: <19091.11362.190209.844074@already.dhcp.gene.com> Message-ID: <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> On Aug 24, 2009, at 7:12 PM, George Hartzell wrote: > > There's a warning at Ensembl about the perl api code depending on an > old version of bioperl (1.2.3) > > http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html > > Does anyone have current information about that dependency? > > My quick-n-dirty tests suggest that one can't build an app that uses > both new Bioperl and the ensembl api without ensembl picking up the > newer bioperl libraries (or your app getting the older ones). It's > not clear what parts of the ensembl world depend on the older BioPerl. 
I've asked this question several times of the ensembl folk w/o an adequate response. My general feeling is even they may not really know for sure (though I recall ewan saying something about feature/ annotation changes around then, and maybe something about the blastreporter). Saying that, the ensembl perl API worked for me using bioperl-live (and bioperl 1.6) as of a couple months ago. You might eventually run into some issues; if so report them back here and to the ensembl list. > Anyone have any recipes to make it work? > > Any info on a possible modernization of the ensembl code? That is completely up to the ensembl folks. bioperl 1.2.3 is full enough of bugs, and I don't plan on backporting any changes to that branch (seems kind of silly, as that branch is now about six yrs old). > Thanks, > > g. np! -chris From hlapp at gmx.net Mon Aug 24 23:17:29 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 23:17:29 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> Message-ID: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > > On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > >> [...] >> Ah, yes, well there you go then. So it is a classification() issue. 
>> Judging by what I said in that bug, looks like the db code needs to >> be changed to put the full scientific name in the first element it >> passes to classification. > > > Yup. I believe the only blocking issue with implementing it was > potential backwards-compat problems with databases loaded using old > behavior and then being updated post-1.5.2 (new behavior). The code change is for retrieving data, right? So I'm not sure how it would break backwards compatibility, unless one has taxon entries created before the change (i.e., about 3 years ago?) and through loading sequences rather than through loading the NCBI taxonomy. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue Aug 25 00:10:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 23:10:15 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> Message-ID: On Aug 24, 2009, at 10:17 PM, Hilmar Lapp wrote: > > On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > >> >> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: >> >>> [...] >>> Ah, yes, well there you go then. So it is a classification() >>> issue. 
Judging by what I said in that bug, looks like the db code >>> needs to be changed to put the full scientific name in the first >>> element it passes to classification. >> >> >> Yup. I believe the only blocking issue with implementing it was >> potential backwards-compat problems with databases loaded using old >> behavior and then being updated post-1.5.2 (new behavior). > > The code change is for retrieving data, right? So I'm not sure how > it would break backwards compatibility, unless one has taxon entries > created before the change (i.e., about 3 years ago?) and through > loading sequences rather than through loading the NCBI taxonomy. > > -hilmar Right, that's what I thought as well, but I just wasn't clear on that. So, basically we're saying, as long as the code change is on the retrieving side, everything's okay? Then I'm pretty sure I know how to fix it, at least partly. I can probably squeeze that in unless Sendu's working on it. Sendu? chris From cjfields at illinois.edu Tue Aug 25 00:28:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 23:28:26 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> Message-ID: <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> On Aug 24, 2009, at 10:17 PM, Hilmar Lapp wrote: > > On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > >> >> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: >> >>> [...] >>> Ah, yes, well there you go then. So it is a classification() >>> issue. Judging by what I said in that bug, looks like the db code >>> needs to be changed to put the full scientific name in the first >>> element it passes to classification. >> >> >> Yup. I believe the only blocking issue with implementing it was >> potential backwards-compat problems with databases loaded using old >> behavior and then being updated post-1.5.2 (new behavior). > > The code change is for retrieving data, right? So I'm not sure how > it would break backwards compatibility, unless one has taxon entries > created before the change (i.e., about 3 years ago?) and through > loading sequences rather than through loading the NCBI taxonomy. > > -hilmar Okay, if possible I would like you or Sendu to review that last commit I made to bioperl-db. It includes Sendu's patch; I commented out sections that were modifying the genus/species when loaded in, but there are a few TODO's I noted as well (everything is in populate_from_row()). 
02species.t is now failing but I think it's based on the same old behavior; I'll look into it. chris From geoeco at rambler.ru Tue Aug 25 03:01:24 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:01:24 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <94c73820908240553m72540519pd86bf78e29041462@mail.gmail.com> Message-ID: <1074529971.1251183684.50392744.40754@mcgi70.rambler.ru> Hi Rohit, Thanks a lot for your comments, it actually worked well, but in fact i only want to extract species names as I want to have it in a separate file together with a fasta file with sequences. So, thanks a lot again! Anna * Rohit Ghai [Mon, 24 Aug 2009 14:53:03 +0200]: > hi > > I think you forgot to add the "seq" in the builder.. thats why the file > is > empty. > Also, the species name, though being parsed, is nowhere in the output. > Here's a version > using fasta output that you can probably customize further. This also > takes > the full > name of the organism and adds to the description line in the output. > > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'fasta'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species','seq','description'); > > while(my $seq = $seq_in->next_seq()) { > > my $desc = $seq->description(); > my $species_string = $seq->species()->binomial('FULL'); > $desc = $desc . 
" [$species_string]"; > $seq->description($desc); > $seq_out->write_seq($seq); > } > > exit; > > > On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova > wrote: > > > > > Dear all, > > > > I am trying to extract species taxonomy from ORGANISM line. In fact I > only > > need a first line under ORGANISM tag (e.i. genus + species). I though > that > > it would be possible to do with the SeqBuilder object by stating > > > > $builder->add_wanted_slot('display_id','species'); > > > > the problem is, however, that I've got an empty file as a result. > > What might be wrong with the script (see below)? > > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From geoeco at rambler.ru Tue Aug 25 03:03:56 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:03:56 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> 
<6B4871D9-5DB0-4762-A613-3561B40CE099@illinois.edu> Message-ID: <734135890.1251183836.48962856.71827@mcgi59.rambler.ru> hello Chris, Well, my final aim is to get 2 files: first one is a fasta file with all the sequences, and the seconds one is simply a list of species names extracted from the same Genbank file. So that's why I though it would be a good thing to put all together into one script with bioperl objects. Is there a better way to do it? Thanks, Anna * Chris Fields [Mon, 24 Aug 2009 07:55:56 -0500]: > Anna, > > It's stored in the Bio::Species object. I have to say, though, I > think you're using a stick of dynamite for a scalpel here; if you only > need ORGANISM parse it out directly (it's much faster). Or am I > missing something? > > chris > > On Aug 24, 2009, at 4:20 AM, Anna Kostikova wrote: > > > Dear all, > > > > I am trying to extract species taxonomy from ORGANISM line. In fact > > I only need a first line under ORGANISM tag (e.i. genus + species). > > I though that it would be possible to do with the SeqBuilder object > > by stating > > > > $builder->add_wanted_slot('display_id','species'); > > > > the problem is, however, that I've got an empty file as a result. > > What might be wrong with the script (see below)? 
> > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From geoeco at rambler.ru Tue Aug 25 03:09:43 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:09:43 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> Message-ID: <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> hello Hilmar, Thanks for your comments. Actually, my final aim is to get 2 files: first one is a fasta file with all the sequences, and the seconds one is simply a list of species names extracted from the same Genbank file. So that's why I though it would be a good thing to put all together into one script with bioperl objects. Is there a better way to do it? 
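
For comparison, the two-output goal (FASTA records plus a plain list of species names) could be sketched with direct parsing roughly as below. Everything here is an assumption for illustration: read_genbank_records() is a made-up minimal reader that only understands the standard flat-file layout (LOCUS, DEFINITION, ORGANISM, ORIGIN, terminating //), the demo record is invented, and the filter on the DEFINITION text is optional. It is not a replacement for Bio::SeqIO.

```perl
use strict;
use warnings;

# Minimal flat-file reader (hypothetical, for illustration only): returns one
# hash per record with the LOCUS name, the DEFINITION text, the first line
# under ORGANISM, and the sequence. Records without a closing // are dropped.
sub read_genbank_records {
    my ($fh) = @_;
    my (@records, %rec);
    my $section = '';
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^LOCUS\s+(\S+)/) {
            %rec = (id => $1, definition => '', organism => '', seq => '');
            $section = 'LOCUS';
        }
        elsif ($line =~ /^DEFINITION\s+(.*)/) {
            $rec{definition} = $1;
            $section = 'DEFINITION';
        }
        elsif ($line =~ /^\s+ORGANISM\s+(.*\S)/) {
            $rec{organism} = $1;          # genus + species line only
            $section = 'ORGANISM';
        }
        elsif ($line =~ /^ORIGIN/) {
            $section = 'ORIGIN';
        }
        elsif ($line =~ m{^//}) {
            push @records, {%rec} if $rec{id};
            %rec = ();
            $section = '';
        }
        elsif ($line =~ /^\S/) {
            $section = 'OTHER';           # any other top-level keyword
        }
        elsif ($section eq 'DEFINITION' && $line =~ /^\s+(\S.*)/) {
            $rec{definition} .= " $1";    # continuation line
        }
        elsif ($section eq 'ORIGIN') {
            (my $bases = $line) =~ s/[^A-Za-z]//g;   # drop numbers and spaces
            $rec{seq} .= $bases;
        }
    }
    return @records;
}

# Invented demo record; with real data, open the GenBank file instead.
my $genbank = <<'END';
LOCUS       DEMO0001                  24 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Example plant 5.8S ribosomal RNA gene,
            partial sequence.
ACCESSION   DEMO0001
SOURCE      Arabidopsis thaliana
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae; Streptophyta.
ORIGIN
        1 acgtacgtac gtacgtacgt acgt
//
END

open my $in, '<', \$genbank or die "cannot open in-memory record: $!";
my @records = read_genbank_records($in);
close $in;

my (@fasta, @species);
for my $rec (@records) {
    next unless $rec->{definition} =~ /5\.8S ribosomal RNA/;   # optional filter
    push @fasta,   ">$rec->{id} $rec->{definition}\n$rec->{seq}\n";
    push @species, "$rec->{organism}\n";
}
print @fasta;      # would go to the FASTA file
print @species;    # would go to the species list file
```

The same loop structure works with Bio::SeqIO as the reader; only the record accessors change.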

The reason I don't want simple parsing for species names is that I also
want to be able to select which gene has been sequenced:

while (my $inseq = $seq_in->next_seq) {
    if ($inseq->desc =~ m/5\.8S ribosomal RNA/) {
        $seq_out->write_seq($inseq);
    }
}

Only if it is 5.8S rRNA do I want to extract the species name and the
sequence. And I thought that with direct parsing it would be much longer
code. Am I wrong?

I am a newbie in both bioperl and bioinformatics, so all comments would be
appreciated :)

Anna

* Hilmar Lapp [Mon, 24 Aug 2009 10:47:34 -0400]:
> Hi Anna,
>
> sequence formats all have some varying amount of information that must
> be present or otherwise the syntax is invalid. If what you need is a
> two-column table of display_id and species name, then I would simply
> write that, and not squeeze it into a standard sequence format.
> (Unless you actually do want the sequence too, in which case you need
> to add it as a wanted slot; even in that case though, writing a three-
> column table might serve you better.)
>
> -hilmar
>
> On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote:
>
> >
> > Dear all,
> >
> > I am trying to extract species taxonomy from ORGANISM line. In fact
> > I only need a first line under ORGANISM tag (e.i. genus + species).
> > I though that it would be possible to do with the SeqBuilder object
> > by stating
> >
> > $builder->add_wanted_slot('display_id','species');
> >
> > the problem is, however, that I've got an empty file as a result.
> > What might be wrong with the script (see below)?
> > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Tue Aug 25 07:34:18 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 Aug 2009 07:34:18 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> Message-ID: <4A8C2A89-C212-4969-8B01-3DA7D7DE7862@gmx.net> On Aug 25, 2009, at 12:28 AM, Chris Fields wrote: > Okay, if possible I would like you or Sendu to review that last > commit I made to bioperl-db. Will do. > [...] > 02species.t is now failing but I think it's based on the same old > behavior; I'll look into it. I would expect that if the classification array is now different, so the test will need changing to expect the "new" behavior. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Aug 25 07:52:11 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 Aug 2009 07:52:11 -0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> Message-ID: <3B23691B-B165-4CC3-889E-04DE45AB1627@gmx.net> Hi Anna: On Aug 25, 2009, at 3:09 AM, Anna Kostikova wrote: > Actually, my final aim is to get 2 files: first one is a fasta file > with all the sequences, and the seconds one is simply a list of > species names Then I'd change your script to write two files: one with the sequences in FASTA format (you can use Bio::SeqIO for that), and the second one in the format you need it (one species name per line?). (Right now you are writing one file in Genbank format, which is quite unlike the above, right?) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From whs at ebi.ac.uk Tue Aug 25 07:04:23 2009 From: whs at ebi.ac.uk (William Spooner) Date: Tue, 25 Aug 2009 12:04:23 +0100 Subject: [Bioperl-l] Modern BioPerl vs. 
Ensembl In-Reply-To: <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> Message-ID: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> On 25 Aug 2009, at 03:29, Chris Fields wrote: > On Aug 24, 2009, at 7:12 PM, George Hartzell wrote: > >> >> There's a warning at Ensembl about the perl api code depending on an >> old version of bioperl (1.2.3) >> >> http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html >> >> Does anyone have current information about that dependency? >> >> My quick-n-dirty tests suggest that one can't build an app that uses >> both new Bioperl and the ensembl api without ensembl picking up the >> newer bioperl libraries (or your app getting the older ones). It's >> not clear what parts of the ensembl world depend on the older >> BioPerl. > > I've asked this question several times of the ensembl folk w/o an > adequate response. My general feeling is even they may not really > know for sure (though I recall ewan saying something about feature/ > annotation changes around then, and maybe something about the > blastreporter). > > Saying that, the ensembl perl API worked for me using bioperl-live > (and bioperl 1.6) as of a couple months ago. You might eventually > run into some issues; if so report them back here and to the ensembl > list. I'm not sure of the full list of dependencies, but my feeling is that most are related to the Ensembl application/web code; the blast interface in particular. I can support Chris's findings that the API works (AFAIK) with bioperl-live, but this is obviously untested. > >> Anyone have any recipes to make it work? >> >> Any info on a possible modernization of the ensembl code? > > That is completely up to the ensembl folks. bioperl 1.2.3 is full > enough of bugs, and I don't plan on backporting any changes to that > branch (seems kind of silly, as that branch is now about six yrs old). 

It would be nice if someone at Ensembl could compile a list of
BioPerl dependencies. At least that would give a feel for the scope
of the problem...

Will

From ak at ebi.ac.uk Tue Aug 25 09:43:19 2009
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Tue, 25 Aug 2009 14:43:19 +0100
Subject: [Bioperl-l] Modern BioPerl vs. Ensembl
In-Reply-To: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk>
References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk>
Message-ID: <20090825134319.GE12422@qux.windows.ebi.ac.uk>

[cut]
>
> It would be nice if someone at Ensembl could compile a list of
> BioPerl dependencies. At least that would give a feel for the scope
> of the problem...
>
> Will

Hi Will, and list,

These are the BioPerl modules that the Ensembl Core API "use" or
otherwise directly call (scanned our current HEAD code):

Bio::Annotation::DBLink
    in Bio::EnsEMBL::DBEntry

Bio::Tools::CodonTable
    in Bio::EnsEMBL::Utils::TranscriptAlleles
    in Bio::EnsEMBL::PredictionTranscript
    in Bio::EnsEMBL::Transcript.pm

Bio::LocatableSeq
    in Bio::EnsEMBL::DnaDnaAlignFeature

Bio::PrimarySeqI
    in Bio::EnsEMBL::Slice

Bio::Root::IO
    in Bio::EnsEMBL::Utils::Converter

Bio::Root::Root
    in Bio::EnsEMBL::Utils::EasyArgv

Bio::Seq
    in Bio::EnsEMBL::Utils::PolyA
    in Bio::EnsEMBL::Intron
    in Bio::EnsEMBL::Exon
    in Bio::EnsEMBL::Transcript
    in Bio::EnsEMBL::Translation
    in Bio::EnsEMBL::Utils::TranscriptAlleles

Bio::SeqFeature::FeaturePair
    in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair

Bio::SeqFeature::Generic
    in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair

Bio::SeqFeatureI
    in Bio::EnsEMBL::SeqFeatureI

Bio::SimpleAlign
    in Bio::EnsEMBL::DnaDnaAlignFeature

Bio::Species
    in Bio::EnsEMBL::DBSQL::MetaContainer


I have not looked at the other Ensembl APIs (Variation, FuncGen,
Compara, Web, Pipeline, etc.), and I might possibly have missed
references to some BioPerl modules.
I have also not indicated the relative importance of any of these
modules (clearly Bio::Seq is central, but I don't know how widely the
code that accesses Bio::SeqFeature::Generic is used) or investigated
whether any of the references to BioPerl modules occur in deprecated
code.

As far as I know, there are currently no plans to get rid of these
dependencies. Or there might be, only they are not very far up the
priority list right now. I would be happy to look at conservative
patches, but cannot promise snappy response times.

Regards,
Andreas

-- 
Andreas Kähäri, Ensembl Software Developer      -{ }-
European Bioinformatics Institute (EMBL-EBI)    -{ }-
Wellcome Trust Genome Campus, Hinxton           -{ }-
Cambridge CB10 1SD, United Kingdom              -{ }-

From cjfields at illinois.edu Tue Aug 25 10:07:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 25 Aug 2009 09:07:52 -0500
Subject: [Bioperl-l] Modern BioPerl vs. Ensembl
In-Reply-To: <20090825134319.GE12422@qux.windows.ebi.ac.uk>
References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> <20090825134319.GE12422@qux.windows.ebi.ac.uk>
Message-ID: <9D26C8FA-6D74-42C2-A2BD-4EFF529DA05A@illinois.edu>

Andreas,

Thanks for the response, been waiting for something a bit more official
for a while now. We can definitely help you patch these as needed when
problems arise, just let us know, or file a bug report listing issues.

Scanning through, there will be a couple of future trouble spots:

1) We are very likely deprecating Bio::Species in favor of Bio::Taxon
(that may be relatively easy to map, as Bio::Species now delegates to
Bio::Taxon and similar anyway).

2) We will be refactoring Bio::SimpleAlign/LocatableSeq. There are too
many corner cases where assumptions are made. We'll try to stick with
the current API, but there may be a few delegating methods.
More significantly, we're also planning a significant restructuring of bioperl prior to 1.7, basically splitting it into several (more easily maintainable) parts. The exact nature of these is still a bit fuzzy (we have to sort out dependencies) but we do plan on making a bundle package to assemble a complete old-style 'monolithic' bioperl, just a bit more customizable. It's very likely the versioning scheme will stay the same for the core (root) set of modules, but the others may end up having their own versioning for monitoring dependencies. chris On Aug 25, 2009, at 8:43 AM, Andreas K?h?ri wrote: > [cut] >> >> It would be nice if someone at Ensembl could compile a list of >> BioPerl dependencies. At least that would give a feel for the scope >> of the problem... >> >> Will > > Hi Will, and list, > > These are the BioPerl modules that the Ensembl Core API "use" or > otherwise directly call (scanned our current HEAD code): > > Bio::Annotation::DBLink > in Bio::EnsEMBL::DBEntry > > Bio::Tools::CodonTable > in Bio::EnsEMBL::Utils::TranscriptAlleles > in Bio::EnsEMBL::PredictionTranscript > in Bio::EnsEMBL::Transcript.pm > > Bio::LocatableSeq > in Bio::EnsEMBL::DnaDnaAlignFeature > > Bio::PrimarySeqI > in Bio::EnsEMBL::Slice > > Bio::Root::IO > in Bio::EnsEMBL::Utils::Converter > > Bio::Root::Root > in Bio::EnsEMBL::Utils::EasyArgv > > Bio::Seq > in Bio::EnsEMBL::Utils::PolyA > in Bio::EnsEMBL::Intron > in Bio::EnsEMBL::Exon > in Bio::EnsEMBL::Transcript > in Bio::EnsEMBL::Translation > in Bio::EnsEMBL::Utils::TranscriptAlleles > > Bio::SeqFeature::FeaturePair > in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair > > Bio::SeqFeature::Generic > in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair > > Bio::SeqFeatureI > in Bio::EnsEMBL::SeqFeatureI > > Bio::SimpleAlign > in Bio::EnsEMBL::DnaDnaAlignFeature > > Bio::Species > in Bio::EnsEMBL::DBSQL::MetaContainer > > > I have not looked at the other Ensembl APIs (Variation, FuncGen, > Compara, Web, Pipeline, etc.), 
and I might possibly have missed > references to some BioPerl modules. I have also not indicated > the relative importance of any of these modules (clearly Bio::Seq > is central, but I don't know how widely the code that accesses > Bio::SeqFeature::Generic is used) or investigated if any of the > references to BioPerl modules occur in deprecated code. > > As far as I know, there are currently no plans to get rid of these > dependencies. Or there might be, only they are not very far up the > priority list right now. I would be happy to look at conservative > patches, but can not promise snappy response times. > > > Regards, > Andreas > > -- > Andreas K?h?ri, Ensembl Software Developer -{ }- > European Bioinformatics Institute (EMBL-EBI) -{ }- > Wellcome Trust Genome Campus, Hinxton -{ }- > Cambridge CB10 1SD, United Kingdom -{ }- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From acpatel at usa.net Mon Aug 24 23:54:01 2009 From: acpatel at usa.net (Anand C. Patel) Date: Mon, 24 Aug 2009 22:54:01 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> Message-ID: <9BA4272D-E7A1-4530-B8D8-B6156823BFDB@usa.net> I preloaded the NCBI taxonomy into the biosql database using the provided script before adding the sequences from genbank format text file (downloaded directly from genbank) using the script provided by bioperl-db, which would be what created the Bio::Species objects (I'd assume) from the text files, prior to inserting them into the database. Hope this helps, Anand On Aug 24, 2009, at 3:27 PM, Chris Fields wrote: > > On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: >>>> Hilmar Lapp wrote: >>>>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>>>> This points to a problem in Bio::Species::scientific_name(), >>>>>>> given that binomial() is correct. Could you file this as a bug >>>>>>> report? >>>>>> >>>>>> What code creates the Bio::Species object here? I suspect this >>>>>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>>>> I see. Any pointer to what would tell me what I need to change >>>>> or is everything in the Bio::Species POD? >>>> >>>> ... I won't guarantee the perfection of the POD ;) >>>> >>>> >>>>> BTW what the Bioperl-db code does is instantiate the blank >>>>> object and then populate it through its accessors (mostly the >>>>> classification() array). 
If what it has been doing in the past >>>>> is now considered incorrect, at least it doesn't raise any >>>>> warning that would alert one to that ... >>>> >>>> Yuh... If you point out the code that creates the Bio::Species I >>>> can look into it for you and suggest what needs changing and why >>>> it doesn't work (or if it's a bug in Bio::Species). I can't >>>> remember things clearly right now, though classification() I >>>> guess was supposed to be backwards compatible. >>> Sendu, I think it's related to this: >>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 >>> Bio::DB::BioSQL::SpeciesAdaptor and >>> Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules >>> in question i think. >> >> Ah, yes, well there you go then. So it is a classification() issue. >> Judging by what I said in that bug, looks like the db code needs to >> be changed to put the full scientific name in the first element it >> passes to classification. > > > Yup. I believe the only blocking issue with implementing it was > potential backwards-compat problems with databases loaded using old > behavior and then being updated post-1.5.2 (new behavior). I would > think this only affects sequence data loaded w/o taxonomy preloaded, > but I'm not sure. > > I suggest, if you can fix it, go ahead make the necessary change. > We can then post a big warning to BioSQL and here about the problem, > something along the lines of 'bioperl-db in svn may be backwards > incompatible with species information loaded in previous versions; > it may eat your first born' or similar. It's an absolutely > necessary fix, and may effectively kill a bunch of other db/species- > related bugs. > > chris > From dan.bolser at gmail.com Tue Aug 25 11:16:14 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 25 Aug 2009 16:16:14 +0100 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? 
Message-ID: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Hi, Can some one set $wgEnableMWSuggest on the BioPerl wiki please? http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest I generally find this a great feature to have on any MW install. Can we also create a page (usually "BioPerl:Configuration" (or '$wgSiteName:Configuration')) to report details of the specific MW configuration settings used on the wiki? This is also a good place for people to request configuration changes to tweak the way the wiki works. Cheers, Dan. From jason at bioperl.org Tue Aug 25 13:17:44 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 25 Aug 2009 10:17:44 -0700 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: Can you send sysadmin request mail to the helpdesk - support at open-bio.org so mauricio or someone can have it in the queue. [aside] I've had to stop doing OBF sysadmin work so we are definitely looking for someone to help with the ALL VOLUNTEER team of now just Mauricio and Chris Dagdigian who do mediawiki and sysadmin support. We've reached a bit of crunch where there are lots of things to tweak and customize for the various flavors of MW installs that the projects want but we don't have enough dedicated admins to really support this. Most of us have gotten into these projects to support our own bioinformatics programming not sysadmin tasks so there is a bit of gap here. Some of us (me) were not trained as sysadmin but jumped in and figured out how to help and do it - and learned valuable life skills... =) We're discussing plans to upgrade the machines in the future which would improve performance and reliability we hope and also use this opportunity to streamline the MW installs to be a more easily maintained wikifarm. 
[/aside]

-jason

On Aug 25, 2009, at 8:16 AM, Dan Bolser wrote:

> Hi,
>
> Can some one set $wgEnableMWSuggest on the BioPerl wiki please?
>
> http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest
>
>
> I generally find this a great feature to have on any MW install. Can
> we also create a page (usually "BioPerl:Configuration" (or
> '$wgSiteName:Configuration')) to report details of the specific MW
> configuration settings used on the wiki? This is also a good place for
> people to request configuration changes to tweak the way the wiki
> works.
>
>
> Cheers,
> Dan.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From awitney at sgul.ac.uk Tue Aug 25 09:45:59 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 25 Aug 2009 14:45:59 +0100
Subject: [Bioperl-l] Modern BioPerl vs. Ensembl
In-Reply-To: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk>
References: <19091.11362.190209.844074@already.dhcp.gene.com>
	<23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu>
	<33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk>
Message-ID: <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk>

>
> It would be nice if someone at Ensembl could compile a list of
> BioPerl dependencies. At least that would give a feel for the scope
> of the problem...

I just downloaded

• ensembl
• ensembl-compara
• ensembl-variation
• ensembl-functgenomics

from their website and did a regex on the files for

/^use (Bio::.+);/

which reveals (filtering out Bio::EnsEMBL::*):

Bio::AlignIO
Bio::Annotation::DBLink
Bio::Das::ProServer::SourceAdaptor
Bio::Das::ProServer::SourceAdaptor::Transport::generic
Bio::Index::Fastq
Bio::LocatableSeq
Bio::Location::Simple
Bio::MAGE::Experiment::Experiment
Bio::MAGE::XMLUtils
Bio::Perl
Bio::PrimarySeq
Bio::PrimarySeqI
Bio::Root::Root
Bio::Root::RootI
Bio::Search::HSP::EnsemblHSP
Bio::Seq
Bio::SeqFeature::FeaturePair
Bio::SeqFeature::Generic
Bio::SeqFeatureI
Bio::SeqIO
Bio::SimpleAlign
Bio::Species
Bio::Tools::CodonTable
Bio::Tools::Run::Phylo::PAML::Codeml
Bio::TreeIO

does that help? (I have the list broken down by which module/script
contains which if that helps also)

cheers

adam

From hartzell at alerce.com Tue Aug 25 16:22:20 2009
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 25 Aug 2009 13:22:20 -0700
Subject: [Bioperl-l] code review on LocatableSeq performance fix.
Message-ID: <19092.18428.494334.482303@already.dhcp.gene.com>

[For better or worse] I use pairs of locatable seq's to represent
alignments between cDNAs (spliced mRNA) and genomic sequence.

I end up using column_from_residue_number a lot to map features back
and forth between the coordinate systems.

My sequences tend to be fairly long, and the current implementation of
column_from_residue_number (which splits the sequences into arrays of
individual characters) performs very badly on them.

I've included below a small variation on a patch that I've been using
for a while (when I pulled it up to the current bioperl-live I changed
a couple of regexps to use $GAP_SYMBOLS and $RESIDUE_SYMBOLS). It
passes the t/Seq/LocatableSeq.t tests and Works For Me (tm).

Instead of creating whopping big arrays and then looping over them it
breaks the sequence down into runs of residues/gaps and strides across
them.
It also unwinds the strandedness test and avoids the cute trick
of using an anonymous sub (which saves a couple of lines in the source
file but adds *significant* overhead every time around the loop).

All hail Devel::NYTProf.

Chris et al.'s comments about the mysteries and vagaries of
Bio::LocatableSeq make me leery of just committing it.

Anyone want to comment on it?

g.

Index: Bio/LocatableSeq.pm
===================================================================
--- Bio/LocatableSeq.pm (revision 16001)
+++ Bio/LocatableSeq.pm (working copy)
@@ -423,27 +423,47 @@
       unless $resnumber =~ /^\d+$/ and $resnumber > 0;
 
     if ($resnumber >= $self->start() and $resnumber <= $self->end()) {
-        my @residues = split //, $self->seq;
-        my $count = $self->start();
-        my $i;
-        my ($start,$end,$inc,$test);
-        my $strand = $self->strand || 0;
-        # the following bit of "magic" allows the main loop logic to be the
-        # same regardless of the strand of the sequence
-        ($start,$end,$inc,$test)= ($strand == -1)?
-            (scalar(@residues-1),0,-1,sub{$i >= $end}) :
-            (0,scalar(@residues-1),1,sub{$i <= $end});
+        my @chunks;
+        my $column_incr;
+        my $current_column;
+        my $current_residue = $self->start - 1;
+        my $seq = $self->seq;
+        my $strand = $self->strand || 0;
 
-        for ($i=$start; $test->(); $i+= $inc) {
-            if ($residues[$i] ne '.' and $residues[$i] ne '-') {
-                $count == $resnumber and last;
-                $count++;
-            }
-        }
-        # $i now holds the index of the column.
-        # The actual column number is this index + 1
+        if ($strand == -1) {
+#            @chunks = reverse $seq =~ m/[^\.\-]+|[\.\-]+/go;
+            @chunks = reverse $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go;
+            $column_incr = -1;
+            $current_column = (CORE::length $seq) + 1;
+        }
+        else {
+#            @chunks = $seq =~ m/[^\.\-]+|[\.\-]+/go;
+            @chunks = $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go;
+            $column_incr = 1;
+            $current_column = 0;
+        }
 
-        return $i+1;
+        while (my $chunk = shift @chunks) {
+#            if ($chunk =~ m|^[\.\-]|o) {
+            if ($chunk =~ m|^[$GAP_SYMBOLS]|o) {
+                $current_column += $column_incr * CORE::length($chunk);
+            }
+            else {
+                if ($current_residue + CORE::length($chunk) < $resnumber) {
+                    $current_column += $column_incr * CORE::length($chunk);
+                    $current_residue += CORE::length($chunk);
+                }
+                else {
+                    if ($strand == -1) {
+                        $current_column -= $resnumber - $current_residue;
+                    }
+                    else {
+                        $current_column += $resnumber - $current_residue;
+                    }
+                    return $current_column;
+                }
+            }
+        }
     }
 
     $self->throw("Could not find residue number $resnumber");

From hartzell at alerce.com Tue Aug 25 17:07:43 2009
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 25 Aug 2009 14:07:43 -0700
Subject: [Bioperl-l] Modern BioPerl vs. Ensembl
In-Reply-To: <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk>
References: <19091.11362.190209.844074@already.dhcp.gene.com>
	<23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu>
	<33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk>
	<1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk>
Message-ID: <19092.21151.457226.192791@already.dhcp.gene.com>

Adam Witney writes:
 > >
 > > It would be nice if someone at Ensembl could compile a list of
 > > BioPerl dependencies. At least that would give a feel for the scope
 > > of the problem...
 >
 > I just downloaded
 >
 > • ensembl
 > • ensembl-compara
 > • ensembl-variation
 > • ensembl-functgenomics
 >
 > from their website and did a regex on the files for
 >
 > /^use (Bio::.+);/
 >
 > which reveals (filtering out Bio::EnsEMBL::*):
 >
 > Bio::AlignIO
 > Bio::Annotation::DBLink
 > Bio::Das::ProServer::SourceAdaptor
 > Bio::Das::ProServer::SourceAdaptor::Transport::generic
 > Bio::Index::Fastq
 > Bio::LocatableSeq
 > Bio::Location::Simple
 > Bio::MAGE::Experiment::Experiment
 > Bio::MAGE::XMLUtils
 > Bio::Perl
 > Bio::PrimarySeq
 > Bio::PrimarySeqI
 > Bio::Root::Root
 > Bio::Root::RootI
 > Bio::Search::HSP::EnsemblHSP
 > Bio::Seq
 > Bio::SeqFeature::FeaturePair
 > Bio::SeqFeature::Generic
 > Bio::SeqFeatureI
 > Bio::SeqIO
 > Bio::SimpleAlign
 > Bio::Species
 > Bio::Tools::CodonTable
 > Bio::Tools::Run::Phylo::PAML::Codeml
 > Bio::TreeIO
 >
 > does that help? (I have the list broken down by which module/script
 > contains which if that helps also)

What would be most useful to me would be to understand where they
*need* to use release 1.2.3. Is there something magical about their
use of e.g. Bio::Seq?

It's worth noting that your technique won't pick up various modules
that are loaded on demand by e.g. Bio::SearchIO.

g.

From maj at fortinbras.us Wed Aug 26 07:39:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 26 Aug 2009 07:39:40 -0400
Subject: [Bioperl-l] code review on LocatableSeq performance fix.
In-Reply-To: <19092.18428.494334.482303@already.dhcp.gene.com>
References: <19092.18428.494334.482303@already.dhcp.gene.com>
Message-ID: <55514878273F4E3F8D9E438FD2F3AB7D@NewLife>

I think it's great. column_from_residue_number doesn't have any secret
side effects, and the patch preserves nice integer in, nice integer out,
and input and output both are 1-origin indices as far as I can tell.
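
For readers who want to check the 1-origin behaviour without a BioPerl
install, the run-striding idea in the patch can be sketched in plain
Perl. This is only an illustration under stated assumptions:
column_from_residue is a hypothetical stand-alone sub (not the
Bio::LocatableSeq method), it handles the forward strand only, and it
hard-codes '.' and '-' as gap characters instead of using $GAP_SYMBOLS.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, NOT the Bio::LocatableSeq method.
#   $seq    : gapped sequence string ('.' and '-' are assumed to be gaps)
#   $start  : residue number of the first residue in $seq (1-origin)
#   $resnum : residue number to locate
# Returns the 1-origin alignment column, or undef if it is not found.
sub column_from_residue {
    my ($seq, $start, $resnum) = @_;
    return if $resnum < $start;

    my $column  = 0;             # columns consumed so far
    my $residue = $start - 1;    # last residue number consumed so far

    # Walk over alternating runs of gaps and residues.
    for my $run ($seq =~ /[.\-]+|[^.\-]+/g) {
        my $len = length $run;
        if ($run =~ /^[.\-]/) {
            $column += $len;                 # gaps advance columns only
        }
        elsif ($residue + $len < $resnum) {
            $column  += $len;                # whole run is before the target
            $residue += $len;
        }
        else {
            # The target residue lies inside this run.
            return $column + ($resnum - $residue);
        }
    }
    return;
}

# Residue 3 of 'AB--CD' is the 'C', which sits in column 5.
print column_from_residue('AB--CD', 1, 3), "\n";
```

The loop runs once per run of gaps or residues rather than once per
character, which is where the saving on long sequences comes from.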
I say go for it- MAJ ----- Original Message ----- From: "George Hartzell" To: "bioperl-l List" Sent: Tuesday, August 25, 2009 4:22 PM Subject: [Bioperl-l] code review on LocatableSeq performance fix. > > [For better or worse] I use pairs of locatable seq's to represent > alignments between cDNAs (spliced mRNA) and genomic sequence. > > I end up using column_from_residue_number a lot to map features back > and forth between the coordinate system. > > My sequences tend to be fairly long, and the current implementation of > column_from_residue_number (which splits the sequences into arrays of > individual characters) performs very badly on them. > > I've included below a small variation on a patch that I've been using > for a while (when I pulled it up to the current bioperl-live I changed > a couple of regexps to use $GAP_SYMBOLS and $RESIDUE_SYMBOLS). It > passes the t/Seq/LocatableSeq.t tests and Works For Me (tm). > > Instead of creating whopping big arrays and then looping over them it > breaks the sequence down into runs of residues/gaps and strides across > them. It also unwinds the strandedness test and avoids the cute trick > of using an anonymous sub (which saves a couple of lines in the source > file but adds *signficant* overhead every time around the loop). > > All hail Devel::NYTProf. > > Chris et al.'s comments about the mysteries and vagaries of > Bio::LocatableSeq makes me leary of just committing it. > > Anyone want to comment on it? > > g. 
> > Index: Bio/LocatableSeq.pm > =================================================================== > --- Bio/LocatableSeq.pm (revision 16001) > +++ Bio/LocatableSeq.pm (working copy) > @@ -423,27 +423,47 @@ > unless $resnumber =~ /^\d+$/ and $resnumber > 0; > > if ($resnumber >= $self->start() and $resnumber <= $self->end()) { > - my @residues = split //, $self->seq; > - my $count = $self->start(); > - my $i; > - my ($start,$end,$inc,$test); > - my $strand = $self->strand || 0; > - # the following bit of "magic" allows the main loop logic to be the > - # same regardless of the strand of the sequence > - ($start,$end,$inc,$test)= ($strand == -1)? > - (scalar(@residues-1),0,-1,sub{$i >= $end}) : > - (0,scalar(@residues-1),1,sub{$i <= $end}); > + my @chunks; > + my $column_incr; > + my $current_column; > + my $current_residue = $self->start - 1; > + my $seq = $self->seq; > + my $strand = $self->strand || 0; > > - for ($i=$start; $test->(); $i+= $inc) { > - if ($residues[$i] ne '.' and $residues[$i] ne '-') { > - $count == $resnumber and last; > - $count++; > - } > - } > - # $i now holds the index of the column. 
> - # The actual column number is this index + 1 > + if ($strand == -1) { > +# @chunks = reverse $seq =~ m/[^\.\-]+|[\.\-]+/go; > + @chunks = reverse $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; > + $column_incr = -1; > + $current_column = (CORE::length $seq) + 1; > + } > + else { > +# @chunks = $seq =~ m/[^\.\-]+|[\.\-]+/go; > + @chunks = $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; > + $column_incr = 1; > + $current_column = 0; > + } > > - return $i+1; > + while (my $chunk = shift @chunks) { > +# if ($chunk =~ m|^[\.\-]|o) { > + if ($chunk =~ m|^[$GAP_SYMBOLS]|o) { > + $current_column += $column_incr * CORE::length($chunk); > + } > + else { > + if ($current_residue + CORE::length($chunk) < $resnumber) { > + $current_column += $column_incr * CORE::length($chunk); > + $current_residue += CORE::length($chunk); > + } > + else { > + if ($strand == -1) { > + $current_column -= $resnumber - $current_residue; > + } > + else { > + $current_column += $resnumber - $current_residue; > + } > + return $current_column; > + } > + } > + } > } > > $self->throw("Could not find residue number $resnumber"); > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From tuco at pasteur.fr Wed Aug 26 10:59:24 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 26 Aug 2009 16:59:24 +0200 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis Message-ID: <4A954DCC.4050200@pasteur.fr> Hi, I am playing with Bio::Restriction::* objects and find it very useful. Especially I am filtering output for blunt and cohesive enzymes. However, there's an exception thrown when I use 'cutters' method from B::R::Analysis : ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Bad end parameter (34). 
End must be less than the total length
of sequence (total=7)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::PrimarySeq::subseq /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388
STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891
STACK: Bio::Restriction::Analysis::_cuts /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788
STACK: Bio::Restriction::Analysis::cut /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366
STACK: Bio::Restriction::Analysis::cutters /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681
STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion lib/Bio/Restriction/Analysis/blunt.pm:86
STACK: Bio::Restriction::Analysis::blunt::cut_in_frames lib/Bio/Restriction/Analysis/blunt.pm:65
STACK: ./check_phase.pl:213
-----------------------------------------------------------

The problem with this enzyme is that the cut site lies beyond the
enzyme recognition site (from Rebase withrefm.907):

<1>BceSI
<2>
<3>SSAAGCG(27/27)
<4>
<5>Bacillus cereus
<6>ATCC 10987
<7>
<8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol.
Lett., vol. 202, pp. 189-193.
Xu, S.-Y., Unpublished observations.


For this enzyme, here are the values stored in the B::R::Enzyme object
($e):

$e->site => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN
$e->cut => 34
$e->string => SSAAGCG
$e->seq->seq => SSAAGCG


So my question is, wouldn't it be fair to set B::PrimarySeq::seq with
the value of $e->site when such enzymes are seen in the source file?

NOTE from B::R::Analysis::_enzyme_sites (commented):

# The following should not be an exception, both Type I and Type III
# enzymes cut outside of their recognition sequences
#if ($site < 0 || $site > length($enz->string)) {
# $self->throw("This is (probably) not your fault.\nGot a cut site of $site and a
# sequence of ".$enz->string);
# }

And this is exactly the problem I'm facing!
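
To make the numbers concrete, here is a small plain-Perl sketch of the
idea of padding the recognition string with N's before splitting it at
the cut position. It is an illustration only: split_at_cut is a
hypothetical helper, it works on bare strings with substr rather than
on Bio::PrimarySeq objects, and it is not the fix that went into
BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: split a recognition string at a 1-origin cut
# position, padding with 'N' when the cut lies beyond the recognition
# sequence (as it does for BceSI: 7 bases recognised, cut at 34).
# Assumes $cut >= 0.
sub split_at_cut {
    my ($recog, $cut) = @_;
    my $padded = $recog;
    if ($cut > length($recog)) {
        $padded .= 'N' x ($cut - length($recog));
    }
    my $before = substr($padded, 0, $cut);
    my $after  = substr($padded, $cut);    # empty when the cut is at the end
    return ($before, $after);
}

my ($before, $after) = split_at_cut('SSAAGCG', 34);
printf "before=%d bases, after=%d bases\n", length($before), length($after);
```

With the padding in place the "before" part is 34 bases long (7
recognised bases plus 27 N's) and nothing is left after the cut, so
asking for a sub-sequence up to position 34 no longer runs off the end
of a 7-base string.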
In _enzyme_sites the code is trying to subseq our sequence to get the
before and after seq as:

$beforeseq=$enz->seq->subseq(1, $site);
$afterseq=$enz->seq->subseq($site+1, $enz->seq->length);

and this throws an error as the cutting site (pos 34) is far beyond
the enzyme's known recognition site SSAAGCG (length=7).

Has anybody a clue on how to fix/patch it?

Thanks for any reply

Regards

Emmanuel

--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------

From cjfields at illinois.edu Wed Aug 26 11:20:59 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 26 Aug 2009 10:20:59 -0500
Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis
In-Reply-To: <4A954DCC.4050200@pasteur.fr>
References: <4A954DCC.4050200@pasteur.fr>
Message-ID: <07222470-41ED-4E17-9383-65A7D02CE9E1@illinois.edu>

What version of Bioperl are you using? Mark Jensen did some
refactoring of this code after the 1.6.0 release that should appear in
1.6.1; I'll be working on the first alpha for that release starting
Friday.

chris

On Aug 26, 2009, at 9:59 AM, Emmanuel Quevillon wrote:

> Hi,
>
> I am playing with Bio::Restriction::* objects and find it very useful.
> Especially I am filtering output for blunt and cohesive enzymes.
> However, there's an exception thrown when I use 'cutters' method
> from B::R::Analysis :
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bad end parameter (34).
End must be less than the total length > of sequence (total=7) > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::PrimarySeq::subseq > /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388 > STACK: Bio::Restriction::Analysis::_enzyme_sites > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891 > STACK: Bio::Restriction::Analysis::_cuts > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788 > STACK: Bio::Restriction::Analysis::cut > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366 > STACK: Bio::Restriction::Analysis::cutters > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681 > STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion > lib/Bio/Restriction/Analysis/blunt.pm:86 > STACK: Bio::Restriction::Analysis::blunt::cut_in_frames > lib/Bio/Restriction/Analysis/blunt.pm:65 > STACK: ./check_phase.pl:213 > ----------------------------------------------------------- > > The problem with this enzyme is that the cut site is over the enzyme > recognition site (from Rebase withrefm.907): > > <1>BceSI > <2> > <3>SSAAGCG(27/27) > <4> > <5>Bacillus cereus > <6>ATCC 10987 > <7> > <8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol. > Lett., vol. 202, pp. 189-193. > Xu, S.-Y., Unpublished observations. > > > For this enzyme, here are the values stored into B::R::Enzyme object > ($e): > > $e->site => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN > $e->cut => 34 > $e->string => SSAAGCG > $e->seq->seq => SSAAGCG > > > So my question is, wouldn't be faire to set B::PrimarySeq::seq with > value of $e->site when such enzyme are seen in the source file. 
> > NOTE from B::R::Analysis::_enzymes_sites (commented): > > # The following should not be an exception, both Type I and Type > III > # enzymes cut outside of their recognition sequences > #if ($site < 0 || $site > length($enz->string)) { > # $self->throw("This is (probably) not your fault.\nGot a cut > site of $site and a # sequence of ".$enz->string); > # } > > And this is exactly the problem I'm facing! > In _enzymes_sites the code is trying to subseq our sequence to get > before and after seq as : > > $beforeseq=$enz->seq->subseq(1, $site); > $afterseq=$enz->seq->subseq($site+1, $enz->seq->length); > > and this throws an error as the cutting site is far over (pos 34) > the enzyme know recognition site SSAAGCG (length=7). > > Has anybody a clue on how to fix/patch it? > > Thanks for any reply > > Regards > > Emmanuel > > -- > ------------------------- > Emmanuel Quevillon > Biological Software and Databases Group > Institut Pasteur > +33 1 44 38 95 98 > tuco at_ pasteur dot fr > ------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robert.bradbury at gmail.com Wed Aug 26 11:38:44 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Wed, 26 Aug 2009 11:38:44 -0400 Subject: [Bioperl-l] Generalized reciprocal blast Message-ID: I would like to know whether or not anyone has attempted to create a "generalized" reciprocal blast component for BioPerl? One sees papers all the time where they discuss running reciprocal blasts to compare a new species to an old "standard" species or a set of species or running an all-to-all set of comparisons to match up all of the "known" proteins from species and determine which are outliers (and therefore "novel"). 
There are also accumulating merged sets in NCBI HomoloGene (which
seems to be a strict subset of (perhaps a dozen) "well sequenced" genomes)
and Ensembl (which seems to be working with a much larger set of 40-50
genomes, some of which may be somewhat incomplete and are certainly poorly
"explored").

I have, I believe, seen code "fragments" from various authors, perhaps some
on the BioPerl list, which perform some major subset of a typical
"reciprocal blast".

Now what I am looking for is a relatively generalizable some-to-some
reciprocal blast utility. I want to be able to specify the genes (or gene
family), e.g. some of the ~150 known DNA repair genes. It would be helpful
to also specify how "tolerant" the blast "true reciprocal" criteria are.
There are some genes where there is a very strict 1-to-1 relationship across
many genomes. But for genes which involve relatively standard domains, e.g.
"helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for
example it's more like 5-to-5 and it would be really nice to be able to
specify the strictness or quality level [1] for "matching" genes (and even
which genes are to be excluded because they are known to be false
homologues).

Then to top this off I want to be able to combine known public (e.g.
HomoloGene / UniGene / Ensembl) databases with perhaps local private
databases or database subsets (e.g. emerging or specialized genomes).

The goal here of course is to determine the precise phylogenetic
relationships between all of the DNA repair genes and how there may be
gain / loss / evolution of function that can be related to species
characteristics (size, longevity, etc.).

Is there a generalized reciprocal blast component in BioPerl? Or is it a
"build-it-yourself" situation (that I have to believe has been built
probably a few dozen times by various researchers / organizations /
companies)?

Thanks,
Robert Bradbury

1.
This would be handled in BioPerl with a customizable user function which
could be tailored to handle specific cases -- for example a function which
when handed a set of 100 potential "matches" could go through those 100
matches, identify common domains, and then "re-rate" matches based on
considerations such as the type and number of common domains, domains being
in the same order, etc. I.e. criteria which may be difficult to completely
generalize across entire genomes but are fairly obvious if you are looking
at a graphical replication of a gene set in HomoloGene.

From jason at bioperl.org Wed Aug 26 11:55:04 2009
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 26 Aug 2009 08:55:04 -0700
Subject: [Bioperl-l] Generalized reciprocal blast
In-Reply-To:
References:
Message-ID:

Robert -

BioPerl has traditionally been a toolkit for building these types
of pipelines and not intended to necessarily be a place for larger
systems. That said, BRH is a pretty easy algorithm that could be
applied with the tools in place, the main issue is what kind of lookup
table you want to use for establishing the BRH. Hashes are okay, but I
think BDB or SQLite end up being more scalable and allow for
persistence.

Really, I would use something like OrthoMCL rather than reciprocal
BLAST to identify families anyways. It uses Bioperl under the hood
for parsing - though it suffers from some pretty inefficient
management of the lookup table for the BRH part of the algorithm - it
can be run on your own customized datasets to integrate public and
private data.

You might also find better luck in building good alignments for the
key members of your target gene family of interest and then using a
profile HMM (or even just the new HMMER3 jackhmmer or phmmer which
don't require an MSA) to identify the full set of homologs in all the
databases.
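
As a rough illustration of the hash-based BRH lookup mentioned earlier
in this message, a minimal plain-Perl sketch follows. The assumptions
are loud ones: best_reciprocal_hits is a made-up sub name, the gene IDs
are invented, and the two hashes are assumed to already hold the single
best hit per query in each direction (in practice they would be filled
while parsing the two BLAST reports).

```perl
use strict;
use warnings;

# Hypothetical input: best-scoring hit per query in each direction.
# Returns a list of [a_id, b_id] pairs that are each other's best hit.
sub best_reciprocal_hits {
    my ($a_to_b, $b_to_a) = @_;
    my @pairs;
    for my $a_id (sort keys %$a_to_b) {
        my $b_id = $a_to_b->{$a_id};
        next unless defined $b_to_a->{$b_id};
        # Keep the pair only if the best hit of $b_id points back to $a_id.
        push @pairs, [$a_id, $b_id] if $b_to_a->{$b_id} eq $a_id;
    }
    return @pairs;
}

# Invented example data: geneA2 and geneA3 both hit geneB2, but geneB2's
# best hit is geneA3, so only that pair is reciprocal.
my %a_to_b = (geneA1 => 'geneB1', geneA2 => 'geneB2', geneA3 => 'geneB2');
my %b_to_a = (geneB1 => 'geneA1', geneB2 => 'geneA3');

print join(' <-> ', @$_), "\n" for best_reciprocal_hits(\%a_to_b, \%b_to_a);
```

In-memory hashes like these are fine for a handful of gene families;
for all-vs-all runs the same lookup could be backed by a tied BDB file
or an SQLite table so that it persists between runs.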
If this is the only set of families you care about it is a lot less computational work to go through and pull these out with an HMM or HMMER search and build trees from these results rather than dealing with the computational time of the all-vs-all DB searches that you are proposing. -jason On Aug 26, 2009, at 8:38 AM, Robert Bradbury wrote: > I would like to know whether or not anyone has attempted to create a > "generalized" reciprocal blast component for BioPerl? > > One sees papers all the time where they discuss running reciprocal > blasts to > compare a new species to an old "standard" species or a set of > species or > running an all-to-all set of comparisons to match up all of the > "known" > proteins from species and determine which are outliers (and therefore > "novel"). There are also accumulating merged sets in NCBI > HomoloGene (which > seems to be a some strict subset (perhaps a dozen) "well sequenced" > genomes) > and Ensembl (which seems to be working with a much larger set of 40-50 > genomes some of which may be somewhat incomplete and are certainly > poorly > "explored". > > I have, I believe, seen code "fragments" from various authors, > perhaps some > on the BioPerl list, which perform some major subset of a typical > "reciprocal blast". > > Now what I am looking for is a relatively generalizable some-to-some > reciprocal blast utility. I want to be able to specify the genes > (or gene > family), e.g. some of the ~150 known DNA repair genes. It would be > helpful > to also specify how "tolerant" the blast "true reciprocal" criteria > are. > There are some genes where there is a very strict 1-to-1 > relationship across > many genomes. But for genes which involve relatively standard > domains, e.g. 
> "helicase" domains, the 1-to-1 relationship becomes cloudy -- in > mammals for > example its more like 5-to-5 and it would be really nice to be able to > specify the strictness or quality level [1] for "matching" genes > (and even > which genes are to be excluded because they are known to be false > homologues). > > Then to top this off I want to be able to combine known public e.g. > (HomoloGene / Uniigene / Ensembl) databases with perhaps local private > databases or database subsets (e.g. emerging or specialized genomes). > > The goal here of course to determine the precise phylogenetic > relationships > between all of the DNA repair genes and how there may be gain / loss / > evolution of function that can be related to species characteristics > (size, > longevity, etc.). > > Is there a generalized reciprocal blast component in BioPerl? Or is > it a > "build-it-yourself" situation (that I have to believe has been built > probably a few dozen times by various researchers / organizations / > companies)? > > Thanks, > Robert Bradbury > > 1. This would be handled in BioPerl with a customizable user > function which > could be tailored to handle specific cases -- for example a function > which > when handed a set of 100 potential "matches" could go through those > 100 > matches, identify common domains, and then "re-rate" matches based on > considerations such as the type and number of common domains, > domains being > in the same order, etc. I.e. criteria which may be difficult to > completely > generalize across entire genomes but are fairly obvious if you are > looking > at a graphical replication of a gene set in HomoloGene. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From maj at fortinbras.us Wed Aug 26 11:20:41 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 26 Aug 2009 11:20:41 -0400 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis In-Reply-To: <4A954DCC.4050200@pasteur.fr> References: <4A954DCC.4050200@pasteur.fr> Message-ID: Hi Emmanuel-- This may be fixed in the latest version of Bio::Restriction, which is not available in the standard 1.6 distribution. I suggest you try replacing the Bio/Restriction directory in your distribution with the current bioperl-live modules. You can get these by using Subversion: $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk/Bio/Restriction ./Restriction If you're brave, better might be to obtain the latest trunk and reinstall; $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live $ cd bioperl-live $ perl Build.PL $ ./Build $ ./Build test $ ./Build install Please update the list with your progress- cheers Mark ----- Original Message ----- From: "Emmanuel Quevillon" To: Sent: Wednesday, August 26, 2009 10:59 AM Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis > Hi, > > I am playing with Bio::Restriction::* objects and find it very useful. > Especially I am filtering output for blunt and cohesive enzymes. > However, there's an exception thrown when I use 'cutters' method > from B::R::Analysis : > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bad end parameter (34). 
End must be less than the total length > of sequence (total=7) > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::PrimarySeq::subseq > /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388 > STACK: Bio::Restriction::Analysis::_enzyme_sites > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891 > STACK: Bio::Restriction::Analysis::_cuts > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788 > STACK: Bio::Restriction::Analysis::cut > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366 > STACK: Bio::Restriction::Analysis::cutters > /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681 > STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion > lib/Bio/Restriction/Analysis/blunt.pm:86 > STACK: Bio::Restriction::Analysis::blunt::cut_in_frames > lib/Bio/Restriction/Analysis/blunt.pm:65 > STACK: ./check_phase.pl:213 > ----------------------------------------------------------- > > The problem with this enzyme is that the cut site is over the enzyme > recognition site (from Rebase withrefm.907): > > <1>BceSI > <2> > <3>SSAAGCG(27/27) > <4> > <5>Bacillus cereus > <6>ATCC 10987 > <7> > <8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol. > Lett., vol. 202, pp. 189-193. > Xu, S.-Y., Unpublished observations. > > > For this enzyme, here are the values stored into B::R::Enzyme object > ($e): > > $e->site => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN > $e->cut => 34 > $e->string => SSAAGCG > $e->seq->seq => SSAAGCG > > > So my question is, wouldn't be faire to set B::PrimarySeq::seq with > value of $e->site when such enzyme are seen in the source file. 
> > NOTE from B::R::Analysis::_enzymes_sites (commented): > > # The following should not be an exception, both Type I and Type III > # enzymes cut outside of their recognition sequences > #if ($site < 0 || $site > length($enz->string)) { > # $self->throw("This is (probably) not your fault.\nGot a cut > site of $site and a # sequence of ".$enz->string); > # } > > And this is exactly the problem I'm facing! > In _enzymes_sites the code is trying to subseq our sequence to get > before and after seq as : > > $beforeseq=$enz->seq->subseq(1, $site); > $afterseq=$enz->seq->subseq($site+1, $enz->seq->length); > > and this throws an error as the cutting site is far over (pos 34) > the enzyme know recognition site SSAAGCG (length=7). > > Has anybody a clue on how to fix/patch it? > > Thanks for any reply > > Regards > > Emmanuel > > -- > ------------------------- > Emmanuel Quevillon > Biological Software and Databases Group > Institut Pasteur > +33 1 44 38 95 98 > tuco at_ pasteur dot fr > ------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed Aug 26 12:03:59 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 26 Aug 2009 12:03:59 -0400 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: re:aside -- I can help with this; I promise not to break anything. cheers MAJ ----- Original Message ----- From: "Jason Stajich" To: "Dan Bolser" Cc: "BioPerl List" Sent: Tuesday, August 25, 2009 1:17 PM Subject: Re: [Bioperl-l] $wgEnableMWSuggest on the wiki please? > Can you send sysadmin request mail to the helpdesk - support at open-bio.org > so mauricio or someone can have it in the queue. 
> > [aside] > I've had to stop doing OBF sysadmin work so we are definitely looking > for someone to help with the ALL VOLUNTEER team of now just Mauricio > and Chris Dagdigian who do mediawiki and sysadmin support. > > We've reached a bit of crunch where there are lots of things to tweak > and customize for the various flavors of MW installs that the projects > want but we don't have enough dedicated admins to really support > this. Most of us have gotten into these projects to support our own > bioinformatics programming not sysadmin tasks so there is a bit of gap > here. Some of us (me) were not trained as sysadmin but jumped in and > figured out how to help and do it - and learned valuable life > skills... =) > > We're discussing plans to upgrade the machines in the future which > would improve performance and reliability we hope and also use this > opportunity to streamline the MW installs to be a more easily > maintained wikifarm. > > [/aside] > > -jason > On Aug 25, 2009, at 8:16 AM, Dan Bolser wrote: > >> Hi, >> >> Can some one set $wgEnableMWSuggest on the BioPerl wiki please? >> >> http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest >> >> >> I generally find this a great feature to have on any MW install. Can >> we also create a page (usually "BioPerl:Configuration" (or >> '$wgSiteName:Configuration')) to report details of the specific MW >> configuration settings used on the wiki? This is also a good place for >> people to request configuration changes to tweak the way the wiki >> works. >> >> >> Cheers, >> Dan. 
> > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org From David.Messina at sbc.su.se Wed Aug 26 12:25:21 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 26 Aug 2009 18:25:21 +0200 Subject: [Bioperl-l] Generalized reciprocal blast In-Reply-To: References: Message-ID: <628aabb70908260925q25039506nab6e1c661f704e2a@mail.gmail.com> Hi Robert, Just to add another comment on this: the problem of identifying orthologs is quite a bit trickier than it looks, in part due to the many-to-many relationships you noted. There is a whole body of literature on this topic -- here's a recent review that covers OrthoMCL, which Jason mentioned, and others: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000262 (disclaimer: I work in a lab that offers one of the many attempts to solve this problem) So I would say that although it is possible to make a customizable function as you describe, there are several existing approaches (read: downloadable code you can run on your data) that would probably give better results. Dave From hsa_rim at yahoo.co.in Wed Aug 26 15:56:38 2009 From: hsa_rim at yahoo.co.in (shafeeq rim) Date: Thu, 27 Aug 2009 01:26:38 +0530 (IST) Subject: [Bioperl-l] Latest Cytoband files Message-ID: <484629.15190.qm@web94612.mail.in2.yahoo.com> Hi, Can anybody tell me how I can get the latest cytoband files with stain information for Homo sapiens, Mus musculus and others? I am using version 36.3 of RefSeq for human and version 36.1 of RefSeq for Mus musculus. Thanks
From cjfields at illinois.edu Wed Aug 26 16:36:31 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 Aug 2009 15:36:31 -0500 Subject: [Bioperl-l] Next-Gen and the next point release - updates Message-ID: All, I just pushed one very key bit for nextgen sequence analysis to svn, mainly parsing of all three FASTQ variants. These can be called by using:

  # grabs the FASTQ parser, specifies the Illumina variant
  my $in = Bio::SeqIO->new(-format    => 'fastq-illumina',
                           -file      => 'mydata.fq');

  # same, explicitly specifies the Illumina variant
  my $in = Bio::SeqIO->new(-format    => 'fastq',
                           -variant   => 'illumina',
                           -file      => 'mydata.fq');

  # simple 'fastq' format defaults to 'sanger' variant
  my $out = Bio::SeqIO->new(-format    => 'fastq',
                            -file      => '>mydata.fq');

FASTQ works for both input and output. As mentioned before, the next_dataset() method also exists for getting simple hashrefs; see the module documentation for more. This was one of the few remaining blockers for the 1.6.1 point release. I'll run a clean checkout of main trunk to test, then work on merging everything over from trunk starting Friday and push out 1.6.0_1 (first alpha) at the beginning of next week to get some CPAN Tester information. If everything looks fine the final point release will follow soon after. Cheers! chris
From rmb32 at cornell.edu Wed Aug 26 16:56:20 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 26 Aug 2009 13:56:20 -0700 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: References: Message-ID: <4A95A174.3070706@cornell.edu> Hurray! You rock Chris! R
From lsbrath at gmail.com Wed Aug 26 17:08:06 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Wed, 26 Aug 2009 17:08:06 -0400 Subject: [Bioperl-l] rendering graphics from genbank files. Message-ID: <69367b8f0908261408g6750c1d2we3409a016fe186b7@mail.gmail.com> Hi, I am running into problems rendering the 5'UTR and 3'UTR features in the graphic.
I get an error message saying that these are string literals. Better yet, how do I add the 5'UTR and 3'UTR regions to the CDS feature when the only features in my genbank file are mRNA, CDS, and gene? What I want is to display the gene structure. I am using the last template provided in bioperl howto graphics. Mgavi From biopython at maubp.freeserve.co.uk Wed Aug 26 17:16:08 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 26 Aug 2009 22:16:08 +0100 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: References: Message-ID: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> On Wed, Aug 26, 2009 at 9:36 PM, Chris Fields wrote: > All, > > I just pushed one very key bit for nextgen sequence analysis to svn, mainly > parsing of all three FASTQ variants. These can be called by using: > > # grabs the FASTQ parser, specifies the Illumina variant > my $in = Bio::SeqIO->new(-format => 'fastq-illumina', > -file => 'mydata.fq'); > > # same, explicitly specifies the Illumina variant > my $in = Bio::SeqIO->new(-format => 'fastq', > -variant => 'illumina', > -file => 'mydata.fq'); > > # simple 'fastq' format defaults to 'sanger' variant > my $out = Bio::SeqIO->new(-format => 'fastq', > -file => '>mydata.fq'); > > FASTQ works for both input and output. As mentioned before, the > next_dataset() method also exists for getting simple hashrefs, see the > module documentation for more. > > This was one of the few remaining blockers for the 1.6.1 point release. > ... If everything looks fine the final point release will follow soon after. It is looking much better than yesterday - nice work :) However, there are a few rough edges still. =========================== Evil wrapping =========================== Chris - Did you get the zip file of FASTQ examples I sent off list?
One of these was the evil_wrapping.fastq file already in Biopython CVS/git (under a new name). This is intended as a real torture test, with line-wrapped quality strings where plenty of the lines start with "+" or "@" characters. Bioperl doesn't like this file at all - but I have not dug into why.

===========================
Sanger To Illumina 1.3+
===========================
When mapping a Sanger FASTQ file with very high scores to Illumina, these don't get the maximum value imposed (ASCII 126, tilde). e.g.

$ ./biopython_sanger2illumina < sanger_93.fastq
/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:676: UserWarning: Data loss - max PHRED quality 62 in Illumina FASTQ
  warnings.warn("Data loss - max PHRED quality 62 in Illumina FASTQ")
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@

But, with bioperl-live SVN,

$ ./bioperl_sanger2illumina < sanger_93.fastq
--------------------- WARNING ---------------------
MSG: Quality values not found for illumina:63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93
---------------------------------------------------
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@

You are using "@" (ASCII 64), which in this context means a PHRED score of zero.

===========================
Sanger To Solexa
===========================
Likewise when mapping a Sanger FASTQ file with very high scores to Solexa FASTQ, these don't get the maximum value imposed (ASCII 126, tilde).
For example, $ ./biopython_sanger2solexa < sanger_93.fastq /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764: UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ") @Test PHRED qualities from 93 to 0 inclusive ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;; But, $ ./bioperl_sanger2solexa < sanger_93.fastq --------------------- WARNING --------------------- MSG: Quality values not found for solexa:0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93 --------------------------------------------------- @Test PHRED qualities from 93 to 0 inclusive ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN + <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><< i.e. You've mapped the high value scores to "<", ASCII 60, thus Solexa -4 (an odd thing to happen - getting the lowest score wouldn't surprise me so much). Furthermore, notice that PHRED scores 0 and 1 have both been mapped to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning Solexa -5. =========================== Still, things are looking up :) Peter From maj at fortinbras.us Wed Aug 26 17:03:13 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 26 Aug 2009 17:03:13 -0400 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: <4A95A174.3070706@cornell.edu> References: <4A95A174.3070706@cornell.edu> Message-ID: <1E03634D20424F659F417AE7F5D26039@NewLife> +1 ----- Original Message ----- From: "Robert Buels" To: "Chris Fields" Cc: "BioPerl List" Sent: Wednesday, August 26, 2009 4:56 PM Subject: Re: [Bioperl-l] Next-Gen and the next point release - updates > Hurray! You rock Chris! 
> > R > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From sac at bioperl.org Wed Aug 26 18:33:16 2009 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 26 Aug 2009 15:33:16 -0700 Subject: [Bioperl-l] MGED meeting in Phoenix, AZ, Oct 5-8 Message-ID: <8f200b4c0908261533y74c42b1aif662ef13a8fe6711@mail.gmail.com> The MGED Society's annual meeting is of potential interest to anyone working with functional genomics data sets, or interested in best practices for analyzing and annotating their functional genomics experiments. The meeting topic is "Next-Gen Sequencing and Translational Genomics" and as usual, they've got a great line-up of speakers (included below). It's in Phoenix, AZ Oct 5-8, early registration ends on 5 Sep. (Note that MGED has expanded its reach beyond just microarrays.) For more information on registration and abstract submission, go to * http://www.mgedmeeting.org* For hotel accommodations, go to * http://www.starwoodmeeting.com/StarGroupsWeb/res?id=0903232443&key=42DE2* Keynotes *Hank Greely* Deane F. and Kate Edelman Johnson Professor of Law Stanford Law School *Elaine Mardis* Associate Professor, Genetics, Molecular Microbiology Washington University in St. 
Louis School of Medicine *Daniel Von Hoff* Director, Clinical Translational Research Division Translational Genomics Research Institute (TGen) Plenary Speakers: *Steven Brenner* Associate Professor, Plant and Microbial Biology University of California, Berkeley *Lynda Chin* Associate Professor, Dermatology Dana Farber Cancer Institute, Harvard Medical School *David Craig* Associate Director, Neurogenomics Division Translational Genomics Research Institute (TGen) *Michael Eisen* Scientist, Lawrence Berkeley National Lab and Associate Professor Department of Molecular and Cellular Biology, University of California, Berkeley *Gad Getz* Head of Cancer Genome Analysis at the Broad Institute of MIT and Harvard *Mathieu Lupien* Assistant Professor, Genetics Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center *Joanna Mountain* Senior Director, Research 23andMe, Inc. *Dana Pe'er* Assistant Professor, Biology and Computer Science Columbia University Biological Sciences *John Quackenbush* Professor of Computational Biology & Bioinformatics, Biostatistics Dana Farber Cancer Institute, Harvard School of Public Health *Cole Trapnell* Ph. D. Student, Computer Science University of Maryland, College Park From cjfields at illinois.edu Wed Aug 26 22:52:13 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 Aug 2009 21:52:13 -0500 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> References: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> Message-ID: On Aug 26, 2009, at 4:16 PM, Peter wrote: > It is looking much better than yesterday - nice work :) > However, there are a few rough edges still. Not unexpected, actually. > =========================== > Evil wrapping > =========================== > Chris - Did you get the zip file of FASTQ examples I sent off list? 
> One of > these was the evil_wrapping.fastq file already in Biopython CVS/git > (under > a new name). This is intended as a real torture test, with line > wrapped > quality strings where plenty of the lines start with "+" or "@" > characters. > Bioperl doesn't like this file at all - but I have not dug into why. Now fixed; I've saved this as very_tricky.fastq, but it's the same file. > =========================== > Sanger To Illumina 1.3+ > =========================== > When mapping a Sanger FASTQ file with very high scores to Illumina, > these don't get the maximum value imposes (ASCII 126, tidle). e.g. ... Yes, I know where that one is going wrong. Fixed now for bounds for the above. Partly related to the below. > =========================== > Sanger To Solexa > =========================== > Likewise when mapping a Sanger FASTQ file with very high scores to > Solexa FASTQ, these don't get the maximum value imposes (ASCII 126, > tidle). For example, > > $ ./biopython_sanger2solexa < sanger_93.fastq > /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764: > UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ > warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ") > @Test PHRED qualities from 93 to 0 inclusive > ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN > + > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\ > [ZYXWVUTSRQPONMLKJHGFECB@>;; > > But, > > $ ./bioperl_sanger2solexa < sanger_93.fastq > > --------------------- WARNING --------------------- > MSG: Quality values not found for > solexa: > 0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93 > --------------------------------------------------- > @Test PHRED qualities from 93 to 0 inclusive > ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN > + > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\ > 
[ZYXWVUTSRQPONMLKJHGFEDB@><< > > i.e. You've mapped the high value scores to "<", ASCII 60, thus > Solexa -4 > (an odd thing to happen - getting the lowest score wouldn't surprise > me so > much). This one is fixed, it was the same bounding issue as above. > Furthermore, notice that PHRED scores 0 and 1 have both been mapped > to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning > Solexa -5. The two conversions to solexa are still failing. I'm not sure but I think it's something fairly simple, but I can't work on it until Friday (got too many other things on my plate ATM). If I get stumped I'll post a message. > =========================== > > Still, things are looking up :) > > Peter Yes they are, much more so that previously. I'll add these to the tests. chris From tuco at pasteur.fr Thu Aug 27 04:28:41 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Thu, 27 Aug 2009 10:28:41 +0200 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis In-Reply-To: References: <4A954DCC.4050200@pasteur.fr> Message-ID: <4A9643B9.7000709@pasteur.fr> Mark A. Jensen wrote: > Hi Emmanuel-- > This may be fixed in the latest version of Bio::Restriction, which is not > available in the standard 1.6 distribution. I suggest you try replacing the > Bio/Restriction directory in your distribution with the current > bioperl-live > modules. You can get these by using Subversion: > > $ svn co > svn://code.open-bio.org/bioperl/bioperl-live/trunk/Bio/Restriction > ./Restriction Hi Mark, Thanks for pointing me to this svn repo. I've just updated the Bio::Restriction::* part just to test it. I don't get any error anymore. I just need to continue working on this with my ideas. I'll let you know if I encounter any other problem. 
Cheers Emmanuel > > If you're brave, better might be to obtain the latest trunk and reinstall; > > $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live > $ cd bioperl-live > $ perl Build.PL > $ ./Build > $ ./Build test > $ ./Build install > > Please update the list with your progress- > cheers > Mark >> -- >> ------------------------- >> Emmanuel Quevillon >> Biological Software and Databases Group >> Institut Pasteur >> +33 1 44 38 95 98 >> tuco at_ pasteur dot fr >> ------------------------- >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From dan.bolser at gmail.com Thu Aug 27 06:34:00 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 27 Aug 2009 11:34:00 +0100 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: <2c8757af0908270334kcb3dfc4w17553e65f7e0e4b5@mail.gmail.com> 2009/8/25 Jason Stajich : > Can you send sysadmin request mail to the helpdesk - support at open-bio.org?so > mauricio or someone can have it in the queue. OK. > [aside] > I've had to stop doing OBF sysadmin work so we are definitely looking for > someone to help with the ALL VOLUNTEER team of now just Mauricio and Chris > Dagdigian who do mediawiki and sysadmin support. > > We've reached a bit of crunch where there are lots of things to tweak and > customize for the various flavors of MW installs that the projects want but > we don't have enough dedicated admins to really support this. 
?Most of us I know how you feel! > have gotten into these projects to support our own bioinformatics > programming not sysadmin tasks so there is a bit of gap here. Some of us > (me) were not trained as sysadmin but jumped in and figured out how to help > and do it - and learned valuable life skills... =) > > We're discussing plans to upgrade the machines in the future which would > improve performance and reliability we hope and also use this opportunity to > streamline the MW installs to be a more easily maintained wikifarm. Sounds like a good idea. There are also extensions that put more of the MW config on the website itself (restricted to admins of course). Dan. From hsa_rim at yahoo.co.in Thu Aug 27 07:14:03 2009 From: hsa_rim at yahoo.co.in (shafeeq rim) Date: Thu, 27 Aug 2009 16:44:03 +0530 (IST) Subject: [Bioperl-l] Mapping of genome with cytoband Message-ID: <29549.68962.qm@web94610.mail.in2.yahoo.com> Hi, I need gene , mrna , cds , sts and exon files as per the mapping with cytobands.Lets say for 37.1 version NCBI data. I am checking with the .gbs and .gbk files but the genes and other features are not coming across the whole chromosome.i.e, for chromosome 1 suppose. When I use the gene coordinates from .gbk / .gbs files the locations on chromosome 1 genes show only half way on the ideogram graph. Thanks See the Web's breaking stories, chosen by people like you. Check out Yahoo! Buzz. 
From biopython at maubp.freeserve.co.uk Thu Aug 27 07:55:55 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 27 Aug 2009 12:55:55 +0100 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: References: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> Message-ID: <320fb6e00908270455y2a80907chfae8007df60e72e2@mail.gmail.com> On Thu, Aug 27, 2009 at 3:52 AM, Chris Fields wrote: > > On Aug 26, 2009, at 4:16 PM, Peter wrote: > >> It is looking much better than yesterday - nice work :) >> However, there are a few rough edges still. > > Not unexpected, actually. > >> =========================== >> Evil wrapping >> =========================== >> Chris - Did you get the zip file of FASTQ examples I sent off list? One of >> these was the evil_wrapping.fastq file already in Biopython CVS/git (under >> a new name). This is intended as a real torture test, with line wrapped >> quality strings where plenty of the lines start with "+" or "@" >> characters. >> Bioperl doesn't like this file at all - but I have not dug into why. > > Now fixed; I've saved this as very_tricky.fastq, but it's the same file. Looks good. >> =========================== >> Sanger To Illumina 1.3+ >> =========================== >> When mapping a Sanger FASTQ file with very high scores to Illumina, >> these don't get the maximum value imposed (ASCII 126, tilde). e.g. > > ... > > Yes, I know where that one is going wrong. Fixed now for bounds for the > above. Partly related to the below. Looks good. >> =========================== >> Sanger To Solexa >> =========================== >> Likewise when mapping a Sanger FASTQ file with very high scores to >> Solexa FASTQ, these don't get the maximum value imposed (ASCII 126, >> tilde). For example, >> ... >> i.e. You've mapped the high value scores to "<", ASCII 60, thus Solexa -4 >> (an odd thing to happen - getting the lowest score wouldn't surprise me so >> much).
>
> This one is fixed, it was the same bounding issue as above.

Yes, the high score truncation looks good.

>> Furthermore, notice that PHRED scores 0 and 1 have both been mapped
>> to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning Solexa -5.
>
> The two conversions to solexa are still failing. I'm not sure but I think
> it's something fairly simple, but I can't work on it until Friday (got too
> many other things on my plate ATM). If I get stumped I'll post a message.

Actually it's not just PHRED 0 and 1 that look wrong, all of the low
scores are messed up. I could repeat this using the sanger_93.fastq file,
but to avoid email line wrapping here I'm using a smaller example file
with PHRED scores in the range 40 to 0 only:

$ cat sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!

Biopython:

$ python ./biopython_sanger2solexa.py < sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;;

BioPerl SVN (with Chris' latest fixes):

$ ./bioperl_sanger2solexa.pl < sanger_faked.fastq

--------------------- WARNING ---------------------
MSG: Data loss for solexa: following values exceed max 62
0
---------------------------------------------------
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFDCA?=~

The last ten characters are wrong (i.e. PHRED score 0 to 9, which
is precisely the range where the PHRED/Solexa mapping is non trivial).
Also note that the data loss warning is misleading (0 is less than 62).
Plus you get exactly the same problems with Illumina to Solexa.

This should narrow it down - the bug is in mapping PHRED scores (from
either Sanger or Illumina 1.3+ files) to the Solexa encoding.
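
For reference, the mapping under discussion can be sketched in a few lines
of plain Perl. This is only an illustration, not the BioPerl or Biopython
implementation: the formula Q_solexa = 10*log10(10^(Q_phred/10) - 1), the
floor of -5, and the helper names phred_to_solexa() and
sanger_to_solexa_string() are assumptions made for this sketch (the thread
itself only states that PHRED 0 should map to Solexa -5). Under those
assumptions it reproduces the Biopython quality line shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: convert one PHRED score to a rounded Solexa score.
# Assumes the conventional formula and a lower bound of -5.
sub phred_to_solexa {
    my ($phred) = @_;
    die "PHRED score must be non-negative\n" if $phred < 0;
    return -5 if $phred == 0;    # log of zero is undefined; use the floor
    my $solexa = 10 * log(10 ** ($phred / 10) - 1) / log(10);
    $solexa = -5 if $solexa < -5;    # clamp before rounding
    return floor($solexa + 0.5);     # round to nearest integer
}

# Hypothetical helper: re-encode a Sanger quality string (ASCII offset 33)
# as a Solexa quality string (ASCII offset 64).
sub sanger_to_solexa_string {
    my ($qual) = @_;
    return join '',
        map { chr(phred_to_solexa(ord($_) - 33) + 64) } split //, $qual;
}

# The quality line from sanger_faked.fastq (PHRED 40 down to 0).
my $sanger = q{IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!};
print sanger_to_solexa_string($sanger), "\n";
```

Only PHRED 0 to 9 differ from the identity mapping, which is the same range
in which the BioPerl output above diverges.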
Peter

From sanjaysingh765 at gmail.com Thu Aug 27 09:59:13 2009
From: sanjaysingh765 at gmail.com (sanjay singh)
Date: Thu, 27 Aug 2009 19:29:13 +0530
Subject: [Bioperl-l] query about libwww-perl collection
Message-ID:

hello,

I want to use the libwww-perl collection to query BLINK with multiple
queries. It works very well for a single query, but how can I use it for
multiple queries? Please help me out.

regards
sanjay

--
Happy moments , praise God.
Difficult moments, seek God.
Quiet moments, worship God.
Painful moments, trust God.
Every moment, thank God

Sanjay Kumar Singh
Bose Institute
93\1,A.P.C.Road
Kolkata-700 009
West Bengal
India

From bosborne11 at verizon.net Thu Aug 27 11:10:30 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 27 Aug 2009 11:10:30 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To:
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
Message-ID: <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net>

Mark,

Sorry, I'm a bit late here. I took a look at the Documentation Project
page, it is well-reasoned. However, I didn't see any list of action items
there. You do talk at the end about soliciting comments, and you've
already done this, and a user survey. A survey is not necessary, the
issues are well understood already. More to the point, you understand
them and just as in coding, "the one doing the work wins the argument".
Here's what my own list of action items would look like:

- Merge FAQ and Scrapbook
-- FAQ is unused or underused and contains code snippets
-- Too much information or too many sections is as bad as too little

- Write Align/AlignIO HOWTO
-- This is the "missing HOWTO"

- Use Deobfuscator links to reveal method documentation
-- Most notably in SeqIO HOWTO
-- Does Deobfuscator have a bug or two that need to be fixed? I use it,
   it seems to work but I've heard a rumor...

- Condense and streamline installation documents
-- Remove outdated
-- Still too many pages and too much text
-- There are incorrectly labelled links taking you to the wrong place
-- Remove any text or page that duplicates information in an INSTALL
   file, link to this file instead

- Seriously prune the Main Page
-- Wikis encourage a proliferation of pages and links, the Main Page is a
   great example of far too much information
-- Remove many redundant or little used links
-- Try to prettify, in any way possible - we have created, sadly, the
   world's ugliest Main Page!

- Revise the SeqIO HOWTO
-- The first HOWTO, and it looks like it
-- Link this HOWTO to all the Format pages (Category:Formats)

- Feature-Annotation HOWTO
-- Write script that annotates every single SeqIO format, showing where
   each bit of text ends up
-- This script runs automatically when you open the HOWTO or click its
   link, always up-to-date
-- Probably trickier than I think!

- The "Random Page" exercise
-- Spend some time clicking this link, you will certainly find things to
   merge and delete. You will also find nice documentation that you
   didn't know existed and is probably never read!

The objective is to create documentation that has a single starting point
for at least 50% of the questions asked on the mailing list. We've achieved
this for certain topics, like SearchIO. In the old days you'd get a query
a week about doing something with Blast and we'd repeat something written
the previous week, week after week. Then we wrote some HOWTOs so the
answer to just about any question on Features or SearchIO was answered by
"See the HOWTO". Again, one starting page for every single reasonably
general question, like "See the Installation page". Not "Starting on the
Main Page you could click on Getting Bioperl or Getting Started or Quick
Start or Installing Bioperl or Installation or Downloads or ...." (you get
the idea).

Brian O.

On Aug 15, 2009, at 3:53 PM, Mark A.
Jensen wrote:

>
> ----- Original Message ----- From: "Hilmar Lapp"
> ...
>> As for the FASTA example, I can understand - I've heard repeatedly
>> from people that one of the things that they are missing is
>> documentation for every SeqIO format we support (such as GenBank,
>> UniProt, FASTA, etc) about where to find a particular piece of the
>> format in the object model.
> ....
>
> This is the right thread for list lurkers to contribute their betes noires
> such as this one. I encourage ALL to post these issues and help create
> our list of action items.
> MAJ
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From rmb32 at cornell.edu Thu Aug 27 13:38:45 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 27 Aug 2009 10:38:45 -0700
Subject: [Bioperl-l] truncating a sequence and remapping annotations
Message-ID: <4A96C4A5.9090406@cornell.edu>

Hi all,

Recently a user came into #bioperl looking to truncate an annotated
sequence (leaving the region between e.g. 150 to 250 nt), and have the
annotations from the original sequence be remapped onto the new truncated
sequence.

Poking through code, I came across an undocumented function trunc() that
from the comments looks like it was written by Jason as part of a master
plan to implement this very functionality.

Just wondering, what's the status of that?

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From rmb32 at cornell.edu Thu Aug 27 13:40:41 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 27 Aug 2009 10:40:41 -0700
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To: <4A96C4A5.9090406@cornell.edu>
References: <4A96C4A5.9090406@cornell.edu>
Message-ID: <4A96C519.3020001@cornell.edu>

Looks like bug 1572 is related to this:
http://bugzilla.open-bio.org/show_bug.cgi?id=1572

Rob

Robert Buels wrote:
> Hi all,
>
> Recently a user came into #bioperl looking to truncate an annotated
> sequence (leaving the region between e.g. 150 to 250 nt), and have the
> annotations from the original sequence be remapped onto the new
> truncated sequence.
>
> Poking through code, I came across an undocumented function trunc() that
> from the comments looks like it was written by Jason as part of a master
> plan to implement this very functionality.
>
> Just wondering, what's the status of that?
>
> Rob
>

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From cjfields at illinois.edu Thu Aug 27 14:20:42 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 27 Aug 2009 13:20:42 -0500
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To: <4A96C519.3020001@cornell.edu>
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu>
Message-ID: <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu>

It's not implemented completely. As Jason mentioned in the bug report, it
was meant to be part of an overall system to truncate sequences with
remapped features, but the implementation in place is substandard. It's
open for implementation if anyone wants to take it up.
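
To make the requested behaviour concrete, here is a rough sketch of the
coordinate arithmetic such a remap involves. It deliberately uses plain
hashes rather than BioPerl feature objects, and the helper name
remap_features() and the sample features are hypothetical, chosen only
for illustration; it is not the trunc() code discussed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep the features overlapping [$start, $end]
# (1-based, inclusive), clip them to that window, and shift them so the
# first base of the window becomes position 1.
sub remap_features {
    my ($start, $end, @features) = @_;
    my @kept;
    for my $f (@features) {
        # skip features entirely outside the window
        next if $f->{end} < $start || $f->{start} > $end;
        my $s = $f->{start} < $start ? $start : $f->{start};
        my $e = $f->{end}   > $end   ? $end   : $f->{end};
        push @kept, { %$f, start => $s - $start + 1, end => $e - $start + 1 };
    }
    return @kept;
}

# Made-up features on the original sequence.
my @features = (
    { name => 'geneA', start => 100, end => 180 },
    { name => 'exon1', start => 160, end => 200 },
    { name => 'geneB', start => 300, end => 400 },
);

# Truncate to 150..250, as in the example from #bioperl.
for my $f (remap_features(150, 250, @features)) {
    print join("\t", $f->{name}, $f->{start}, $f->{end}), "\n";
}
```

A real implementation would also have to handle split locations, strand,
and features that are only partially retained, which this sketch ignores.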

I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal
with this in a more elegant and lightweight way, and is probably the
direction I would take. YMMV.

chris

On Aug 27, 2009, at 12:40 PM, Robert Buels wrote:

> Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572
>
> Rob
>
> Robert Buels wrote:
>> Hi all,
>> Recently a user came into #bioperl looking to truncate an annotated
>> sequence (leaving the region between e.g. 150 to 250 nt), and have
>> the annotations from the original sequence be remapped onto the new
>> truncated sequence.
>> Poking through code, I came across an undocumented function trunc()
>> that from the comments looks like it was written by Jason as part
>> of a master plan to implement this very functionality.
>> Just wondering, what's the status of that?
>> Rob
>
>
> --
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY 14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Thu Aug 27 14:41:28 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 27 Aug 2009 11:41:28 -0700
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To: <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu>
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu>
Message-ID: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>

Yeah one thought that we batted around at a hackathon many moons ago had
been to use Bio::DB::SeqFeature in a lightweight way under the hood to
represent sequences in layers more rather than the arbitrary data model
that is setup by focusing on handling GenBank records.
A lot of the architecture development (that is like 10-15 years old now!)
was initially just focused on round-tripping the sequence files. We more
recently felt like a new model was more appropriate. With the fast SQLite
implementation that Lincoln has put in for DB::SeqFeature we could in
theory map every sequence into a SQLite DB and then have the power of the
interface.

Some more bells and whistles might be needed but the basic API is
respected AFAIK and it prevents needing to store whole sequences in
memory. The SeqIO->DB::SeqFeature loading would need some finessing so
that as parsed the sequence object could be updated efficiently.

Actually this might also help reduce the number of objects needed to be
created by basically efficiently serializing sequences into the DB on
parsing (and with some simple caching this could make for pretty fast
system). Since disk is basically not a limitation now could be an
interesting experiment? Maybe it is too out there, but if not it could be
something major enough that it has to go in a bioperl-2/bioperl-ng. It
sort of assumes the data model of Bio::DB::SeqFeature is adequate for all
the messiness of sequence data formats and one problem for some people
has been the seq file format => GFF in order to load it into a SeqFeature
DB for Gbrowse... So I don't know what are the boundary cases here.
Certainly for FASTA it should be straightforward.

-jason

On Aug 27, 2009, at 11:20 AM, Chris Fields wrote:

> It's not implemented completely. As Jason mentioned in the bug
> report, it was meant to be part of an overall system to truncate
> sequences with remapped features, but the implementation in place is
> substandard. It's open for implementation if anyone wants to take
> it up.
>
> I should point out, though, in my opinion Bio::DB::GFF/SeqFeature
> deal with this in a more elegant and lightweight way, and is
> probably the direction I would take. YMMV.
>
> chris
>
> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote:
>
>> Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572
>>
>> Rob
>>
>> Robert Buels wrote:
>>> Hi all,
>>> Recently a user came into #bioperl looking to truncate an
>>> annotated sequence (leaving the region between e.g. 150 to 250
>>> nt), and have the annotations from the original sequence be
>>> remapped onto the new truncated sequence.
>>> Poking through code, I came across an undocumented function
>>> trunc() that from the comments looks like it was written by Jason
>>> as part of a master plan to implement this very functionality.
>>> Just wondering, what's the status of that?
>>> Rob
>>
>>
>> --
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY 14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From lsbrath at gmail.com Thu Aug 27 15:04:36 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Thu, 27 Aug 2009 15:04:36 -0400
Subject: [Bioperl-l] rendering the 5' & 3' UTR in a graphic
Message-ID: <69367b8f0908271204p7f153be1p6673faac931b646d@mail.gmail.com>

Hello,

I am able to render all of the features except the 5' & 3' UTR.
This is how the features part of the Genbank file looks:

FEATURES             Location/Qualifiers
     source          1..185000
                     /note="locus_tag=Nbl1"
                     /organism="Mus musculus"
     gene            142646..153328
                     /note="locus_tag=Nbl1"
                     /gene="ENSMUSG00000041120"
                     /note="neuroblastoma, suppression of tumorigenicity 1
                     [Source:MGI;Acc:MGI:104591]"
     5'UTR           142646..150000
                     /note="Nbl1"
     mRNA            join(142646..142794,149973..150167,150269..150380,
                     152019..153328)
                     /gene="ENSMUSG00000041120"
                     /note="transcript_id=ENSMUST00000042844"
     CDS             join(150001..150167,150269..150380,152019..152276)
                     /db_xref="CCDS:CCDS18839.1"
                     /db_xref="MGI:Nbl1"
                     /db_xref="Vega_mouse_transcript:OTTMUST00000022949"
                     /protein_id="ENSMUSP00000045608"
                     /gene="ENSMUSG00000041120"
                     /note="transcript_id=ENSMUST00000042844"
     misc_feature    150001..152276
                     /note="deletion"
     3'UTR           152277..153328
                     /gene="Nbl1"
ORIGIN      -
        1 GACCAGAGCC ACTCGCTAGG AGTCACACCG AGCCTGGGGG TCCGAAGGGA ACAGCATCAA

Here is the code:

# file: embl2picture.pl
# This is code example 6 in the Graphics-HOWTO
# Author: Lincoln Stein

use strict;
#use lib "$ENV{HOME}/projects/bioperl-live";
use Bio::Graphics;
use Bio::SeqIO;

use constant USAGE =><<END;
Usage: $0 <file>
   Render a GenBank/EMBL entry into drawable form.
   Return as a GIF or PNG image on standard output.

   File must be in embl, genbank, or another SeqIO-
   recognized format.  Only the first entry will be
   rendered.

Example to try:
   embl2picture.pl factor7.embl | display -

END

my $file = shift or die USAGE;
my $io = Bio::SeqIO->new(-file=>$file) or die USAGE;
my $seq = $io->next_seq or die USAGE;
my $wholeseq = Bio::SeqFeature::Generic->new(
    -start        => 1,
    -end          => $seq->length,
    -display_name => $seq->display_name
);

# script reads the features from the sequence object by calling all_SeqFeatures()
my @features = $seq->all_SeqFeatures;

# sorts each feature by its primary tag into a hash
# of array references named %sorted_features
my %sorted_features;
my %want = map {$_ =>1} qw/source CDS gene utr5prime utr3prime mRNA misc_feature/;
for my $f (@features) {
    #get cds, primer_bind, and genes features only
    my $tag = $f->primary_tag;
    # create a hash of $f keys and $tag values
    #push @{$sorted_features{$tag}},$f if ($tag =~ /CDS|gene|mRNA|source|misc_feature|5'UTR|3'UTR/);
    push @{$sorted_features{$tag}},$f if ($want{$tag});
}

# we create the Bio::Graphics::Panel object.
# As in previous examples, we specify the width of the image,
# as well as some extra white space to pad out the left and right borders.
my $panel = Bio::Graphics::Panel->new(
    -length    => $seq->length,
    -key_style => 'between',
    -width     => 400,
    -pad_left  => 10,
    -pad_right => 10,
);

# We now add two tracks, one for the scale
# and the other for the sequence as a whole.
$panel->add_track($wholeseq,
    -glyph   => 'arrow',
    -bump    => 0,
    -double  => 1,
    -tick    => 2,
    -bgcolor => 'blue',
    -label   => 1,
);

=cut
$panel->add_track($wholeseq,
    -glyph   => 'generic',
    -bgcolor => 'blue',
    -label   => 1,
);
=cut

# Locate primary tag of "CDS" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the CDS feature type from the %sorted_features associative array.
if ($sorted_features{CDS}) {
    $panel->add_track($sorted_features{CDS},
        -glyph       => 'transcript2',
        -bgcolor     => 'orange',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => 'CDS',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{'CDS'};
}

# Locate primary tag of "mRNA" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the mRNA feature type from the %sorted_features associative array.
if ($sorted_features{mRNA}) {
    $panel->add_track($sorted_features{mRNA},
        -glyph       => 'transcript2',
        -bgcolor     => 'red',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => 'mRNA',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{'mRNA'};
}

#=cut
# Locate primary tag of "5'UTR" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the 5'UTR feature type from the %sorted_features associative array.
if ($sorted_features{utr5prime}) {
    $panel->add_track($sorted_features{utr5prime},
        -glyph       => 'transcript2',
        -bgcolor     => '',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => 'utr5prime',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{utr5prime};
}

=cut
# Locate primary tag of "3'UTR" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the 3'UTR feature type from the %sorted_features associative array.
if ($sorted_features{3\'UTR}) {
    $panel->add_track($sorted_features{'3\'UTR'},
        -glyph       => 'transcript2',
        -bgcolor     => '',
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => '3\'UTR',
        -bump        => +1,
        -height      => 12,
        -label       => \&gene_label,
        -description => \&gene_description,
    );
    delete $sorted_features{'3\'UTR'};
}
=cut

# general case
# Create a track for each feature type. In order to distinguish the tracks by color,
# we initialize an array of 9 color names and simply cycle through them
my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
my $idx = 0;
for my $tag (sort keys %sorted_features) {
    my $features = $sorted_features{$tag};
    $panel->add_track($features,
        -glyph       => 'generic',
        -bgcolor     => $colors[$idx++ % @colors],
        -fgcolor     => 'black',
        -font2color  => 'red',
        -key         => "${tag}s",
        -bump        => +1,
        -height      => 8,
        # -description option to point to a subroutine
        # that will generate more informative description strings.
        -description => \&generic_description,
    );
}

binmode(STDOUT);
print $panel->png;
exit 0;

sub gene_label {
    my $feature = shift;
    my @notes;
    foreach (qw(product gene)) {
        @notes = eval {$feature->get_tag_values($_)};
        last;
    }
    $notes[0];
}

sub gene_description {
    my $feature = shift;
    my @notes;
    foreach (qw(note)) {
        # Notice that we place calls to get_tag_values() inside eval{} blocks
        # in order to avoid having an exception raised if the feature does not
        # have a tag with the desired value.
        @notes = eval{$feature->get_tag_values($_)};
        last;
    }
    return unless @notes;
    substr($notes[0],30) = '...' if length $notes[0] > 30;
    $notes[0];
}

sub generic_description {
    my $feature = shift;
    my $description;
    foreach ($feature->get_all_tags) {
        my @values = $feature->get_tag_values($_);
        $description .= $_ eq 'note' ? "@values" : "$_=@values; ";
    }
    $description =~ s/; $//; # get rid of last
    $description;
}

sub fp_utr{
    my $five_prime_utr = '5\'UTR';
    return $five_prime_utr;
}

This is how the image currently looks:

Any ideas why I am unable to render the 5' & 3' UTR features?
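
One thing that may be worth checking, sketched below in plain Perl without
BioPerl: the %want filter in the script lists utr5prime and utr3prime,
while the feature keys in the FEATURES table are 5'UTR and 3'UTR. The
sketch assumes that primary_tag returns the feature key exactly as written
in the GenBank file (an assumption, not something verified against the
poster's data); the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Feature keys as they appear in the FEATURES table quoted above.
my @primary_tags =
    ('source', 'gene', "5'UTR", 'mRNA', 'CDS', 'misc_feature', "3'UTR");

# The filter as posted in the script.
my %want_posted = map { $_ => 1 }
    qw/source CDS gene utr5prime utr3prime mRNA misc_feature/;

# An alternative filter keyed on the literal feature keys.
my %want_literal = map { $_ => 1 }
    ('source', 'CDS', 'gene', "5'UTR", "3'UTR", 'mRNA', 'misc_feature');

my @dropped         = grep { !$want_posted{$_} }  @primary_tags;
my @dropped_literal = grep { !$want_literal{$_} } @primary_tags;

print "dropped by posted filter:  @dropped\n";
print "dropped by literal filter: ", scalar(@dropped_literal), " tags\n";
```

If that assumption holds, the track code would need the same literal keys,
for example $sorted_features{"5'UTR"}. Note also that the 3'UTR block in
the script sits between two =cut lines and so is not executed as posted.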

From jorvis at gmail.com Thu Aug 27 15:23:05 2009
From: jorvis at gmail.com (Joshua Orvis)
Date: Thu, 27 Aug 2009 15:23:05 -0400
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
Message-ID:

I should weigh in here since I am the above-mentioned 'user' who posed the
question in #bioperl.

To clarify, to train one particular gene finder I need to take a full
genbank file with annotation for a whole genome and create separate gbk
records, one for each gene. Each record will then contain the gene, exon
coordinates for the CDS and sequence for the gene.

I can iterate through the features of the full record and do the math
myself for each spliced coordinate, making/writing individual records as I
go, but thought I would see if BioPerl had any mechanism to extract a
region of an annotated record and treat the starting base of that
extraction as position 1, recoordinating all the other features that were
present. Then I could just iterate through the features of the whole
entry, extracting regions for each gene as I see them.

Hopefully this makes sense.

Joshua

On Thu, Aug 27, 2009 at 2:41 PM, Jason Stajich wrote:

>
> Yeah one thought that we batted around at a hackathon many moons ago had
> been to use Bio::DB::SeqFeature in a lightweight way under the hood to
> represent sequences in layers more rather than the arbitrary data model that
> is setup by focusing on handling GenBank records. A lot of the architecture
> development (that is like 10-15 years old now!) was initially just focused
> on round-tripping the sequence files. We more recently felt like a new model
> was more appropriate.
> With the fast SQLite implementation that Lincoln has
> put in for DB::SeqFeature we could in theory map every sequence into a
> SQLite DB and then have the power of the interface.
>
> Some more bells and whistles might be needed but the basic API is respected
> AFAIK and it prevents needing to store whole sequences in memory. The
> SeqIO->DB::SeqFeature loading would need some finessing so that as parsed
> the sequence object could be updated efficiently.
>
> Actually this might also help reduce the number of objects needed to be
> created by basically efficiently serializing sequences into the DB on
> parsing (and with some simple caching this could make for pretty fast
> system). Since disk is basically not a limitation now could be an
> interesting experiment? Maybe it is too out there, but if not it could be
> something major enough that it has to go in a bioperl-2/bioperl-ng. It
> sort of assumes the data model of Bio::DB::SeqFeature is adequate for all
> the messiness of sequence data formats and one problem for some people has
> been the seq file format => GFF in order to load it into a SeqFeature DB for
> Gbrowse... So I don't know what are the boundary cases here. Certainly for
> FASTA it should be straightforward.
>
> -jason
>
> On Aug 27, 2009, at 11:20 AM, Chris Fields wrote:
>
>> It's not implemented completely. As Jason mentioned in the bug report, it
>> was meant to be part of an overall system to truncate sequences with
>> remapped features, but the implementation in place is substandard. It's
>> open for implementation if anyone wants to take it up.
>>
>> I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal
>> with this in a more elegant and lightweight way, and is probably the
>> direction I would take. YMMV.
>>
>> chris
>>
>> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote:
>>
>>> Looks like bug 1572 is related to this:
>>> http://bugzilla.open-bio.org/show_bug.cgi?id=1572
>>>
>>> Rob
>>>
>>> Robert Buels wrote:
>>>
>>>> Hi all,
>>>> Recently a user came into #bioperl looking to truncate an annotated
>>>> sequence (leaving the region between e.g. 150 to 250 nt), and have the
>>>> annotations from the original sequence be remapped onto the new truncated
>>>> sequence.
>>>> Poking through code, I came across an undocumented function trunc() that
>>>> from the comments looks like it was written by Jason as part of a master
>>>> plan to implement this very functionality.
>>>> Just wondering, what's the status of that?
>>>> Rob
>>>>
>>>
>>>
>>> --
>>> Robert Buels
>>> Bioinformatics Analyst, Sol Genomics Network
>>> Boyce Thompson Institute for Plant Research
>>> Tower Rd
>>> Ithaca, NY 14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jason at bioperl.org Thu Aug 27 16:00:24 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 27 Aug 2009 13:00:24 -0700
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To:
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
Message-ID:

So when I did this for the
retraining of AUGUSTUS I loaded all my gene models in Bio::DB::GFF as GFF3
and then just extracted each locus I needed +/- some surrounding sequence
context and wrote it out as genbank file. There might have been one or two
problems collapsing the features back into Genbank's concept of a CDS as a
single-feature rather than individual, but I just make a split-location
and added the sub-pieces to it.

It was only a few lines of code to do it right - the flatten/unflatten
being one of the most annoying parts maybe we could work out to
streamline.

-jason

On Aug 27, 2009, at 12:23 PM, Joshua Orvis wrote:

> I should weigh in here since I am the above-mentioned 'user' who posed the
> question in #bioperl.
>
> To clarify, to train one particular gene finder I need to take a full
> genbank file with annotation for a whole genome and create separate gbk
> records, one for each gene. Each record will then contain the gene, exon
> coordinates for the CDS and sequence for the gene.
>
> I can iterate through the features of the full record and do the math myself
> for each spliced coordinate, making/writing individual records as I go, but
> thought I would see if BioPerl had any mechanism to extract a region of an
> annotated record and treat the starting base of that extraction as position
> 1, recoordinating all the other features that were present. Then I could
> just iterate through the features of the whole entry, extracting regions for
> each gene as I see them.
>
> Hopefully this makes sense.
>
> Joshua
>
> On Thu, Aug 27, 2009 at 2:41 PM, Jason Stajich wrote:
>
>>
>> Yeah one thought that we batted around at a hackathon many moons ago had
>> been to use Bio::DB::SeqFeature in a lightweight way under the hood to
>> represent sequences in layers more rather than the arbitrary data model that
>> is setup by focusing on handling GenBank records. A lot of the architecture
>> development (that is like 10-15 years old now!)
>> was initially just focused
>> on round-tripping the sequence files. We more recently felt like a new model
>> was more appropriate. With the fast SQLite implementation that Lincoln has
>> put in for DB::SeqFeature we could in theory map every sequence into a
>> SQLite DB and then have the power of the interface.
>>
>> Some more bells and whistles might be needed but the basic API is respected
>> AFAIK and it prevents needing to store whole sequences in memory. The
>> SeqIO->DB::SeqFeature loading would need some finessing so that as parsed
>> the sequence object could be updated efficiently.
>>
>> Actually this might also help reduce the number of objects needed to be
>> created by basically efficiently serializing sequences into the DB on
>> parsing (and with some simple caching this could make for pretty fast
>> system). Since disk is basically not a limitation now could be an
>> interesting experiment? Maybe it is too out there, but if not it could be
>> something major enough that it has to go in a bioperl-2/bioperl-ng. It
>> sort of assumes the data model of Bio::DB::SeqFeature is adequate for all
>> the messiness of sequence data formats and one problem for some people has
>> been the seq file format => GFF in order to load it into a SeqFeature DB for
>> Gbrowse... So I don't know what are the boundary cases here. Certainly for
>> FASTA it should be straightforward.
>>
>> -jason
>>
>> On Aug 27, 2009, at 11:20 AM, Chris Fields wrote:
>>
>>> It's not implemented completely. As Jason mentioned in the bug report, it
>>> was meant to be part of an overall system to truncate sequences with
>>> remapped features, but the implementation in place is substandard. It's
>>> open for implementation if anyone wants to take it up.
>>>
>>> I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal
>>> with this in a more elegant and lightweight way, and is probably the
>>> direction I would take. YMMV.
>>>
>>> chris
>>>
>>> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote:
>>>
>>>> Looks like bug 1572 is related to this:
>>>> http://bugzilla.open-bio.org/show_bug.cgi?id=1572
>>>>
>>>> Rob
>>>>
>>>> Robert Buels wrote:
>>>>
>>>>> Hi all,
>>>>> Recently a user came into #bioperl looking to truncate an annotated
>>>>> sequence (leaving the region between e.g. 150 to 250 nt), and have the
>>>>> annotations from the original sequence be remapped onto the new truncated
>>>>> sequence.
>>>>> Poking through code, I came across an undocumented function trunc() that
>>>>> from the comments looks like it was written by Jason as part of a master
>>>>> plan to implement this very functionality.
>>>>> Just wondering, what's the status of that?
>>>>> Rob
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Buels
>>>> Bioinformatics Analyst, Sol Genomics Network
>>>> Boyce Thompson Institute for Plant Research
>>>> Tower Rd
>>>> Ithaca, NY 14853
>>>> Tel: 503-889-8539
>>>> rmb32 at cornell.edu
>>>> http://www.sgn.cornell.edu
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From cjfields at illinois.edu Thu Aug 27 16:19:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 27 Aug 2009 15:19:56 -0500
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
Message-ID:

On Aug 27, 2009, at 1:41 PM, Jason Stajich wrote:

> Yeah one thought that we batted around at a hackathon many moons ago
> had been to use Bio::DB::SeqFeature in a lightweight way under the
> hood to represent sequences in layers more rather than the arbitrary
> data model that is setup by focusing on handling GenBank records. A
> lot of the architecture development (that is like 10-15 years old
> now!) was initially just focused on round-tripping the sequence
> files. We more recently felt like a new model was more appropriate.
> With the fast SQLite implementation that Lincoln has put in for
> DB::SeqFeature we could in theory map every sequence into a SQLite
> DB and then have the power of the interface.
>
> Some more bells and whistles might be needed but the basic API is
> respected AFAIK and it prevents needing to store whole sequences in
> memory. The SeqIO->DB::SeqFeature loading would need some finessing
> so that as parsed the sequence object could be updated efficiently.

Exactly my thought. Probably worth pushing the FeatureHolderI interface into something like a SeqFeature::Collection. What about annotation? Maybe add that to the 'source' feature?

Also makes me think Seq needs to be RangeI (or potentially locatable to another sequence). Bio::DB::SF::Segment is.

I'm thinking the old way of doing it (parsing a file) is still possible, but underneath would be a Bio::Index or similar, and the returned Bio::Seq would have a backend Bio::Index/Bio::SeqFeature::Collection database (the latter maybe being lazily implemented).

> Actually this might also help reduce the number of objects needed to
> be created by basically efficiently serializing sequences into the
> DB on parsing (and with some simple caching this could make for
> pretty fast system).
> Since disk is basically not a limitation now
> could be an interesting experiment?

Yes.

> Maybe it is too out there, but if not it could be something major
> enough that it has to go in a bioperl-2/bioperl-ng. It sort of
> assumes the data model of Bio::DB::SeqFeature is adequate for all
> the messiness of sequence data formats and one problem for some
> people has been the seq file format => GFF in order to load it into
> a SeqFeature DB for Gbrowse... So I don't know what are the boundary
> cases here. Certainly for FASTA it should be straightforward.
>
> -jason

Well, one could possibly test something like this on a branch, or with their own Bio::Seq, or in Biome ;>

Just sayin'....

chris

From maj at fortinbras.us Thu Aug 27 20:58:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 27 Aug 2009 20:58:34 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net>
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net>
Message-ID: <4C2E185C74CF449495BC8FDC26419702@NewLife>

Thanks Brian; these are really valuable insights and suggestions. Of course, the "todo list" is not "mine", but the community's (otherwise, I would have used Post-its), and I have added your action items to it.

My thinking about a survey is twofold. Intermittent users may, likely will, have different issues than the usual suspects here on the list, or they will put those issues in a different way--likely with more expression of affect, which I personally think is key. It seems to me that documentation is the public face of this project, and hearing visceral reactions from "the public" will help us (or me) prioritize. The other fold is, this kind of data is better acquired a) actively, rather than passively ("Please respond to this thread") and b) anonymously.
Obviously, it can't be active in the sense of spamming, but we could reduce the energy barrier by providing something clickable with a few textboxes to the list.

cheers
MAJ

----- Original Message -----
From: Brian Osborne
To: Mark A. Jensen
Cc: BioPerl List ; Chris Fields
Sent: Thursday, August 27, 2009 11:10 AM
Subject: Re: [Bioperl-l] on BP documentation

Mark,

Sorry, I'm a bit late here. I took a look at the Documentation Project page, it is well-reasoned. However, I didn't see any list of action items there. You do talk at the end about soliciting comments, and you've already done this, and a user survey. A survey is not necessary, the issues are well understood already. More to the point, you understand them and just as in coding, "the one doing the work wins the argument". Here's what my own list of action items would look like:

- Merge FAQ and Scrapbook
-- FAQ is unused or underused and contains code snippets
-- Too much information or too many sections is as bad as too little

- Write Align/AlignIO HOWTO
-- This is the "missing HOWTO"

- Use Deobfuscator links to reveal method documentation
-- Most notably in SeqIO HOWTO
-- Does Deobfuscator have a bug or two that need to be fixed? I use it, it seems to work but I've heard a rumor...

- Condense and streamline installation documents
-- Remove outdated
-- Still too many pages and too much text
-- There are incorrectly labelled links taking you to the wrong place
-- Remove any text or page that duplicates information in an INSTALL file, link to this file instead

- Seriously prune the Main Page
-- Wikis encourage a proliferation of pages and links, the Main Page is a great example of far too much information
-- Remove many redundant or little used links
-- Try to prettify, in any way possible - we have created, sadly, the world's ugliest Main Page!

- Revise the SeqIO HOWTO
-- The first HOWTO, and it looks like it
-- Link this HOWTO to all the Format pages (Category:Formats)

- Feature-Annotation HOWTO
-- Write script that annotates every single SeqIO format, showing where each bit of text ends up
-- This script runs automatically when you open the HOWTO or click its link, always up-to-date
-- Probably trickier than I think!

- The "Random Page" exercise
-- Spend some time clicking this link, you will certainly find things to merge and delete. You will also find nice documentation that you didn't know existed and is probably never read!

The objective is to create documentation that has a single starting point for at least 50% of the questions asked on the mailing list. We've achieved this for certain topics, like SearchIO. In the old days you'd get a query a week about doing something with Blast and we'd repeat something written the previous week, week after week. Then we wrote some HOWTOs so the answer to just about any question on Features or SearchIO was "See the HOWTO". Again, one starting page for every single reasonably general question, like "See the Installation page". Not "Starting on the Main Page you could click on Getting Bioperl or Getting Started or Quick Start or Installing Bioperl or Installation or Downloads or ...." (you get the idea).

Brian O.

On Aug 15, 2009, at 3:53 PM, Mark A. Jensen wrote:

----- Original Message -----
From: "Hilmar Lapp"
...
As for the FASTA example, I can understand - I've heard repeatedly from people that one of the things that they are missing is documentation for every SeqIO format we support (such as GenBank, UniProt, FASTA, etc) about where to find a particular piece of the format in the object model.
....

This is the right thread for list lurkers to contribute their betes noires such as this one. I encourage ALL to post these issues and help create our list of action items.
MAJ
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bosborne11 at verizon.net Thu Aug 27 22:00:01 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 27 Aug 2009 22:00:01 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <4C2E185C74CF449495BC8FDC26419702@NewLife>
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net> <4C2E185C74CF449495BC8FDC26419702@NewLife>
Message-ID: <047387CF-C3AD-4E2E-8FB8-091AB23D5FEE@verizon.net>

Mark,

As you wish. As I said, the one who does the work calls the shots, this is not a democracy.

The fundamental problem is, and I speak with some experience here, that detailed examination of documentation is of so little interest that participation in the survey will be limited ("the usual suspects"), and the results will be skewed. You're not going to get reactions from "the public", the thousands of Bioperl users. But, if you feel comfortable with the notion that a survey will justify your actions, do it. But honestly, I know that you already know what to do.

Brian O.

On Aug 27, 2009, at 8:58 PM, Mark A. Jensen wrote:

> My thinking about a survey is twofold. Intermittent users may,
> likely will, have different issues than the usual suspects here on
> the list, or they will put those issues in a different way--likely
> with more expression of affect, which I personally think is key. It
> seems to me that documentation is the public face of this project,
> and hearing visceral reactions from "the public" will help us (or
> me) prioritize. The other fold is, this kind of data is better
> acquired a) actively, rather than passively ("Please respond to this
> thread") and b) anonymously.
> Obviously, it can't be active in the
> sense of spamming, but we could reduce the energy barrier by
> providing something clickable with a few textboxes to the list.

From David.Messina at sbc.su.se Fri Aug 28 04:40:47 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 28 Aug 2009 10:40:47 +0200
Subject: [Bioperl-l] on BP documentation
Message-ID: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se>

> - Use Deobfuscator links to reveal method documentation
> -- Most notably in SeqIO HOWTO

Do you mean to click on a method name in a HOWTO and open up the Deobfuscator view of that method's documentation? I like that.

> -- Does Deobfuscator have a bug or two that need to be fixed? I use
> it, it seems to work but I've heard a rumor...

It's true -- sometimes the Deobfuscator claims that a method isn't documented when it is.

Mark, I can commit to fixing this. It's long overdue, so I'm happy to use your doc push as an impetus.

Dave

From maj at fortinbras.us Fri Aug 28 07:31:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 28 Aug 2009 07:31:05 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se>
References: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se>
Message-ID:

Dave-- thanks for stepping up- MAJ

----- Original Message -----
From: "Dave Messina"
To: "Brian Osborne"
Cc: "Mark A. Jensen" ; "BioPerl List" ; "Chris Fields"
Sent: Friday, August 28, 2009 4:40 AM
Subject: Re: [Bioperl-l] on BP documentation

>
>> - Use Deobfuscator links to reveal method documentation
>> -- Most notably in SeqIO HOWTO
>
> Do you mean to click on a method name in a HOWTO and open up the Deobfuscator
> view of that method's documentation? I like that.
>
>
>> -- Does Deobfuscator have a bug or two that need to be fixed? I use it, it
>> seems to work but I've heard a rumor...
>
> It's true -- sometimes the Deobfuscator claims that a method isn't documented
> when it is.
>
> Mark, I can commit to fixing this.
> It's long overdue, so I'm happy to use
> your doc push as an impetus.
>
>
> Dave
>
>
>

From fgarret at ub.edu Fri Aug 28 12:37:54 2009
From: fgarret at ub.edu (Filipe Garrett)
Date: Fri, 28 Aug 2009 18:37:54 +0200
Subject: [Bioperl-l] splice alignment
Message-ID: <4A9807E2.4080608@ub.edu>

Hi all,

I need to analyse the 1st, 2nd and 3rd positions of an alignment separately. I've been through the BioPerl pages but couldn't find any direct way to do it. The closest I found was "slice" (AlignI), but it just extracts a contiguous subsequence.

Is there any subroutine that does the job? Or maybe a more generic one, so we can select the columns to be extracted; eg:

@aln_pos = qw/1 4 7 10 13 14 17 20/;
$aln_1 = $aln->get_pos(@aln_pos);

thanks in adv,
FG

--
Filipe G. Vieira
Departament de Genetica
Universitat de Barcelona
Av. Diagonal, 645
08028 Barcelona
SPAIN
Phone: +34 934 035 306
Fax: +34 934 034 420
fgarret at ub.edu
http://www.ub.edu/molevol/

From mmorley at mail.med.upenn.edu Fri Aug 28 17:18:28 2009
From: mmorley at mail.med.upenn.edu (Michael Morley)
Date: Fri, 28 Aug 2009 17:18:28 -0400
Subject: [Bioperl-l] How to plot coverage using Bio::DB::Sam and Bio::Graphics?
Message-ID: <4A9849A4.7060702@mail.med.upenn.edu>

I have a few questions, some perhaps too simple, to which I know I should have been able to find the answers, but they have eluded me.

Problem: What I want to do is visualize coverage (Illumina RNA-seq) across a gene for 40 or so samples. I thought about gbrowse, but what I was hoping to do was use Bio::Graphics and create a few PNGs of the genes I'm interested in, nothing too fancy.

My current attempt: So I've used Bio::DB::Sam (thank you LDS!! great package) as follows. It works perfectly.

my $features = $sam->features(-type=>'coverage',-seq_id=>$chrom,-start=>$genomest,-end=>$genomest);

Then I tried this:

$panel->add_track($features,
                  -glyph => 'xyplot',
                  -graph_type=>'histogram',
                  );

After poking at the return of -type=>'coverage', I don't think this is possible directly, but any ideas how I can do it? The coverage is too deep in the region to plot every sequence in the alignment; I was able to do it, it just was not useful.

One last question.. I also would like to plot the gene model as well. If I simply grab the genbank file for refseq NM###, the features only have exon,cds,etc and coordinates based off the mRNA seq. So how does one get the genomic info and then create the track for a gene/transcript as you would see in gbrowse?

I'd greatly appreciate any help!

-Michael

From roy.chaudhuri at gmail.com Sat Aug 29 09:22:53 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Sat, 29 Aug 2009 23:22:53 +1000
Subject: [Bioperl-l] truncating a sequence and remapping annotations
In-Reply-To:
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org>
Message-ID: <1372eece0908290622mc21f297w503225242d82ada9@mail.gmail.com>

Hi Joshua,

A couple of years ago I did implement (in a fairly hacky way) a trunc_with_features method that does exactly this. It was incorporated into Bio::SeqUtils and is still there as far as I know. Maybe it would be suitable for your purposes?

Roy.

2009/8/28 Joshua Orvis :
> I should weigh in here since I am the above-mentioned 'user' who posed the
> question in #bioperl.
>
> To clarify, to train one particular gene finder I need to take a full
> genbank file with annotation for a whole genome and create separate gbk
> records, one for each gene.  Each record will then contain the gene, exon
> coordinates for the CDS and sequence for the gene.
> > I can iterate through the features of the full record and do the math myself > for each spliced coordinate, making/writing individual records as I go, but > thought I would see if BioPerl had any mechanism to extract a region of an > annotated record and treat the starting base of that extraction as position > 1, recoordinating all the other features that were present. ?Then I could > just iterate through the features of the whole entry, extracting regions for > each gene as I see them. > > Hopefully this makes sense. > > Joshua > > On Thu, Aug 27, 2009 at 2:41 PM, Jason Stajich wrote: > >> >> Yeah one thought that we batted around at a hackathon many moons ago had >> been to use Bio::DB::SeqFeature in a lightweight way under the hood to >> represent sequences in layers more rather than the arbitrary data model that >> is setup by focusing on handling GenBank records. ?A lot of the architecture >> development (that is like 10-15 years old now!) was initially just focused >> on round-tripping the sequence files. We more recently felt like a new model >> was more appropriate. ?With the fast SQLite implementation that Lincoln has >> put in for DB::SeqFeature we could in theory map every sequence into a >> SQLite DB and then have the power of the interface. >> >> Some more bells and whistles might be needed but the basic API is respected >> AFAIK and it prevents needing to store whole sequences in memory. ?The >> SeqIO->DB::SeqFeature loading would need some finessing so that as parsed >> the sequence object could be updated efficiently. >> >> Actually this might also help reduce the number of objects needed to be >> created by basically efficiently serializing sequences into the DB on >> parsing (and with some simple caching this could make for pretty fast >> system). ?Since disk is basically not a limitation now could be an >> interesting experiment? 
?Maybe it is too out there, but if not it could be >> something major enough that it has to go in a bioperl-2/bioperl-ng. ? It >> sort of assumes the data model of Bio::DB::SeqFeature is adequate for all >> the messiness of sequence data formats and one problem for some people has >> been the seq file format => GFF in order to load it into a SeqFeature DB for >> Gbrowse... So I don't know what are the boundary cases here. ?Certainly for >> FASTA it should be straightforward. >> >> -jason >> >> On Aug 27, 2009, at 11:20 AM, Chris Fields wrote: >> >> ?It's not implemented completely. ?As Jason mentioned in the bug report, it >>> was meant to be part of an overall system to truncate sequences with >>> remapped features, but the implementation in place is substandard. ?It's >>> open for implementation if anyone wants to take it up. >>> >>> I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal >>> with this in a more elegant and lightweight way, and is probably the >>> direction I would take. ?YMMV. >>> >>> chris >>> >>> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: >>> >>> ?Looks like bug 1572 is related to this: >>>> http://bugzilla.open-bio.org/show_bug.cgi?id=1572 >>>> >>>> Rob >>>> >>>> Robert Buels wrote: >>>> >>>>> Hi all, >>>>> Recently a user came into #bioperl looking to truncate an annotated >>>>> sequence (leaving the region between e.g. 150 to 250 nt), and have the >>>>> annotations from the original sequence be remapped onto the new truncated >>>>> sequence. >>>>> Poking through code, I came across an undocumented function trunc() that >>>>> from the comments looks like it was written by Jason as part of a master >>>>> plan to implement this very functionality. >>>>> Just wondering, what's the status of that? 
>>>>> Rob
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Buels
>>>> Bioinformatics Analyst, Sol Genomics Network
>>>> Boyce Thompson Institute for Plant Research
>>>> Tower Rd
>>>> Ithaca, NY  14853
>>>> Tel: 503-889-8539
>>>> rmb32 at cornell.edu
>>>> http://www.sgn.cornell.edu
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From adlai at refenestration.com Sun Aug 30 12:16:41 2009
From: adlai at refenestration.com (adlai burman)
Date: Sun, 30 Aug 2009 18:16:41 +0200
Subject: [Bioperl-l] Install on host server
Message-ID:

Hey there,

I have an embarrassingly silly question. I have BioPerl set up and working on my computer. Does anyone here know if there is a standard way to ask one's hosting server to install BioPerl so you can use it within a web page?

Barring that, is there a standard way to set it up for your own domain on a hosting server that knows nothing about BioPerl?

Thanks,
Adlai

From ymc at yahoo.com Mon Aug 31 02:10:10 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 30 Aug 2009 23:10:10 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
Message-ID: <472878.20951.qm@web30402.mail.mud.yahoo.com>

Hi Chris

I added a check for LocatableSeq in dpAlign.pm.
It will now create a Bio::Seq object internally to copy the sequence in LocatableSeq but taking out all the gaps. This should make it behave properly. I committed the updated Bio/Tools/dpAlign.pm to SVN.

In dpAlign.pm, I also added a note saying what will happen if you supply LocatableSeq to the functions in this module.

With regard to that warning, I think the person who reported the bug misused the instantiator of LocatableSeq. He/she can't use the length of the sequence with gaps as the "end". The "end" should be the length without gaps.

Let me know if you have any questions or concerns.

Have a great day!
Yee Man

--- On Wed, 8/19/09, Yee Man Chan wrote:

> From: Yee Man Chan
> Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
> To: "Chris Fields"
> Cc: "Robert Buels" , "BioPerl List"
> Date: Wednesday, August 19, 2009, 8:01 PM
> I noticed that the $qalseq is a
> LocatableSeq with gaps. I don't think my program was written
> to support LocatableSeq with gaps. If I removed the gaps,
> then I would have the scores agree with each other which
> should be the desired outcome.
>
> --------------------- WARNING ---------------------
> MSG: In sequence ABC|9986984 residue count gives end value
> 104.
> Overriding value [101] with value 104 for
> Bio::LocatableSeq::end().
> TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTTCGGGTCCGGCCCGAA
> ---------------------------------------------------
> Getting score for ABC|9944760 -> ABC|9986984
> = 291
> Getting score for ABC|9986984 -> ABC|9944760
> = 291
>
> Do you think I should check for this LocatableSeq type and
> give an error or should I remove the gaps if this is a
> LocatableSeq?
>
> Yee Man
>
>
> --- On Wed, 8/19/09, Chris Fields
> wrote:
>
> > From: Chris Fields
> > Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for
> CPAN, was Re: Problems with Bioperl-ext package on
> WinVista?
> > To: "Yee Man Chan" > > Cc: "Robert Buels" , > "BioPerl List" > > Date: Wednesday, August 19, 2009, 7:49 AM > > I'll have a look.? It's probably > > something that hasn't been updated to deal with > > LocatableSeq's pathological end point checking. > > > > chris > > > > On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote: > > > > > > > > I tried that sample script that reportedly caused > the > > dpAlign "bug" but I can't reproduced it. All I get is > a > > warning from LocatableSeq. > > > ------------------------------------------- > > > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl > > "-Iblib/lib" "-Iblib/arch" > > "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl > > > > > > --------------------- WARNING > --------------------- > > > MSG: In sequence ABC|9944760 residue count gives > end > > value 101. > > > Overriding value [104] with value 101 for > > Bio::LocatableSeq::end(). > > > > > > TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA > > > > --------------------------------------------------- > > > Getting score for ABC|9944760 -> ABC|9986984 > > > = 300 > > > Getting score for ABC|9986984 -> ABC|9944760 > > > = 303 > > > ------------------------------------------ > > > > > > Does the test script crash in your machine? > > > > > > Yee Man > > > > > > --- On Tue, 8/18/09, Chris Fields > > wrote: > > > > > >> From: Chris Fields > > >> Subject: Re: Packaging Bio::Ext::HMM for > CPAN, was > > Re: [Bioperl-l] Problems with Bioperl-ext package on > > WinVista? > > >> To: "Robert Buels" > > >> Cc: "Yee Man Chan" , > > "BioPerl List" > > >> Date: Tuesday, August 18, 2009, 10:28 PM > > >> On Aug 18, 2009, at 11:37 PM, Robert > > >> Buels wrote: > > >> > > >>> Yee Man Chan wrote: > > >>>> Is it going to be an arrangement > similar > > to > > >> bioconductor? If so, I suppose then it makes > > sense. 
But you > > >> might want to develop scripts to > automatically > > download and > > >> install new modules to make it user > friendly. > > >>> Yes, we are probably going to make a > > Task::BioPerl or > > >> something similar. > > >>> > > >>>> What do you mean by Bio-Ext is going > away? > > I > > >> notice quite many people using dpAlign. So > if > > Bio-Ext is > > >> going away, then at least dpAlign should > become > > another spin > > >> off. > > >>> By going away, I meant that everything > in > > there is > > >> going to be spinned off.? Except modules > that > > are no > > >> longer maintainable, if there are any in > there. > > >>> > > >>> Rob > > >> > > >> dpAlign could become another spinoff, yes, if > it's > > used > > >> (and works fine).? The problematic code > dealt > > with pSW, > > >> alignment statistics, and staden io_lib > support > > (the latter > > >> which is fairly bit rotted now): > > >> > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329 > > >> > > >> dpAlign has it's own bug: > > >> > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384 > > >> > > >> chris > > >> > > > > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > From tuco at pasteur.fr Mon Aug 31 10:13:41 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Mon, 31 Aug 2009 16:13:41 +0200 Subject: [Bioperl-l] Can't add track to Panel Bio::Graphics Message-ID: <4A9BDA95.2020109@pasteur.fr> Hi, I'm trying to create png image using Bio::Graphics. I followed the Howto available at bioperl.org. I'm stacked when trying to add new track to my panel. 
So far, I can create the panel and add 2 tracks; after that, probably through some mistake of mine, I can't add more tracks to my panel.

Here is the code.

my $panel = Bio::Graphics::Panel->new(
    -length     => $self->seq()->length(),
    -width      => 800,
    -pad_top    => 5,
    -pad_bottom => 5,
    -pad_left   => 5,
    -pad_right  => 5,
    #-key_style => 'between',
);

my $bsg = Bio::SeqFeature::Generic->new(
    -start        => 1,
    -seq          => $self->seq()->seq(),
    -end          => $self->seq()->length(),
    -display_name => $self->seq()->id(). " (".$self->seq->length()." na)",
);
$bsg->attach_seq($self->seq());

#Display the reference sequence
############
#### Those 2 tracks are well displayed on the final image
###########
$panel->add_track($bsg, -glyph => 'dna', -label => 1);
$panel->add_track($bsg, -glyph => 'arrow', -tick => 2, -fgcolor => 'black');

#Build, if present, the single cut
if(keys %$spositions){
    #Create the special track for the single cut
    my $strack = $panel->add_track(
        -glyph     => 'crossbox',
        -label     => 1,
        -fgcolor   => 'red',
        -key       => 'Single cut',
        -connector => 'dashed',
    );
    foreach my $enz (sort { $a cmp $b } keys %{$spositions->{$strand}}){
        my $bsfg = Bio::SeqFeature::Generic->new(
            -display_name => $enz,
            -start => $spositions->{$strand}->{$enz}->{$enz}->start(),
            -end   => $spositions->{$strand}->{$enz}->{$enz}->start());
        my $bsfg2 = Bio::SeqFeature::Generic->new(
            -display_name => $enz,
            -start => $spositions->{$strand}->{$enz}->{$enz}->end(),
            -end   => $spositions->{$strand}->{$enz}->{$enz}->end());
        $strack->add_feature($bsfg);
        $strack->add_feature($bsfg2);
    }
}

#Build, if present, the double cut
if(keys %$dpositions){
    my $dtrack = $panel->add_track(
        -glyph     => 'crossbox',
        -label     => 1,
        -key       => 'Double cut',
        -connector => 'dashed',
    );
    foreach my $couple (sort { $a cmp $b } keys %{$dpositions->{$strand}}){
        foreach my $cc_enz (sort { $a cmp $b } keys %{$dpositions->{$strand}->{$couple}}){
            my $bsfg = Bio::SeqFeature::Generic->new(
                -display_name => $couple,
                -start => $dpositions->{$strand}->{$couple}->{$cc_enz}->start(),
                -end =>
                    $dpositions->{$strand}->{$couple}->{$cc_enz}->start());
            my $bsfg2 = Bio::SeqFeature::Generic->new(
                -display_name => $cc_enz,
                -start => $dpositions->{$strand}->{$couple}->{$cc_enz}->end(),
                -end   => $dpositions->{$strand}->{$couple}->{$cc_enz}->end());
            $dtrack->add_feature($bsfg);
            $dtrack->add_feature($bsfg2);
        }
    }
}

print $panel->png();

Can somebody tell me what I'm missing or doing wrong?

Thanks for any help

Regards

Emmanuel

--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------

From marcelo011982 at gmail.com Mon Aug 31 14:12:58 2009
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Mon, 31 Aug 2009 15:12:58 -0300
Subject: [Bioperl-l] Genbank code from Blast results
In-Reply-To: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>
References: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>
Message-ID: <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com>

done:

#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => 'Rpp2Blast.txt');
...
while( my $result = $in->next_result ) {
  while( my $hit = $result->next_hit ) {
    while( my $hsp = $hit->next_hsp ) {
      #EXTRACT THE GENBANK CODE NUMBER FROM DESCRIPTION
      #----------------------------------------------
      my $accGB = $hit->description();
      $accGB =~ m/(gb=.*?\s)/;
      #----------------------------------------------

      print MYFILE
      ...

      $1,"\t" , #GenBank accession number
      ...
      $hsp->hit->end, "\t","\n";
      ...

    }
  }
}

On Tue, Aug 18, 2009 at 3:34 PM, Marcelo Iwata wrote:

> hi all..
> I was writing a script that takes some information from the results of blastn
> files.
> Everything was ok, but I have some difficulty picking out the Genbank code number
> (the 'gb' below).
> I tried
>
> $obj->each_accession_number
> $hit->name
>
> And some variation of this.
> > > > ------------------------------ > >gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h > segment 1 gmrtDrNS01 > Glycine max cDNA 3', mRNA sequence /clone_end=3' > /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853 > Length = 853 > > Score = 1336 bits (674), Expect = 0.0 > Identities = 793/832 (95%), Gaps = 8/832 (0%) > Strand = Plus / Minus > > > Query: 294858 aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt > 294917 > |||||||||||| |||||| ||||||||||||||||| |||||||||||||||||||| > Sbjct: 853 aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc > 794 > ---------------------------------------- > > > But, i still don't get it. > > thank you > with regards > Miwata > From jason at bioperl.org Mon Aug 31 15:49:08 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 31 Aug 2009 12:49:08 -0700 Subject: [Bioperl-l] Genbank code from Blast results In-Reply-To: <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com> References: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com> <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com> Message-ID: <4DBC8ED9-6D98-414A-A361-3FAB3EEE955C@bioperl.org> if you run blastall with -I T (show GI's in defline) you will also be able to get the genbank identifier out with $hit->ncbi_gi through some automagic parsing of the ID line -jason On Aug 31, 2009, at 11:12 AM, Marcelo Iwata wrote: > done: > > #!/usr/bin/perl -w > use strict; > use Bio::SearchIO; > > my $in = new Bio::SearchIO(-format => 'blast', > -file => 'Rpp2Blast.txt'); > ... > while( my $result = $in->next_result ) { > while( my $hit = $result->next_hit ) { > while( my $hsp = $hit->next_hsp ) { > #EXTRACT THE GENBANK CODE NUMBER FROM DESCRIPTION > #---------------------------------------------- > my $accGB = $hit->description(); > $accGB =~ m/(gb=.*?\s)/; > #---------------------------------------------- > > > print MYFILE > ... > > $1,"\t" , #numero de acesso ao genbank > ... 
> $hsp->hit->end, "\t","\n"; > ... > > } > } > } > > > > On Tue, Aug 18, 2009 at 3:34 PM, Marcelo Iwata >wrote: > >> hi all.. >> I was doing a script that take some information of the results of >> blastn >> files. >> Everythig was ok, but i have some dificult to pic the Genbank code >> number >> (the 'gb' below). >> I tried >> >> $obj->each_accession_number >> $hit->name >> >> And some variation of this. >> >> >> >> ------------------------------ >>> gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water >>> stressed 5h >> segment 1 gmrtDrNS01 >> Glycine max cDNA 3', mRNA sequence /clone_end=3' >> /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853 >> Length = 853 >> >> Score = 1336 bits (674), Expect = 0.0 >> Identities = 793/832 (95%), Gaps = 8/832 (0%) >> Strand = Plus / Minus >> >> >> Query: 294858 >> aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt >> 294917 >> |||||||||||| |||||| ||||||||||||||||| >> |||||||||||||||||||| >> Sbjct: 853 >> aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc >> 794 >> ---------------------------------------- >> >> >> But, i still don't get it. >> >> thank you >> with regards >> Miwata >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From Russell.Smithies at agresearch.co.nz Mon Aug 31 17:43:25 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 1 Sep 2009 09:43:25 +1200 Subject: [Bioperl-l] Mapping of genome with cytoband In-Reply-To: <29549.68962.qm@web94610.mail.in2.yahoo.com> References: <29549.68962.qm@web94610.mail.in2.yahoo.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB81F183@exchsth.agresearch.co.nz> Have you tried getting the data from UCSC (or the test site: http://genome-test.cse.ucsc.edu ) If you use Galaxy to get the data then convert to gff, it may save a bit of work. 
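The "convert to gff" step can be sketched roughly as below. This is only an illustration under stated assumptions, not Galaxy's converter: it assumes the band table uses the tab-separated UCSC cytoBand layout (chrom, chromStart, chromEnd, name, gieStain) with 0-based starts, and the helper name cytoband_to_gff3, the 'UCSC' source label and the two sample rows are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one cytoBand row into one GFF3 line.
# Assumed input columns: chrom, chromStart (0-based), chromEnd, name, gieStain.
sub cytoband_to_gff3 {
    my ($row) = @_;
    chomp $row;
    my ($chrom, $start0, $end, $name, $stain) = split /\t/, $row;
    return unless defined $stain;    # skip rows with fewer than 5 columns

    my $attrs = "ID=$chrom$name;Name=$name;gieStain=$stain";

    # GFF3 coordinates are 1-based and inclusive, so only the start shifts by one.
    return join("\t", $chrom, 'UCSC', 'chromosome_band',
                $start0 + 1, $end, '.', '.', '.', $attrs);
}

# Illustrative rows only; real data would come from the downloaded table.
my @rows = (
    "chr1\t0\t2300000\tp36.33\tgneg",
    "chr1\t2300000\t5400000\tp36.32\tgpos25",
);

print "##gff-version 3\n";
print cytoband_to_gff3($_), "\n" for @rows;
```

Bands written this way share the coordinate system of whatever assembly the table came from, so gene features should be taken from the same assembly before both are drawn on one ideogram.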

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809   F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shafeeq rim
> Sent: Thursday, 27 August 2009 11:14 p.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Mapping of genome with cytoband
>
> Hi,
>
> I need gene , mrna , cds , sts and exon files as per the mapping with
> cytobands.Lets say for 37.1 version NCBI data. I am checking with the .gbs and
> .gbk files but the genes and other features are not coming across the whole
> chromosome.i.e, for chromosome 1 suppose. When I use the gene coordinates from
> .gbk / .gbs files the locations on chromosome 1 genes show only half way on
> the ideogram graph.
>
> Thanks
>
>
>
> See the Web's breaking stories, chosen by people like you. Check out
> Yahoo! Buzz. http://in.buzz.yahoo.com/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
======================================================================= From cjfields at illinois.edu Sat Aug 1 02:22:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 31 Jul 2009 21:22:17 -0500 Subject: [Bioperl-l] Bio::Moose is now.... In-Reply-To: <4A730554.704@cornell.edu> References: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> <4A730554.704@cornell.edu> Message-ID: <0FB9B117-185B-4995-A63F-1BA14313DED0@illinois.edu> I think, before any CPAN release, I want to nip the monolith in the bud. Just have Meta/Root and simple interfaces (roles) describing classes in Biome, actual implementations or other additions going into BiomeX::*. The current Biome::Location/Annotation/etc would eventually be moved into their own BiomeX repos. Bundle with Task::Biome (maybe add some automated bundling options). Sound familiar? I'll try to get a ROADMAP up next week. chris On Jul 31, 2009, at 9:53 AM, Robert Buels wrote: > I think this sounds great. GREAT news about the Biome::PrimarySeq > performance. > > Rob > > Chris Fields wrote: >> Biome! This makes the most sense to me; as Mark points out the >> name works as an appropriate acronym (BioPerl with Metaclass >> Extensions), as well as a biome being (per wikipedia): >> "a climatically and geographically defined areas of ecologically >> similar climatic conditions such as communities of plants, animals, >> and soil organisms ... often referred to as ecosystems". >> Seems a fitting name for a open-source project. I'll be moving the >> namespace over to Biome over the next couple of days on github. >> Now I owe Mark some beer... >> Now, for extensions, should I assume this will eventually be >> BioPerl2 (and thus use BioX::*)? Or stick with BiomeX::*? 
>> chris >> PS: Just a quick benchmark for the current Bio::Moose::PrimarySeq >> implementation (we don't have SeqIO working as of yet, so the >> benchmark script does the heavy lifting): >> http://gist.github.com/158317 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jncline at gmail.com Sat Aug 1 03:24:56 2009 From: jncline at gmail.com (Jonathan Cline) Date: Fri, 31 Jul 2009 22:24:56 -0500 Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl Message-ID: I recently mentioned working on Bio::Robotics for Tecan. Vendors being MS-Win specific, the vendor software allows third-party software communication through a named pipe (the literal filename is "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific and this pseudo-pipe is opened with sysopen() ). This is broken under cygwin-perl due to cygwin's method of handling paths -- the sysopen fails. However it works under ActiveState Perl and communication through the named pipe (to the robot hardware) is OK. The standard workaround is usually to use cygwin bash, and force the PATH to use ActiveState perl. (Typical MS Windows incompatibility problem.) The issue is: Perl module libraries for CPAN work under cygwin-perl (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN module use, or "make test", result in a bad list of incompatibility problems. 
Yet ActiveState Perl is required for communicating to the vendor application (unless there is some workaround to raw filesystem access in cygwin-perl that I haven't found in 2 days of working this). The stand-alone scripts I have work fine to access the named pipe (using ActiveState Perl) since the standalone scripts have no module INC dependencies, no CPAN module test harness, etc etc. This isn't specifically a Bio:: issue, though if anyone has suggestions please email. I could try msys and see if it handles the named-pipe-special-file better, if msys has an msys-perl distribution. -- ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## From maj at fortinbras.us Sat Aug 1 03:50:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 31 Jul 2009 23:50:24 -0400 Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl In-Reply-To: References: Message-ID: Jonathan- I have an utter kludge for this very problem, if I understand it correctly. The kludge works for me a majority of the time. Be warned that this is in no way optimized or clever; there is no warranty expressed or implied... Two scripts are below; one runs the other. Together they convert a makefile generated by ActiveState into one suitable for a cygwin make. When the cygwin make is run after conversion, the installation occurs in the ActiveState locations. A demo session follows (note that 'asperl' is an alias, defined as alias asperl=/cygdrive/c/Perl/bin/perl ) cygwin session: $ wget http://search.cpan.org/CPAN/authors/id/N/NI/NI-S/Devel-Leak-0.03.tar.gz $ tar -xzf Devel-Leak-0.03.tar.gz $ cd Devel-Leak-0.03 $ asperl Makefile.PL $ as2cyg.sh $ make $ make test $ make install This is how I constantly install CPAN modules "by hand" into my ActiveState instance. I really hope this helps. The scripts are below. 
cheers and good luck- Mark cygwin paths...note these are both in $PATH /usr/local/bin/as2cyg.sh : #!/usr/bin/bash TF=$(uuidgen) conv-ASmake.sh Makefile > $TF mv $TF Makefile #end of as2cyg.sh /usr/local/bin/conv-ASMake.sh : (note this is a sed script) #!/usr/bin/sed -f #converting an ActiveState PERL Makefile to run under cygwin make: s/^DIRFILESEP = ^\\/DIRFILESEP = \// s/^NOOP = rem/NOOP = :/ # -or- NOOP = echo -n # byebye volume s/C:/\/cygdrive\/c/ # sed to convert directory \ to / s/\([\)0-9a-zA-Z.]\)\\\([\(0-9a-zA-Z]\)/\1\/\2/g # convert full perl s/\/usr\/bin\/perl/\/cygdrive\/c\/Perl\/bin\/perl/ # a key conversion for DOC_INSTALL action /^DESTINSTALLVENDORHTMLDIR/ a\ DECYGDESTINSTALLARCHLIB = $(subst /cygdrive/c,c:,$(DESTINSTALLARCHLIB)) # --- MakeMaker tools_other section: # let cygwin do native linux commands /^MAKE/ c\ MAKE = make /^CHMOD/ c\ CHMOD = chmod /^CP/ c\ CP = cp /^MV/ c\ #end of conv-ASMake.sh ----- Original Message ----- From: "Jonathan Cline" To: Cc: Sent: Friday, July 31, 2009 11:24 PM Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl >I recently mentioned working on Bio::Robotics for Tecan. Vendors > being MS-Win specific, the vendor software allows third-party software > communication through a named pipe (the literal filename is > "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific > and this pseudo-pipe is opened with sysopen() ). This is broken under > cygwin-perl due to cygwin's method of handling paths -- the sysopen > fails. However it works under ActiveState Perl and communication > through the named pipe (to the robot hardware) is OK. The standard > workaround is usually to use cygwin bash, and force the PATH to use > ActiveState perl. (Typical MS Windows incompatibility problem.) The > issue is: Perl module libraries for CPAN work under cygwin-perl > (only?). 
Attempts to run "activestate-perl Makefile.PL" for CPAN > module use, or "make test", result in a bad list of incompatibility > problems. Yet ActiveState Perl is required for communicating to the > vendor application (unless there is some workaround to raw filesystem > access in cygwin-perl that I haven't found in 2 days of working this). > The stand-alone scripts I have work fine to access the named pipe > (using ActiveState Perl) since the standalone scripts have no module > INC dependencies, no CPAN module test harness, etc etc. > > This isn't specifically a Bio:: issue, though if anyone has > suggestions please email. I could try msys and see if it handles the > named-pipe-special-file better, if msys has an msys-perl distribution. > > -- > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Sat Aug 1 04:35:04 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 1 Aug 2009 00:35:04 -0400 Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl In-Reply-To: References: Message-ID: <99E27D08408340B9B0611751A17DF266@NewLife> Sorry, I cut off the last script. 
The entire thing follows: /usr/local/bin/conv-ASMake.sh : #!/usr/bin/sed -f #converting an ActiveState PERL Makefile to run under cygwin make: s/^DIRFILESEP = ^\\/DIRFILESEP = \// s/^NOOP = rem/NOOP = :/ # -or- NOOP = echo -n # byebye volume s/C:/\/cygdrive\/c/ # sed to convert directory \ to / s/\([\)0-9a-zA-Z.]\)\\\([\(0-9a-zA-Z]\)/\1\/\2/g # convert full perl s/\/usr\/bin\/perl/\/cygdrive\/c\/Perl\/bin\/perl/ # a key conversion for DOC_INSTALL action /^DESTINSTALLVENDORHTMLDIR/ a\ DECYGDESTINSTALLARCHLIB = $(subst /cygdrive/c,c:,$(DESTINSTALLARCHLIB)) # --- MakeMaker tools_other section: # let cygwin do native linux commands /^MAKE/ c\ MAKE = make /^CHMOD/ c\ CHMOD = chmod /^CP/ c\ CP = cp /^MV/ c\ MV = mv /^NOOP/ c\ NOOP = : /^RM_F/ c\ RM_F = rm -f /^RM_RF/ c\ RM_RF = rm -rf /^TEST_F[^I]/ c\ TEST_F = test -f /^TOUCH/ c\ TOUCH = touch /^TEST_S/ c\ TEST_S = test -s /^DEV_NULL/ c\ DEV_NULL = > /dev/null 2>&1 /^ECHO[^_]/ c\ ECHO = echo /^ECHO_N/ c\ ECHO_N = echo -n # override OS-specific File::Spec /^MOD_INSTALL/ c\ MOD_INSTALL = $(ABSPERLRUN) -MExtUtils::Install -e "use File::Spec::Cygwin;@File::Spec::ISA=('File::Spec::Cygwin');" -e "map { s[/cygdrive/c][] } @ARGV;install({@ARGV}, '$(VERBINST)', 0, '$(UNINST)');" -- /^FIXIN/ c\ FIXIN = $(PERLRUN) "-MExtUtils::MY" -e "MY->fixin(shift)" # remove cygwin volume prefix for doc installs /Appending installation info to/ s/DESTIN/DECYGDESTIN/ /perllocal\.pod/ s/DESTIN/DECYGDESTIN/ /NOECHO) \$(MKPATH/ s/DESTIN/DECYGDESTIN/ #end conv-ASMake.sh ----- Original Message ----- From: "Jonathan Cline" To: Cc: Sent: Friday, July 31, 2009 11:24 PM Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl >I recently mentioned working on Bio::Robotics for Tecan. 
Vendors > being MS-Win specific, the vendor software allows third-party software > communication through a named pipe (the literal filename is > "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific > and this pseudo-pipe is opened with sysopen() ). This is broken under > cygwin-perl due to cygwin's method of handling paths -- the sysopen > fails. However it works under ActiveState Perl and communication > through the named pipe (to the robot hardware) is OK. The standard > workaround is usually to use cygwin bash, and force the PATH to use > ActiveState perl. (Typical MS Windows incompatibility problem.) The > issue is: Perl module libraries for CPAN work under cygwin-perl > (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN > module use, or "make test", result in a bad list of incompatibility > problems. Yet ActiveState Perl is required for communicating to the > vendor application (unless there is some workaround to raw filesystem > access in cygwin-perl that I haven't found in 2 days of working this). > The stand-alone scripts I have work fine to access the named pipe > (using ActiveState Perl) since the standalone scripts have no module > INC dependencies, no CPAN module test harness, etc etc. > > This isn't specifically a Bio:: issue, though if anyone has > suggestions please email. I could try msys and see if it handles the > named-pipe-special-file better, if msys has an msys-perl distribution. 
> > -- > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jncline at gmail.com Mon Aug 3 03:32:20 2009 From: jncline at gmail.com (Jonathan Cline) Date: Sun, 02 Aug 2009 22:32:20 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> Message-ID: <4A765A44.7030902@gmail.com> Smithies, Russell wrote: > I "acquired" an old Biomek 1000 that I'm thinking of modernising. It was originally controlled by a monstrously large but slow pc (IBM Value Point Model 466DX2 computer with Microsoft Windows* Version 3.1) > My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) and use software like mach3 www.machsupport.com along with G-code to control it. > I come from an engineering background so it seemed like the easy way to me :-) > > Now I just need a bit of free time to get it working... > > --Russell > > > I agree, that's probably the best way to go. It's hard to know what amount of s/w processing was done on the host PC vs. the embedded controller. If you were able to connect directly to the robot hardware with serial port(s) or whatever it's using, it would be tough to find out the comm protocol unless someone has already reverse engineered it (which is doubtful). Also from what I have seen online, attempting to run the old software under virtual machine is unpredictable due to timing differences in the serial port communication. So removal of the old electronics is probably the best bet. If it has one arm, then it's much easier. 
As for robots with working workstation software, it seems the annoyance factor is that while the scripting languages are powerful (for GUI scripting that is), they are still relatively low level. Bio types with a bit of CS seem to immediately turn to visual basic, labview, or even excel spreadsheets and macros, in order to provide a higher level abstraction for the workstation software. To me, it seems natural that there should be a "protocol compiler" which takes biology protocols as input, and gives robot instructions as output (google "protolexer"). The huge bottleneck of course is that everyone's robotics work tables and equipment are somewhat unique to their needs. ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >> Sent: Thursday, 30 July 2009 2:07 p.m. >> To: bioperl-l at lists.open-bio.org >> Cc: Jonathan Cline >> Subject: [Bioperl-l] Bio::Robotics namespace discussion >> >> I am writing a module for communication with biology robotics, as >> discussed recently on #bioperl, and I invite your comments. >> >> Currently this mode talks to a Tecan genesis workstation robot ( >> http://images.google.com/images?q=tecan genesis ). Other vendors are >> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the >> 'net with the exception of some visual basic and labview scripts which I >> have found. There are some computational biologists who program for >> robots via high level s/w, but these scripts are not distributed as OSS. >> >> With Tecan, there is a datapipe interface for hardware communication, as >> an added $$ option from the vendor. I haven't checked other vendors to >> see if they likewise have an open communication path for third party >> software. 
By allowing third-party communication, then naturally the >> next step is to create a socket client-server; especially as the robot >> vendor only support MS Win and using the local machine has typical >> Microsoft issues (like losing real time communication with the hardware >> due to GUI animation, bad operating system stability, no unix except >> cygwin, etc). >> >> >> On Namespace: >> >> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many >> s/w modules already called 'robots' (web spider robots, chat bots, www >> automate, etc) so I chose the longer name "robotics" to differentiate >> this module as manipulating real hardware. Bio::Robotics is the >> abstraction for generic robotics and Bio::Robotics::(vendor) is the >> manufacturer-specific implementation. Robot control is made more >> complex due to the very configurable nature of the work table (placement >> of equipment, type of equipment, type of attached arm, etc). The >> abstraction has to be careful not to generalize or assume too much. In >> some cases, the Bio::Robotics modules may expand to arbitrary equipment >> such as thermocyclers, tray holders, imagers, etc - that could be a >> future roadmap plan. >> >> Here is some theoretical example usage below, subject to change. At >> this time I am deciding how much state to keep within the Perl module. >> By keeping state, some robot programming might be simplified (avoiding >> deadlock or tracking tip state). In general I am aiming for a more >> "protocol friendly" method implementation. >> >> >> To use this software with locally-connected robotics hardware: >> >> use Bio::Robotics; >> >> my $tecan = Bio::Robotics->new("Tecan") || die; >> $tecan->attach() || die; >> $tecan->home(); >> $tecan->pipette(tips => "1", from => "rack1"); >> $tecan->pipette(aspirate => "1", dispense => "1", from => "sampleTray", to >> => "DNATray"); >> ... 
>> >> To use this software with remote robotics hardware over the network: >> >> # On the local machine, run: >> use Bio::Robotics; >> >> my @connected_hardware = Bio::Robotics->query(); >> my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in >> @connected_hardware\n"; >> $tecan->attach() || die; >> $tecan->configure("my work table configuration file") || die; >> # Run the server and process commands >> while (1) { >> $error = $tecan->server(passwordplaintext => "0xd290"); >> if ($tecan->lastClientCommand() =~ /^shutdown/) { >> last; >> } >> } >> $tecan->detach(); >> exit(0); >> >> # On the remote machine (the client), run: >> use Bio::Robotics; >> >> my $server = "heavybio.dyndns.org:8080"; >> my $password = "0xd290"; >> my $tecan = Bio::Robotics->new("Tecan"); >> $tecan->connect($server, $mypassword) || die; >> $tecan->home(); >> $tecan->pipette(tips => "1", from => "rack200"); >> $tecan->pipette(aspirate => "1", dispense => "1", >> from => "sampleTray A1", to => "DNATray A2", >> volume => "45", liquid => "Buffer"); >> $tecan->pipette(drop => "1"); >> ... >> $tecan->disconnect(); >> exit(0); >> >> >> >> -- >> >> ## Jonathan Cline >> ## jcline at ieee.org >> ## Mobile: +1-805-617-0223 >> ######################## >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>

From dan.bolser at gmail.com Tue Aug 4 12:03:00 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 4 Aug 2009 13:03:00 +0100
Subject: [Bioperl-l] problem with t/LocalDB/SeqFeature.t when host ne localhost
In-Reply-To:
References: <2c8757af0907310513q24bec4b0k7bec06b09e069b07@mail.gmail.com>
Message-ID: <2c8757af0908040503oe2a258dkac4311bb099dc3ac@mail.gmail.com>

2009/7/31 Chris Fields :
> Dan,
>
> Can you file this as a BioPerl bug? I'm planning on driving towards
> releasing 1.6.1 alpha1 soon (next few weeks) and I would like to get this
> one fixed.

http://bugzilla.open-bio.org/show_bug.cgi?id=2899

Dan.

From dan.bolser at gmail.com Tue Aug 4 12:14:02 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 4 Aug 2009 13:14:02 +0100
Subject: [Bioperl-l] Clear range from Bio::Seq::Quality?
In-Reply-To:
References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com>
Message-ID: <2c8757af0908040514w198085cfgf4a1adc344095f36@mail.gmail.com>

2009/4/27 Heikki Lehvaslaiho :
> Dan,
>
> Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live.
>
> Test and extend,
>
>    -Heikki

Thanks for help with this. I finally got round to looking at the code
(after several others had done the same).

I have messed with the code a bit, and added a 'mask_below_threshold'
method [1] and some tests to go with it (including some extra tests) [2].

Cheers,
Dan.

[1] http://bugzilla.open-bio.org/show_bug.cgi?id=2897
[2] http://bugzilla.open-bio.org/show_bug.cgi?id=2898

> 2009/4/27 Heikki Lehvaslaiho :
>> Dan,
>>
>> I'll take your code and put it into bioperl-live rewritten the way I
>> suggested and add few tests.
>>
>> That should get you started,
>>
>>   -Heikki
>>
>> 2009/4/27 Dan Bolser :
>>> Hi Heikki,
>>>
>>> Thanks very much for the advice on how to better implement the clear
>>> range method within the Bio::Seq::Quality object. I can understand the
>>> logic of what you have written, and it all sounds reasonable. The only
>>> problem is that I am very inexperienced with working on object
>>> oriented Perl (my 'one man' projects to date have never really
>>> required me to think beyond scripts, and it's been years since I
>>> actually tried to code objects in Perl).
>>>
>>> To be specific, when you say, "Lets add a method that sets the
>>> threshold and stores it internally as $self->_threshold", ignoring any
>>> other functionality, what would that method look like? In particular,
>>> how would $self->_threshold be implemented?
>>>
>>> I think once I see that detail, I can go ahead and try to code what
>>> you suggested.
>>>
>>>
>>> Similarly (Chris), where would I put the tests / how would they be implemented?
>>>
>>>
>>> Thanks again for the feedback.
>>>
>>> All the best,
>>> Dan.
>>>
>>>
>>>
>>> 2009/4/27 Heikki Lehvaslaiho :
>>>> Dan,
>>>>
>>>> It looks like your method does two different things:
>>>>
>>>> 1. Returns the longest subsequence above the threshold
>>>> 2. Analyses the sequence for the number of ranges the current
>>>> threshold creates.
>>>>
>>>> Why not separate these functions?
>>>>
>>>> Let's add a method that sets the threshold and stores it internally as
>>>> $self->_threshold. Setting it to a new value should trigger emptying
>>>> all the caches (see below.)
>>>>
>>>> Let's have two more public methods:
>>>>
>>>> 1. get_clean_range() - optional argument 'threshold'
>>>>
>>>> It returns the longest clean subseq.
>>>>
>>>> 2. count_clean_ranges() - again optional argument 'threshold'
>>>>
>>>> This returns the number of ranges detected.
>>>>
>>>> Both methods call first the public method threshold if the argument
>>>> has been given and then an internal method _find_clean_ranges(). That
>>>> method calculates all the ranges and stores them internally (as
>>>> $self->_clean_ranges-> [...]). The number of ranges is also stored
>>>> (e.g. $self->_number_of_ranges). These internal values form the cache
>>>> that needs to be emptied whenever any of the critical values of the
>>>> object changes: threshold, quality or seq. Create an internal method
>>>> $self->_clear_cache, that does that.
>>>>
>>>> Now the new quality object does not get created until you call
>>>> get_clean_range() which accesses the cached values (or creates them if
>>>> they are not there).
>>>>
>>>> This design allows you to have no extra penalty for adding more
>>>> methods that act on cached values. For example, it might be a sensible
>>>> thing to do at some point to look at all the ranges that are longer
>>>> than some length. Then you could write in your program:
>>>>
>>>>
>>>> $qual->threshold(10);
>>>> if ($qual->count_clean_ranges == 1) {
>>>>     my $newqual = $qual->get_clean_range();
>>>>     # do your analysis
>>>> } elsif ($qual->count_clean_ranges == 0) {
>>>>     # do some reporting and logging
>>>> } else {  # more than one range
>>>>     my @quals = $qual->get_all_clean_ranges($min_length);
>>>>     # do some more work and possibly select the best one(s)
>>>> }
>>>>
>>>>
>>>>
>>>> Yours,
>>>>
>>>>   -Heikki
>>>>
>>>> 2009/4/24 Chris Fields :
>>>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If
>>>>> possible, tests don't hurt either!
>>>>>
>>>>> chris
>>>>>
>>>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote:
>>>>>
>>>>>> It's a bit rough and ready, but it does what I need...
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> =head2 get_clear_range
>>>>>>
>>>>>> Title    : get_clear_range
>>>>>> Usage    : $subobj = $obj->get_clear_range();
>>>>>>            $subobj = $obj->get_clear_range(20);
>>>>>> Function : Get the clear range using the given quality score as a
>>>>>>            cutoff or a default value of 13.
>>>>>>
>>>>>> Returns  : a new Bio::Seq::Quality object
>>>>>> Args     : a minimum quality value, optional, default = 13
>>>>>>
>>>>>> =cut
>>>>>>
>>>>>> sub get_clear_range
>>>>>> {
>>>>>>     my $self = shift;
>>>>>>     my $qual = $self->qual;
>>>>>>     my $minQual = shift || 13;
>>>>>>
>>>>>>     my (@ranges, $rangeFlag);
>>>>>>
>>>>>>     for(my $i=0; $i<@$qual; $i++){
>>>>>>         ## Are we currently within a clear range or not?
>>>>>>         if(defined($rangeFlag)){
>>>>>>             ## Did we just leave the clear range?
>>>>>>             if($qual->[$i]<$minQual){
>>>>>>                 ## Log the range
>>>>>>                 push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>                 ## and reset the range flag.
>>>>>>                 $rangeFlag = undef;
>>>>>>             }
>>>>>>             ## else nothing changes
>>>>>>         }
>>>>>>         else{
>>>>>>             ## Did we just enter a clear range?
>>>>>>             if($qual->[$i]>=$minQual){
>>>>>>                 ## Better set the range flag!
>>>>>>                 $rangeFlag = $i;
>>>>>>             }
>>>>>>             ## else nothing changes
>>>>>>         }
>>>>>>     }
>>>>>>     ## Did we exit the last clear range?
>>>>>>     if(defined($rangeFlag)){
>>>>>>         my $i = scalar(@$qual);
>>>>>>         ## Log the range
>>>>>>         push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>     }
>>>>>>
>>>>>>     unless(@ranges){
>>>>>>         die "There is no clear range... I don't know what to do here!\n";
>>>>>>     }
>>>>>>
>>>>>>     print "there are ", scalar(@ranges), " clear ranges\n";
>>>>>>
>>>>>>     my $sum; map {$sum += $_->[2]} @ranges;
>>>>>>
>>>>>>     print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>>>>         "quality scores above the given threshold\n";
>>>>>>
>>>>>>     for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>>>>         if($_->[2]/$sum < 0.5){
>>>>>>             warn "not so much a clear range as a clear chunk...\n";
>>>>>>         }
>>>>>>         print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>>>>
>>>>>>         return Bio::Seq::QualityDB->new(
>>>>>>             -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
>>>>>>             -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>>>>>>         );
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which
>>>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added).
>>>>>> That is why the 'new Bio::Seq::Quality object' is actually a
>>>>>> Bio::Seq::QualityDB object, but other than that it should slot right
>>>>>> in (apart from all the debugging output that I spit out).
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Dan.
>>>>>>
>>>>>>
>>>>>> 2009/4/24 Dan Bolser :
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I couldn't find out how to get the 'clear range' from a
>>>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should
>>>>>>> this method be a part of the Bio::Seq::Quality class?
>>>>>>>
>>>>>>> In the latter case I'm on my way to an implementation, but I am not
>>>>>>> good at navigating the bioperl docs, so I thought I should ask before
>>>>>>> I take the time to finish that off.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dan.
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>    -Heikki
>>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>>>> cell: +27 (0)714328090
>>>> Sent from Claremont, WC, South Africa
>>>>
>>>
>>
>>
>>
>> --
>>    -Heikki
>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
>> cell: +27 (0)714328090
>> Sent from Claremont, WC, South Africa
>>
>
>
>
> --
>    -Heikki
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +27 (0)714328090
> Sent from Claremont, WC, South Africa
>

From dan.bolser at gmail.com Tue Aug 4 16:32:31 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 4 Aug 2009 17:32:31 +0100
Subject: [Bioperl-l] Percentage Similarity
In-Reply-To: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com>
References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com>
Message-ID: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com>

2009/7/28 shalabh sharma :
> Hi All, I have some protein sequences (around 100) i need to find
> overall percentage similarity between them.
> How i can do that?

Tried using blast?

You can download that.


Try asking in irc://irc.freenode.net/#bioinformatics

Dan.


>
> Thanks
> Shalabh
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From shameer at ncbs.res.in Tue Aug 4 16:43:40 2009
From: shameer at ncbs.res.in (K. Shameer)
Date: Tue, 4 Aug 2009 22:13:40 +0530 (IST)
Subject: [Bioperl-l] Percentage Similarity
In-Reply-To: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com>
References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com>
Message-ID: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in>

Hello Shalabh,

You may try ALISTAT. Available as a part of SQUID library from Prof. Sean
Eddy. Make an alignment of your 100 sequences and use alignment as input
of ALISTAT. ftp://selab.janelia.org/pub/software/squid/

Best,
Khader Shameer

> 2009/7/28 shalabh sharma :
>> Hi All, I have some protein sequences (around 100) i need to
>> find
>> overall percentage similarity between them.
>> How i can do that? > > Tried using blast? > > You can download that. > > > Try asking in irc://irc.freenode.net/#bioinformatics > > Dan. > > >> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shalabh.sharma7 at gmail.com Tue Aug 4 17:36:34 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 4 Aug 2009 13:36:34 -0400 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> Message-ID: <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com> Hi All, thanks a lot. @Khader Shameer, ALISTAT is what i was looking for. But still it gives you the average identity, what i need exactly is the average similarity. Thanks Shalabh Sharma On Tue, Aug 4, 2009 at 12:43 PM, K. Shameer wrote: > Hello Shalabh, > > You may try ALISTAT. Available as a part of SQUID library from Prof. Sean > Eddy. Make an alignment of your 100 sequences and use alignment as input > of ALISTAT. ftp://selab.janelia.org/pub/software/squid/ > > Best, > Khader Shameer > > > 2009/7/28 shalabh sharma : > >> Hi All, I have some protein sequences (around 100) i need to > >> find > >> overall percentage similarity between them. > >> How i can do that? > > > > Tried using blast? > > > > You can download that. > > > > > > Try asking in irc://irc.freenode.net/#bioinformatics > > > > Dan. 
> > > > > >> > >> Thanks > >> Shalabh > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > From shalabh.sharma7 at gmail.com Wed Aug 5 13:31:21 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 5 Aug 2009 09:31:21 -0400 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com> <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com> Message-ID: <9fcc48c70908050631q1a080b74x12e81985b455332e@mail.gmail.com> Hi, Thanks for the reply. I used clustalW for the MSA. Also i was just wondering that what if i use smith Waterman (EMBOSS' water) and pass the same library as query sequences and reference library, then just parse it and calculate average similarity.Is this right approach? Thanks Shalabh On Wed, Aug 5, 2009 at 3:10 AM, Dan Bolser wrote: > 2009/8/4 shalabh sharma : > > Hi All, thanks a lot. > > @Khader Shameer, ALISTAT is what i was looking for. But still it gives > you > > the average identity, what i need exactly is the average similarity. > > The problem is that identity is well defined. Similarity is more > vague, and at least depends on a particular alignment scoring matrix. > How did you align your sequences? > > Dan. 
> > >> > Try asking in irc://irc.freenode.net/#bioinformatics > >> > > > ;-) > From michael.watson at bbsrc.ac.uk Wed Aug 5 13:50:35 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 5 Aug 2009 14:50:35 +0100 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank Message-ID: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> Hi I want to download GSS sequences using Bio::DB::GenBank. When I specify db => 'nucleotide', it gets the 3000 or so that Entrez reports are in nucleotide, but there are another ~30000 in GSS that I want, but when I try db => 'GSS' or db => 'gss' nothing comes down. I'm using bioperl 1.5.1. Any clues? Mick From rmb32 at cornell.edu Wed Aug 5 15:28:46 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 05 Aug 2009 08:28:46 -0700 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <4A79A52E.7000104@cornell.edu> I think you're looking for the -db => 'nucgss' option. I'll add a better listing of this (undocumented) options to the Bio::DB::Query::GenBank docs. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu michael watson (IAH-C) wrote: > Hi > > I want to download GSS sequences using Bio::DB::GenBank. > > When I specify db => 'nucleotide', it gets the 3000 or so that Entrez reports are in nucleotide, but there are another ~30000 in GSS that I want, but when I try db => 'GSS' or db => 'gss' nothing comes down. > > I'm using bioperl 1.5.1. > > Any clues? 
> > Mick > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hartzell at alerce.com Wed Aug 5 16:16:04 2009 From: hartzell at alerce.com (George Hartzell) Date: Wed, 5 Aug 2009 09:16:04 -0700 Subject: [Bioperl-l] Job opening at Genentech [SSF, CA]. Message-ID: <19065.45124.4999.922147@already.dhcp.gene.com> I have an opening in my group in the Bioinformatics department at Genentech [South San Francisco, CA]. At the moment (for the next year or so) our main focus is rebuilding and extending a system for collecting, processing, and disseminating information about mutations and variations (think web interfaces, relational databases, alignments, workflows/pipelines). In the future we'll pick up projects related to next-gen sequencing (Me too!!! In the future, what isn't related to next-gen?), data integration, and/or lab-specific projects. First and foremost I'm looking for someone who's sharp and who enjoys computers, biology, and technology; someone who gets excited about picking up new tools but who also has a sense of responsibility and restraint. I'm looking for someone who's familiar with several languages and tools; modern Perl complemented with C is my first choice these days, supplemented with R and (when necessary) anything from the rest of the programming language bestiary. There's a fair amount of Java flying around here too so familiarity with it and the JVM world will help. Relational databases are part of the picture: Oracle for the big stuff; SQLite, Postgresql, and MySQL play niche roles. I generally interact with them via ORM's, lately it's been Rose::DB::Object on the Perl side though I've been convinced to take another look at DBIx::Class. Most of my web apps use CGI::Application, as fastcgi's, mod_perl, or simple CGI scripts, but (as with ORM's) I may take another look at Catalyst. 
I'm looking for someone who's interested in building real software. We'll be putting together a set of tools and data that need to hang together and evolve for at least 4-5 years. Deploy and run won't cut it. Requirements will change, so it's important to me that we build things so they're as modular and flexible as possible. Testing, source control, and documentation matter. A strong candidate will have an understanding of basic bioinformatics concepts and the ability to pick up new biology and computer science concepts as necessary. At the junior end of the spectrum I'd expect a bachelor's degree + 3 years of experience, at the upper end would a masters + 5 years (or a PhD interested in moving towards the production side of the house). I can imagine running through one or more detail oriented interview questions that drilled down (or took of on a tangent) from the following: - What's the difference between Smith-Waterman, blast, sim4, gmap, and/or bowtie alignment algorithms or tools? Which would you use when, and why? - Why is Moose better than Class::Accessor? (yes, it's Perl centered, but it could spin out into any language [e.g. why is Java better than Perl?]). What's a MOP? Who cares? - CVS, subversion, git, mercurial. You've already picked one? Which one? Why? Why not? - XML or JSON or YAML. Pick one for moving data back and forth in an Ajax based interface. Why? Would it also work well in other contexts? - How would you store information about positional features on a genome so that you could get fast random access? How would your solution tie into a larger data context? Genentech's a great place to work: solid salaries, great benefits, Bay Area location (who could ask for more?). We're open source friendly and with the arrival Robert Gentleman (our new Director, of Bioconductor/R fame) likely to become more so. The recent Roche acquisition hasn't changed life much, it seems to mostly be a source of opportunities for those of us in Research. 

If you know anyone who fits the bill, have them drop me a note.

Thanks!

g.

From hilgert at cshl.edu Wed Aug 5 20:27:28 2009
From: hilgert at cshl.edu (Hilgert, Uwe)
Date: Wed, 5 Aug 2009 16:27:28 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
Message-ID:

Is my impression correct that Bio::SeqIO just assumes that sequences are
being submitted in FASTA format? In our experience, implementing
Bio::SeqIO led to the first line of files being cut off, regardless of
whether the files were indeed fasta files or files that only contained
sequence. In the latter case, this led to sequence submissions that had the
first line of nucleotides removed. Has anyone tried to write a fix for
this?

Thanks,

Uwe

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Uwe Hilgert, Ph.D.
Dolan DNA Learning Center
Cold Spring Harbor Laboratory

V: (516) 367-5185
E: hilgert at cshl.edu
F: (516) 367-5182
W: http://www.dnalc.org

From cjfields at illinois.edu Wed Aug 5 21:04:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 16:04:14 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References:
Message-ID: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>

On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:

> Is my impression correct that Bio::SeqIO just assumes that sequences
> are being submitted in FASTA format?

No. See:

http://www.bioperl.org/wiki/HOWTO:SeqIO

SeqIO tries to guess at the format using the file extension, and if
one isn't present makes use of Bio::Tools::GuessSeqFormat. It's
possible that the extension is causing the problem, or that
GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced to
guess). In any case, it's always advisable to explicitly indicate
the format when possible.

Relevant lines:

    return 'fasta'   if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    ...
    return 'raw'     if /\.(txt)$/i;

> In our experience, implementing
> Bio::SeqIO led to the first line of files being cut off, regardless of
> whether the files were indeed fasta files or files that only contained
> sequence.

Files that only contain sequence are 'raw'. Ones in FASTA are 'fasta'.

> Which, in the latter, led to sequence submissions that had the
> first line of nucleotides removed. Has anyone tried to write a fix for
> this?

This sounds like a bug, but we have very little to go on beyond your
description. What version of bioperl are you using, OS, etc? What
does your data look like? File extension?

chris

From Kevin.M.Brown at asu.edu Wed Aug 5 21:03:04 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 5 Aug 2009 14:03:04 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B40624DA61@EX02.asurite.ad.asu.edu>

SeqIO is just a base framework for reading/writing of files. If you want
it to read FASTA format, you tell it so when you create the object:

$seqio = Bio::SeqIO->new(-format=>'fasta');

That tells the program to use Bio::SeqIO::fasta for the object. Look at
the docs for the various formats that Bio::SeqIO supports.
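
For illustration, here is a minimal core-Perl sketch of why the declared format matters. It is not BioPerl's actual code: the sub name read_fasta_like and the sample data are made up for this example, and it only assumes that a FASTA-style reader treats the first line of each record as the header. Under that assumption, feeding it a file of bare sequence makes the first sequence line disappear into the header.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): mimic a FASTA-style reader
# that takes the first line of every record as the header and the
# remaining lines as sequence.
sub read_fasta_like {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "\n>";                 # read one record at a time, split at "\n>"
    my @records;
    while (defined(my $entry = <$fh>)) {
        chomp $entry;                 # drop the trailing "\n>" separator, if any
        $entry =~ s/^>//;             # drop the leading '>' of the first record
        my ($top, $sequence) = split /\n/, $entry, 2;
        $sequence = '' unless defined $sequence;
        $sequence =~ s/\s+//g;        # join the sequence lines
        push @records, { header => $top, seq => $sequence };
    }
    close $fh;
    return @records;
}

# Made-up sample data: the same two sequence lines, with and without a header.
my $fasta = ">seq1 demo\nACGT\nTTAA\n";
my $raw   = "ACGT\nTTAA\n";

my ($good) = read_fasta_like($fasta);
my ($bad)  = read_fasta_like($raw);

print "fasta: header='$good->{header}' seq=$good->{seq}\n";
print "raw:   header='$bad->{header}' seq=$bad->{seq}\n";
```

With the FASTA input the sketch reports header 'seq1 demo' and sequence ACGTTTAA; with the headerless input it reports header 'ACGT' and sequence TTAA, i.e. the first line of nucleotides is lost. That is the kind of symptom one would expect when raw sequence is read with a FASTA-style parser, and a reason to declare such files as 'raw' instead.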

From cjfields at illinois.edu Wed Aug 5 21:37:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 16:37:52 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>
Message-ID: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>

Uwe,

Please keep replies on the list.

It's very possible that's the issue; IIRC the fasta parser pulls out
the full sequence in chunks (based on local $/ = "\n>") and splits the
header off as the first line in that chunk. You could probably try
leaving the format out and letting SeqIO guess it, or passing the file
into Bio::Tools::GuessSeqFormat directly, but it's probably better to
go through the files and add a file extension that corresponds to the
format.

chris

On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote:

> Thanks, Chris. The files have no extension, but we indicate what format
> to use, like in the manual:
>
> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta');
>
> I wonder now whether this could exactly cause the problem: as we are
> telling that input files are in fasta format they are being treated as
> such (=remove first line) - regardless of whether they really are
> fasta?

From Kevin.M.Brown at asu.edu Wed Aug 5 21:45:03 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 5 Aug 2009 14:45:03 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B40624DA9B@EX02.asurite.ad.asu.edu>

I'm not sure, but I think the module is fasta, not Fasta. So it should
be -format=>'fasta', unless you're on a case-insensitive system that is
forgiving the capital...

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University

From hlapp at gmx.net Wed Aug 5 22:53:56 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 5 Aug 2009 18:53:56 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu>
Message-ID: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>

I don't think that can be the problem. If anything, providing the
format ought to be better in terms of result than not providing it?

Uwe - I'd like you to go back to Chris' initial questions that you
haven't answered yet: "What version of bioperl are you using, OS,
etc? What does your data look like?" I'd add to that, can you show us
your full script, or a smaller code snippet that reproduces the problem.

I suspect that either something in your script is swallowing the line,
or that the line endings in your data file are from a different OS
than the one you're running the script on. (Or that you are running a
very old version of BioPerl, which is entirely possible if you
installed through CPAN.)

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Wed Aug 5 23:12:52 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 5 Aug 2009 19:12:52 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID: <8FAB8756AD944534B49F2C4356CB6D92@NewLife>

If these items were included in a Bugzilla report, that would be most
convenient (= most likely to get looked at carefully), and Bugzilla is
the best place for us to keep track of these kinds of issues:

http://bugzilla.bioperl.org/

cheers
MAJ

From cjfields at illinois.edu Thu Aug 6 04:43:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 23:43:45 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID:

The SeqIO::fasta parser sets:

    local $/ = "\n>";

then splits the resulting chunks of data (each corresponding to a full
FASTA-formatted sequence) into two pieces:

    my ($top,$sequence) = split(/\n/,$entry,2);

If there is no description line (e.g. the file is all raw sequence
data), these lines would result in reading in the whole file and then
splitting off the first line as the header.

chris

On Aug 5, 2009, at 5:53 PM, Hilmar Lapp wrote:

> I don't think that can be the problem.
If anything, providing the > format ought to be better in terms of result than not providing it? > > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, > etc? What does your data look like?" I'd add to that, can you show > us your full script, or a smaller code snippet that reproduces the > problem. > > I suspect that either something in your script is swallowing the > line, or that the line endings in your data file are from a > different OS than the one you're running the script on. (Or that you > are running a very old version of BioPerl, which is entirely > possible if you installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls >> out the full sequence in chunks (based on local $/ = "\n>") and >> splits the header off as the first line in that chunk. You could >> probably try leaving the format out and letting SeqIO guess it, or >> passing the file into Bio::Tools::GuessSeqFormat directly, but it's >> probably better to go through the files and add a file extension >> that corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being >>> treated as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>> to >>> guessing). In any case, it's always advisable to explicitly >>> indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a >>>> fix for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at illinois.edu Thu Aug 6 05:12:13 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 00:12:13 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <8FAB8756AD944534B49F2C4356CB6D92@NewLife> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> Message-ID: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> Just to confirm: the following is using bioperl-live on my macbook pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug or a user issue (if it's the former, we can easily add an exception indicating lack of a header). Note that 'raw' also fails for the raw example below (doesn't appear to remove newlines). 
-c

cjfields4:fasta cjfields$ cat raw_v_fasta.pl
#!/usr/bin/perl -w

use strict;
use warnings;
use IO::String;
use Bio::SeqIO;
use Test::More qw(no_plan);

my %seq;

$seq{raw} = <<RAW;
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
RAW

$seq{fasta} = <<FASTA;
>CATH_RAT
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
FASTA

my %newdata;
for my $input (sort keys %seq) {
    my $fh = IO::String->new($seq{$input});
    my $seq = Bio::SeqIO->new(-format => 'fasta',
                              -fh     => $fh)->next_seq;
    $newdata{$input} = $seq->seq;
}
is($newdata{raw}, $newdata{fasta}, 'format');

cjfields4:fasta cjfields$ perl raw_v_fasta.pl
not ok 1 - format
# Failed test 'format'
# at raw_v_fasta.pl line 36.
# got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
# expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
1..1
# Looks like you failed 1 test of 1.

On Aug 5, 2009, at 6:12 PM, Mark A.
Jensen wrote: > If these items were included in a Bugzilla report, that would be > most convenient (= most likely to get looked carefully) > and is the best place for us to keep track of these kinds of > issues-- http://bugzilla.bioperl.org/ > cheers MAJ > ----- Original Message ----- From: "Hilmar Lapp" > To: "Chris Fields" > Cc: "BioPerl List" > Sent: Wednesday, August 05, 2009 6:53 PM > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? What does your data look like?" I'd add to that, can you show >> us your full script, or a smaller code snippet that reproduces the >> problem. >> I suspect that either something in your script is swallowing the >> line, or that the line endings in your data file are from a >> different OS than the one you're running the script on. (Or that >> you are running a very old version of BioPerl, which is entirely >> possible if you installed through CPAN.) >> -hilmar >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls >>> out the full sequence in chunks (based on local $/ = "\n>") and >>> splits the header off as the first line in that chunk. You could >>> probably try leaving the format out and letting SeqIO guess it, >>> or passing the file into Bio::Tools::GuessSeqFormat directly, but >>> it's probably better to go through the files and add a file >>> extension that corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format >>>> to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being >>>> treated as >>>> such (=remove first line) - regardless of whether they really >>>> are fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> Uwe Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences >>>>> are >>>>> being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's >>>> forced to >>>> guessing). In any case, it's always advisable to explicitly >>>> indicate >>>> the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa) >>>> $/ i; >>>> ... 
>>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless of >>>>> whether the files were indeed fasta files or files that only >>>>> contained >>>>> sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a >>>>> fix for >>>>> this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From eigenrosen at gmail.com Thu Aug 6 07:12:24 2009 From: eigenrosen at gmail.com (Michael Rosen) Date: Thu, 6 Aug 2009 00:12:24 
-0700
Subject: [Bioperl-l] Trouble with Clustalw
Message-ID: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com>

I'm a complete bioperl novice, trying to do Clustalw on some fasta files, and am running into trouble:

~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
STACK: TestClust:22
-----------------------------------------------------------

Here's my code:

#!/usr/bin/perl -w

use Bio::Perl;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::SimpleAlign;
use Bio::Seq;
use strict;
use warnings;

my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
my @seq_array = read_all_sequences($ARGV[0],'fasta');

for (my $i = 0; $i < @seq_array; $i++){
    (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
    $seq_array[$i]->seq($seq);
}

write_sequence(">test",'fasta',@seq_array);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);
my @align_array = $aln->each_seq();
write_sequence(">testfile",'fasta',@align_array);

The loop is just there to take out some gaps that were placed
in a blast previous to this. The write_sequence call confirms that @seq_array is a valid array of Bio::Seq objects at the time align is called on it. Here's some output in "test":

>A0220B0939one.1 FV584Q101DEWY9
TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC
CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT
TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT
TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG
CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG
CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA
CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA
CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT
AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG
>A0220B0939one.2 FV584Q101A4DG7
TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG
ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC
AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG
TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG
GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA
GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT
CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT
CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT
ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG
...

Thanks,
Mike

From florian.mittag at uni-tuebingen.de Thu Aug 6 09:38:38 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 11:38:38 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907151500.21947.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200907061808.18651.florian.mittag@uni-tuebingen.de> <200907151500.21947.florian.mittag@uni-tuebingen.de>
Message-ID: <200908061138.38809.florian.mittag@uni-tuebingen.de>

Hi!

I just noticed that we didn't solve this problem completely.

On Wednesday, 15.
July 2009 15:00, Florian Mittag wrote: > > Well, it is like this with version 9.5 of DB2 Express-C: > > > > SELECT NULL FROM bioentry; > > > > yields: > > SQL0206N "NULL" is not valid in the context where it is used. > > SQLSTATE=42703 SQLCODE=-206 > > > > But if I do: > > > > SELECT cast(NULL AS VARCHAR(255)) FROM bioentry; > > > > [...] > > > > It ran fine without the NULL column, but that isn't necessarily a sign of > > correctness. My problem was that (as stated above) the old version of DB2 > > requires you to cast the NULL value to a data type, which I wasn't able > > to determine from the code. With the new version, it should work, so I'll > > have to rerun my tests again and see if the problem is still there. > > You convinced me that the NULL column is supposed to be there, so I found > another workaround around line 1273 in BaseDriver.pm: > > if((! $attr) || (! $entitymap->{$tbl}) || > $dont_select_attrs->{$tbl .".". $attr}) { > #push(@attrs, "NULL"); > push(@attrs, "cast(NULL as VARCHAR(255))"); > } else { > > Since I don't know how to determine the datatype of the column that is set > to NULL, I simply chose VARCHAR and tested it. And it worked! (BTW: The > column set to NULL is named "rank" in the case below.) Although this solution works, it is not the best, because it breaks compatibility with all other database types, e.g., MySQL. Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" only when the driver is DB2? 
- Florian From hlapp at gmx.net Thu Aug 6 13:36:08 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 09:36:08 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> Message-ID: Why is specifying fasta format when your input is not in fasta format not a user error? I agree with the not removing newlines in raw format being a bug. -hilmar On Aug 6, 2009, at 1:12 AM, Chris Fields wrote: > Just to confirm: the following is using bioperl-live on my macbook > pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug > or a user issue (if it's the former, we can easily add an exception > indicating lack of a header). Note that 'raw' also fails for the > raw example below (doesn't appear to remove newlines). 
>
> -c
>
> cjfields4:fasta cjfields$ cat raw_v_fasta.pl
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use IO::String;
> use Bio::SeqIO;
> use Test::More qw(no_plan);
>
> my %seq;
>
> $seq{raw} = <<RAW;
> MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
> HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
> TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
> QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
> VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
> RAW
>
> $seq{fasta} = <<FASTA;
> >CATH_RAT
> MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
> HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
> TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
> QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
> VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
> FASTA
>
> my %newdata;
> for my $input (sort keys %seq) {
>     my $fh = IO::String->new($seq{$input});
>     my $seq = Bio::SeqIO->new(-format => 'fasta',
>                               -fh     => $fh)->next_seq;
>     $newdata{$input} = $seq->seq;
> }
> is($newdata{raw}, $newdata{fasta}, 'format');
>
> cjfields4:fasta cjfields$ perl raw_v_fasta.pl
> not ok 1 - format
> # Failed test 'format'
> # at raw_v_fasta.pl line 36.
> # got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
> # expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
> 1..1
> # Looks like you failed 1 test of 1.
> > On Aug 5, 2009, at 6:12 PM, Mark A. Jensen wrote: > >> If these items were included in a Bugzilla report, that would be >> most convenient (= most likely to get looked carefully) >> and is the best place for us to keep track of these kinds of >> issues-- http://bugzilla.bioperl.org/ >> cheers MAJ >> ----- Original Message ----- From: "Hilmar Lapp" >> To: "Chris Fields" >> Cc: "BioPerl List" >> Sent: Wednesday, August 05, 2009 6:53 PM >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> >>> I don't think that can be the problem. If anything, providing the >>> format ought to be better in terms of result than not providing it? >>> Uwe - I'd like you to go back to Chris' initial questions that >>> you haven't answered yet: "What version of bioperl are you using, >>> OS, etc? What does your data look like?" I'd add to that, can >>> you show us your full script, or a smaller code snippet that >>> reproduces the problem. >>> I suspect that either something in your script is swallowing the >>> line, or that the line endings in your data file are from a >>> different OS than the one you're running the script on. (Or that >>> you are running a very old version of BioPerl, which is entirely >>> possible if you installed through CPAN.) >>> -hilmar >>> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>>> Uwe, >>>> >>>> Please keep replies on the list. >>>> >>>> It's very possible that's the issue; IIRC the fasta parser pulls >>>> out the full sequence in chunks (based on local $/ = "\n>") and >>>> splits the header off as the first line in that chunk. You >>>> could probably try leaving the format out and letting SeqIO >>>> guess it, or passing the file into Bio::Tools::GuessSeqFormat >>>> directly, but it's probably better to go through the files and >>>> add a file extension that corresponds to the format. >>>> >>>> chris >>>> >>>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>>> >>>>> Thanks, Chris. 
The files have no extension, but we indicate >>>>> what format >>>>> to use, like in the manual: >>>>> >>>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>>> >>>>> I wonder now whether this could exactly cause the problem: as we >>>>> are >>>>> telling that input files are in fasta format they are being >>>>> treated as >>>>> such (=remove first line) - regardless of whether they really >>>>> are fasta? >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> Uwe Hilgert, Ph.D. >>>>> Dolan DNA Learning Center >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> C: (516) 857-1693 >>>>> V: (516) 367-5185 >>>>> E: hilgert at cshl.edu >>>>> F: (516) 367-5182 >>>>> W: http://www.dnalc.org >>>>> >>>>> -----Original Message----- >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>>> To: Hilgert, Uwe >>>>> Cc: bioperl-l at lists.open-bio.org >>>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>>> >>>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>>> >>>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>>> sequences >>>>>> are >>>>>> being submitted in FASTA format? >>>>> >>>>> No. See: >>>>> >>>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>>> >>>>> SeqIO tries to guess at the format using the file extension, and >>>>> if >>>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>>> possible that the extension is causing the problem, or that >>>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's >>>>> forced to >>>>> guessing). In any case, it's always advisable to explicitly >>>>> indicate >>>>> the format when possible. >>>>> >>>>> Relevant lines: >>>>> >>>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa) >>>>> $/ i; >>>>> ... 
>>>>> return 'raw' if /\.(txt)$/i; >>>>> >>>>>> In our experience, implementing >>>>>> Bio::SeqIO led to the first line of files being cut off, >>>>>> regardless of >>>>>> whether the files were indeed fasta files or files that only >>>>>> contained >>>>>> sequence. >>>>> >>>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>>> 'fasta'. >>>>> >>>>>> Which, in the latter, led to sequence submissions that had the >>>>>> first line of nucleotides removed. Has anyone tried to write a >>>>>> fix for >>>>>> this? >>>>> >>>>> This sounds like a bug, but we have very little to go on beyond >>>>> your >>>>> description. What version of bioperl are you using, OS, etc? >>>>> What >>>>> does your data look like? File extension? >>>>> >>>>> chris >>>>> >>>>>> Thanks, >>>>>> >>>>>> Uwe >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>>> >>>>>> Uwe Hilgert, Ph.D. >>>>>> >>>>>> Dolan DNA Learning Center >>>>>> >>>>>> Cold Spring Harbor Laboratory >>>>>> >>>>>> >>>>>> >>>>>> V: (516) 367-5185 >>>>>> >>>>>> E: hilgert at cshl.edu >>>>>> >>>>>> F: (516) 367-5182 >>>>>> >>>>>> W: http://www.dnalc.org >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- 
=========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Thu Aug 6 13:42:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 09:42:06 -0400 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <200908061138.38809.florian.mittag@uni-tuebingen.de> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200907061808.18651.florian.mittag@uni-tuebingen.de> <200907151500.21947.florian.mittag@uni-tuebingen.de> <200908061138.38809.florian.mittag@uni-tuebingen.de> Message-ID: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote: > Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" > only when the driver is DB2? Not yet, but that's the solution I had in mind, i.e., introducing a method in the Bio::DB::DBI::* (driver-specific) classes that returns whatever NULL as a SELECT field should be represented as. What will be very hard or nearly impossible to do is to cast to the actual type of the column, so if simply using VARCHAR(255) does the trick for DB2 that'd be great. BTW you did check that simply aliasing the column does not fix the problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will throw an error, right? 
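A minimal sketch of the driver-specific method idea described above, in plain Perl. The package names My::Driver::Base and My::Driver::DB2, the method null_select_expr(), and the helper select_columns() are all hypothetical stand-ins for illustration; they are not existing Bio::DB::DBI or BaseDriver.pm API. The VARCHAR(255) cast is the workaround reported in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generic driver class (not real BioPerl-DB API).
package My::Driver::Base;
sub new { my ($class) = @_; return bless {}, $class; }
# Default: a bare NULL in the SELECT list.
sub null_select_expr { return 'NULL'; }

# Hypothetical DB2-specific subclass: DB2 rejected the untyped NULL in this
# thread, so the NULL is cast to a type (the VARCHAR(255) workaround).
package My::Driver::DB2;
our @ISA = ('My::Driver::Base');
sub null_select_expr { return 'cast(NULL as VARCHAR(255))'; }

package main;

# Build a SELECT column list; an undef attribute stands for a column that
# should be selected as NULL, and the driver decides how to spell that.
sub select_columns {
    my ($driver, $table, @attrs) = @_;
    return join(', ',
        map { defined $_ ? "$table.$_" : $driver->null_select_expr } @attrs);
}

my @attrs = ('term_id', 'identifier', undef, 'ontology_id');
for my $driver (My::Driver::Base->new, My::Driver::DB2->new) {
    print 'SELECT ', select_columns($driver, 'term', @attrs), " FROM term\n";
}
```

With this shape the query-building code never mentions DB2 by name; only the driver class knows how its database wants NULL written.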
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From florian.mittag at uni-tuebingen.de Thu Aug 6 14:12:21 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 6 Aug 2009 16:12:21 +0200 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200908061138.38809.florian.mittag@uni-tuebingen.de> <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> Message-ID: <200908061612.21852.florian.mittag@uni-tuebingen.de> On Thursday, 6. August 2009 15:42, Hilmar Lapp wrote: > On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote: > > Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" > > only when the driver is DB2? > > Not yet, but that's the solution I had in mind, i.e., introducing a > method in the Bio::DB::DBI::* (driver-specific) classes that returns > whatever NULL as a SELECT field should be represented as. Sounds like a good idea! > What will be > very hard or nearly impossible to do is to cast to the actual type of > the column, so if simply using VARCHAR(255) does the trick for DB2 > that'd be great. Surprisingly, it does. At least, I haven't noticed any problems if the target data type is for example an integer. With all the trouble I have with DB2, I didn't expect this. > BTW you did check that simply aliasing the column does not fix the > problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will > throw an error, right? Yepp: SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, NULL AS col1, term.ontology_id FROM term WHERE identifier = ? [IBM][CLI Driver][DB2/LINUX] SQL0418N A statement contains a use of an untyped parameter marker or a null value that is not valid. 
- Florian From hilgert at cshl.edu Thu Aug 6 15:01:05 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 11:01:05 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: I'm not sure what version we have. Cornel may have installed it a while ago from CVS: Module id = Bio::Root::Build CPAN_USERID CJFIELDS (Christopher Fields ) CPAN_VERSION 1.006000 INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm INST_VERSION 1.006900 cpan> m Bio::Root::Version Module id = Bio::Root::Version CPAN_USERID CJFIELDS (Christopher Fields ) CPAN_VERSION 1.006000 INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm INST_VERSION 1.006900 cpan> m Bio::SeqIO Module id = Bio::SeqIO CPAN_USERID CJFIELDS (Christopher Fields ) CPAN_VERSION 1.006000 INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm INST_VERSION undef Cornel still has the checked-out "bioperl-live" directory and the last changes are from March this year. As per why he used "Fasta" instead of 'fasta" as the format parameter in Bio::SeqIO, it's because that what it says in the modules manual. He now tried 'fasta' instead and see no changes in behavior. Omitting the format parameter altogether, fasta-formatted sequence continues to be treated correctly, the first line being removed. However, raw sequence is being treated differently in that the first line is not being removed any more. Instead, the program returns the first line only. Which, in the example I am going to forward in my next message, will return 60 amino acids out of raw sequence of 300 aa. Can't win with raw sequence... The files may be created on different platforms, we didn't notice any difference between using files created on Windows or Linux. 
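For files that may be either FASTA or bare sequence, one user-side workaround is to look at the first non-blank line before choosing a format. This is only a sketch: guess_simple_format() is a hypothetical helper, not part of BioPerl, and it distinguishes nothing beyond "starts with '>'" versus "does not".

```perl
use strict;
use warnings;

# Guess 'fasta' vs 'raw' from the first non-blank line of the text.
# Hypothetical helper for illustration; not a BioPerl function.
sub guess_simple_format {
    my ($text) = @_;
    # Split on Unix, Windows or old Mac line endings.
    for my $line (split /\r?\n|\r/, $text) {
        next if $line =~ /^\s*$/;               # skip blank lines
        return $line =~ /^>/ ? 'fasta' : 'raw'; # '>' marks a FASTA header
    }
    return;                                     # empty input: no guess
}

my $fasta = ">CATH_RAT\nMWTALPLLCAGAWLLSAGATAELTVNAIEKFHF\n";
my $raw   = "MWTALPLLCAGAWLLSAGATAELTVNAIEKFHF\nTSWMKQHQKTYSSREYSHRL\n";

print guess_simple_format($fasta), "\n";
print guess_simple_format($raw), "\n";
```

The returned string could then be passed as the -format argument to Bio::SeqIO->new instead of hard-coding 'Fasta' for every file.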
Thanks Uwe -----Original Message----- From: Hilmar Lapp [mailto:hlapp at gmx.net] Sent: Wednesday, August 05, 2009 6:54 PM To: Chris Fields Cc: Hilgert, Uwe; BioPerl List Subject: Re: [Bioperl-l] Bio::SeqIO issue I don't think that can be the problem. If anything, providing the format ought to be better in terms of result than not providing it? Uwe - I'd like you to go back to Chris' initial questions that you haven't answered yet: "What version of bioperl are you using, OS, etc? What does your data look like?" I'd add to that, can you show us your full script, or a smaller code snippet that reproduces the problem. I suspect that either something in your script is swallowing the line, or that the line endings in your data file are from a different OS than the one you're running the script on. (Or that you are running a very old version of BioPerl, which is entirely possible if you installed through CPAN.) -hilmar On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and splits > the header off as the first line in that chunk. You could probably > try leaving the format out and letting SeqIO guess it, or passing > the file into Bio::Tools::GuessSeqFormat directly, but it's probably > better to go through the files and add a file extension that > corresponds to the format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >> Thanks, Chris. The files have no extension, but we indicate what >> format >> to use, like in the manual: >> >> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >> >> I wonder now whether this could exactly cause the problem: as we are >> telling that input files are in fasta format they are being treated >> as >> such (=remove first line) - regardless of whether they really are >> fasta? 
>> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> Uwe Hilgert, Ph.D. >> Dolan DNA Learning Center >> Cold Spring Harbor Laboratory >> >> C: (516) 857-1693 >> V: (516) 367-5185 >> E: hilgert at cshl.edu >> F: (516) 367-5182 >> W: http://www.dnalc.org >> >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Wednesday, August 05, 2009 5:04 PM >> To: Hilgert, Uwe >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >> >>> Is my impression correct that Bio::SeqIO just assumes that sequences >>> are >>> being submitted in FASTA format? >> >> No. See: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> SeqIO tries to guess at the format using the file extension, and if >> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >> possible that the extension is causing the problem, or that >> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >> guessing). In any case, it's always advisable to explicitly indicate >> the format when possible. >> >> Relevant lines: >> >> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >> i; >> ... >> return 'raw' if /\.(txt)$/i; >> >>> In our experience, implementing >>> Bio::SeqIO led to the first line of files being cut off, >>> regardless of >>> whether the files were indeed fasta files or files that only >>> contained >>> sequence. >> >> Files that only contain sequence are 'raw'. Ones in FASTA are >> 'fasta'. >> >>> Which, in the latter, led to sequence submissions that had the >>> first line of nucleotides removed. Has anyone tried to write a fix >>> for >>> this? >> >> This sounds like a bug, but we have very little to go on beyond your >> description. What version of bioperl are you using, OS, etc? What >> does your data look like? File extension? 
>> >> chris >> >>> Thanks, >>> >>> Uwe >>> >>> >>> >>> >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> >>> Uwe Hilgert, Ph.D. >>> >>> Dolan DNA Learning Center >>> >>> Cold Spring Harbor Laboratory >>> >>> >>> >>> V: (516) 367-5185 >>> >>> E: hilgert at cshl.edu >>> >>> F: (516) 367-5182 >>> >>> W: http://www.dnalc.org >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hilgert at cshl.edu Thu Aug 6 15:03:53 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 11:03:53 -0400 Subject: [Bioperl-l] FW: Bio::SeqIO issue Message-ID: If you don't specify any format only the first line gets returned: not ok 1 - format # Failed test 'format' # at test/test_fasta.pl line 35. # got: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN' # expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNH TFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFS TTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFN PEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYG EQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV' 1..1 # Looks like you failed 1 test of 1. -----Original Message----- From: Hilgert, Uwe Sent: Thursday, August 06, 2009 9:12 AM To: Ghiban, Cornel Subject: FW: [Bioperl-l] Bio::SeqIO issue -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, August 06, 2009 1:12 AM To: Mark A. 
Jensen Cc: Hilgert, Uwe; BioPerl List; Hilmar Lapp Subject: Re: [Bioperl-l] Bio::SeqIO issue Just to confirm: the following is using bioperl-live on my macbook pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug or a user issue (if it's the former, we can easily add an exception indicating lack of a header). Note that 'raw' also fails for the raw example below (doesn't appear to remove newlines). -c cjfields4:fasta cjfields$ cat raw_v_fasta.pl #!/usr/bin/perl -w use strict; use warnings; use IO::String; use Bio::SeqIO; use Test::More qw(no_plan); my %seq; $seq{raw} = <CATH_RAT MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV FASTA my %newdata; for my $input (sort keys %seq) { my $fh = IO::String->new($seq{$input}); my $seq = Bio::SeqIO->new(-format => 'fasta', -fh => $fh)->next_seq; $newdata{$input} = $seq->seq; } is($newdata{raw}, $newdata{fasta}, 'format'); cjfields4:fasta cjfields$ perl raw_v_fasta.pl not ok 1 - format # Failed test 'format' # at raw_v_fasta.pl line 36. # got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWT FSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCK FNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVG YGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV' # expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNH TFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFS TTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFN PEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYG EQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV' 1..1 # Looks like you failed 1 test of 1. On Aug 5, 2009, at 6:12 PM, Mark A. 
Jensen wrote: > If these items were included in a Bugzilla report, that would be most > convenient (= most likely to get looked carefully) and is the best > place for us to keep track of these kinds of > issues-- http://bugzilla.bioperl.org/ > cheers MAJ > ----- Original Message ----- From: "Hilmar Lapp" > To: "Chris Fields" > Cc: "BioPerl List" > Sent: Wednesday, August 05, 2009 6:53 PM > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? What does your data look like?" I'd add to that, can you show >> us your full script, or a smaller code snippet that reproduces the >> problem. >> I suspect that either something in your script is swallowing the >> line, or that the line endings in your data file are from a >> different OS than the one you're running the script on. (Or that >> you are running a very old version of BioPerl, which is entirely >> possible if you installed through CPAN.) >> -hilmar >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls >>> out the full sequence in chunks (based on local $/ = "\n>") and >>> splits the header off as the first line in that chunk. You could >>> probably try leaving the format out and letting SeqIO guess it, >>> or passing the file into Bio::Tools::GuessSeqFormat directly, but >>> it's probably better to go through the files and add a file >>> extension that corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format >>>> to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being >>>> treated as >>>> such (=remove first line) - regardless of whether they really >>>> are fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> Uwe Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences >>>>> are >>>>> being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's >>>> forced to >>>> guessing). In any case, it's always advisable to explicitly >>>> indicate >>>> the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa) >>>> $/ i; >>>> ... 
>>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless of >>>>> whether the files were indeed fasta files or files that only >>>>> contained >>>>> sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a >>>>> fix for >>>>> this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From hlapp at gmx.net Thu Aug 6 15:18:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 11:18:06 -0400 Subject: 
[Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible? -hilmar On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > I'm not sure what version we have. Cornel may have installed it a > while > ago from CVS: > > Module id = Bio::Root::Build > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > INST_VERSION 1.006900 > cpan> m Bio::Root::Version > Module id = Bio::Root::Version > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > INST_VERSION 1.006900 > cpan> m Bio::SeqIO > Module id = Bio::SeqIO > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > INST_VERSION undef > > Cornel still has the checked-out "bioperl-live" directory and the last > changes are from March this year. > > As per why he used "Fasta" instead of 'fasta" as the format > parameter in > Bio::SeqIO, it's because that what it says in the modules manual. He > now > tried 'fasta' instead and see no changes in behavior. Omitting the > format parameter altogether, fasta-formatted sequence continues to be > treated correctly, the first line being removed. However, raw sequence > is being treated differently in that the first line is not being > removed > any more. Instead, the program returns the first line only. Which, in > the example I am going to forward in my next message, will return 60 > amino acids out of raw sequence of 300 aa. Can't win with raw > sequence... 
> > > The files may be created on different platforms, we didn't notice any > difference between using files created on Windows or Linux. > > Thanks > Uwe > > > > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, August 05, 2009 6:54 PM > To: Chris Fields > Cc: Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > I don't think that can be the problem. If anything, providing the > format ought to be better in terms of result than not providing it? > > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, > etc? What does your data look like?" I'd add to that, can you show us > your full script, or a smaller code snippet that reproduces the > problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing >> the file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. 
The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. >>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>> to >>> guessing). In any case, it's always advisable to explicitly >>> indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. 
Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? >>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bosborne11 at verizon.net Thu Aug 6 15:20:49 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 11:20:49 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <2F73C3DC-D943-4EC3-834A-EA2984FDDB5D@verizon.net> Uwe et al, Yes, this argument works irrespective of case: The format name 
is case-insensitive: 'FASTA', 'Fasta' and 'fasta' are all valid. From Bio::SeqIO. Brian O. On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > As per why he used "Fasta" instead of 'fasta" as the format > parameter in > Bio::SeqIO, it's because that what it says in the modules manual. He > now > tried 'fasta' instead and see no changes in behavior. Omitting the From cjfields at illinois.edu Thu Aug 6 16:30:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:30:01 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> Message-ID: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> On Aug 6, 2009, at 8:36 AM, Hilmar Lapp wrote: > Why is specifying fasta format when your input is not in fasta format > not a user error? Agreed. My point is: should we worry about adding an exception (which may be a little more user-friendly)? Right now the bad stuff happens silently. > I agree with the not removing newlines in raw format being a bug. > > -hilmar According to the SeqIO::raw docs, this is a little trickier. The documented behavior explicitly indicates that each line (sans non-whitespace) is assumed to be a separate sequence, so changing that behavior breaks API. 
I suppose we can have $/ set locally to a cached $/ default value or undef: # assumes entire file is read in my $io = Bio::SeqIO->new(-format => 'raw', -gulp => 1); chris From hlapp at gmx.net Thu Aug 6 16:42:00 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 12:42:00 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> Message-ID: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> On Aug 6, 2009, at 12:30 PM, Chris Fields wrote: > Agreed. My point is should we worry about adding an exception > (which may be a little more user-friendly). Right now the bad stuff > happens silently. Great point. We don't want silent failures, do we. > >> I agree with the not removing newlines in raw format being a bug. >> >> -hilmar > > Acc. to the SeqIO::raw docs, this is a little trickier. The > documented behavior explicitly indicates that each line (sans non- > whitespace) is assumed to be a separate sequence, so changing that > behavior breaks API. Ah - true indeed. I like the optional argument feature - that way it's easy for the user to choose. 
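To make the two behaviours concrete, here is a rough plain-Perl sketch of line-per-sequence reading versus reading the whole input as one sequence by localizing $/. The read_raw() helper and its $gulp flag are illustrative only; they mimic the idea of the suggested -gulp option and are not how Bio::SeqIO::raw is implemented.

```perl
use strict;
use warnings;

# Read "raw" sequence text from a filehandle. With $gulp false, each
# line is one sequence (the documented behaviour discussed above);
# with $gulp true, the whole input becomes a single sequence.
# Hypothetical helper for illustration only.
sub read_raw {
    my ($fh, $gulp) = @_;
    my @seqs;
    if ($gulp) {
        local $/ = undef;              # slurp mode: read everything
        my $all = <$fh>;
        $all = '' unless defined $all;
        $all =~ s/\s+//g;              # drop newlines and other whitespace
        push @seqs, $all if length $all;
    }
    else {
        while (my $line = <$fh>) {     # default $/ is "\n": one line at a time
            $line =~ s/\s+//g;
            push @seqs, $line if length $line;
        }
    }
    return @seqs;
}

my $text = "MWTALPLLCA\nGAWLLSAGAT\nAELTVNAIEK\n";

# In-memory filehandles keep the example self-contained.
open my $fh1, '<', \$text or die "cannot open in-memory file: $!";
my @per_line = read_raw($fh1, 0);
open my $fh2, '<', \$text or die "cannot open in-memory file: $!";
my @gulped = read_raw($fh2, 1);

print scalar(@per_line), " sequences when read line by line\n";
print scalar(@gulped), " sequence when gulped: $gulped[0]\n";
```

Because $/ is localized inside the gulp branch only, the caller's record separator is left untouched, which is what keeps the default behaviour (and therefore the existing API) unchanged.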
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Thu Aug 6 16:49:53 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:49:53 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Cornel, I'm failing to see how adding '>' would solve the problem. This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)? I think, in this circumstance, the former applies. It is easy to add, and the use of an exception in this case is violently user-friendly, e.g. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing. chris On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > Hi, > > It doesn't matter what sequence we use. As Chris Fields's showed in > his test, not having > ">" as the 1st character on the first line is the problem. > We always assumed the sequence is in FASTA format and this seems to > be wrong. > > I think, the solution to our problem is to check whether the ">" > symbol is present or not. > If not present then it will be added. 
> > Thank you, > Cornel Ghiban > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Thursday, August 06, 2009 11:18 AM > To: Hilgert, Uwe > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe - could you send an actual data file (as an attachment) that > reproduces the problem, or is that not possible? > > -hilmar > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > >> I'm not sure what version we have. Cornel may have installed it a >> while ago from CVS: >> >> Module id = Bio::Root::Build >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >> INST_VERSION 1.006900 >> cpan> m Bio::Root::Version >> Module id = Bio::Root::Version >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >> INST_VERSION 1.006900 >> cpan> m Bio::SeqIO >> Module id = Bio::SeqIO >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >> INST_VERSION undef >> >> Cornel still has the checked-out "bioperl-live" directory and the >> last >> changes are from March this year. >> >> As per why he used "Fasta" instead of 'fasta" as the format parameter >> in Bio::SeqIO, it's because that what it says in the modules manual. >> He now tried 'fasta' instead and see no changes in behavior. Omitting >> the format parameter altogether, fasta-formatted sequence continues >> to >> be treated correctly, the first line being removed. However, raw >> sequence is being treated differently in that the first line is not >> being removed any more. Instead, the program returns the first line >> only. Which, in the example I am going to forward in my next message, >> will return 60 amino acids out of raw sequence of 300 aa. Can't win >> with raw sequence... 
>> >> >> The files may be created on different platforms, we didn't notice any >> difference between using files created on Windows or Linux. >> >> Thanks >> Uwe >> >> >> >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, August 05, 2009 6:54 PM >> To: Chris Fields >> Cc: Hilgert, Uwe; BioPerl List >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? >> What does your data look like?" I'd add to that, can you show us your >> full script, or a smaller code snippet that reproduces the problem. >> >> I suspect that either something in your script is swallowing the >> line, >> or that the line endings in your data file are from a different OS >> than the one you're running the script on. (Or that you are running a >> very old version of BioPerl, which is entirely possible if you >> installed through CPAN.) >> >> -hilmar >> >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >> >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls out >>> the full sequence in chunks (based on local $/ = "\n>") and splits >>> the header off as the first line in that chunk. You could probably >>> try leaving the format out and letting SeqIO guess it, or passing >>> the >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>> better to go through the files and add a file extension that >>> corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being treated >>>> as such (=remove first line) - regardless of whether they really >>>> are >>>> fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>>> Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences are being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>> to guessing). In any case, it's always advisable to explicitly >>>> indicate the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>> i; >>>> ... >>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless >>>>> of whether the files were indeed fasta files or files that only >>>>> contained sequence. 
>>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a fix >>>>> for this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From biopython at maubp.freeserve.co.uk Thu Aug 6 16:51:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 6 Aug 2009 17:51:34 +0100 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> 
<25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> Message-ID: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote: > >>> I agree with the not removing newlines in raw format being a bug. >>> >>> -hilmar >> >> Acc. to the SeqIO::raw docs, this is a little trickier. The documented >> behavior explicitly indicates that each line (sans non-whitespace) is >> assumed to be a separate sequence, so changing that behavior breaks API. > > Ah - true indeed. I like the optional argument feature - that way it's easy > for the user to choose. > For reference, "raw" as a format in EMBOSS seems to give just one sequence regardless of any line breaks. Adding an optional argument might be clearest, but have you considered using the new BioPerl SeqIO variant argument to have two forms of raw (the original variant giving one sequence per line, and a new variant where you just get one sequence regardless of any line breaks)?
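To make the two "raw" behaviours concrete, here is a minimal sketch in plain Perl. It is not the Bio::SeqIO::raw implementation and does not use the variant mechanism; the sub names raw_per_line and raw_whole are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: one sequence per non-empty line
# (the behaviour the SeqIO::raw docs are said to describe).
sub raw_per_line {
    my ($text) = @_;
    my @seqs;
    for my $line (split /\r?\n/, $text) {
        $line =~ s/\s+//g;                  # drop whitespace inside the line
        push @seqs, $line if length $line;  # ignore blank lines
    }
    return @seqs;
}

# Hypothetical helper: the whole input is one sequence and line
# breaks are ignored (the EMBOSS-like behaviour).
sub raw_whole {
    my ($text) = @_;
    (my $seq = $text) =~ s/\s+//g;          # strip newlines and other whitespace
    return $seq;
}

my $input    = "ACGTAC\nGGTTAA\nCC\n";
my @per_line = raw_per_line($input);
my $whole    = raw_whole($input);

print scalar(@per_line), " sequences when read per line: @per_line\n";
print "1 sequence when read whole: $whole\n";
```

With a three-line input the first reading yields three short sequences and the second yields a single 14-residue sequence, which is the difference a user would see between the two proposed variants.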
Peter From cjfields at illinois.edu Thu Aug 6 16:58:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:58:07 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> Message-ID: On Aug 6, 2009, at 11:51 AM, Peter wrote: > On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote: >> >>>> I agree with the not removing newlines in raw format being a bug. >>>> >>>> -hilmar >>> >>> Acc. to the SeqIO::raw docs, this is a little trickier. The >>> documented >>> behavior explicitly indicates that each line (sans non-whitespace) >>> is >>> assumed to be a separate sequence, so changing that behavior >>> breaks API. >> >> Ah - true indeed. I like the optional argument feature - that way >> it's easy >> for the user to choose. >> > > For reference, "raw" as a format in EMBOSS seems to give just one > sequence regardless of any line breaks. Yes, and that's the behavior I would expect, actually. > Adding an optional argument might be clearest, but have you considered > using the new BioPerl SeqIO variant argument to have two forms of raw > (the original variant giving one sequence per line, and a new variant > where you just get one sequence regardless of any line breaks)? > > Peter That's a good point. 
We'd have to keep 'raw' as the prior behavior, but 'raw-complete' could be used for such a circumstance ('raw-gulp' sounds just wrong ;) chris From rmb32 at cornell.edu Thu Aug 6 17:14:12 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 06 Aug 2009 10:14:12 -0700 Subject: [Bioperl-l] tigrxml parsing Message-ID: <4A7B0F64.9070205@cornell.edu> Hi all, Recently in #bioperl somebody came by trying to use Bio::SeqIO::tigrxml.pm to parse the medicago genome annotations at http://www.medicago.org/genome/downloads/Mt2/MT2.0_medicago_chrX_20080103_NoOverlap.xml.tar.gz svn HEAD tigrxml.pm was not at all happy with these files, eventually dying with
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: start is undefined
STACK: Error::throw
STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368
STACK: Bio::RangeI::contains Bio/RangeI.pm:255
STACK: Bio::SeqFeature::Generic::add_SeqFeature Bio/SeqFeature/Generic.pm:783
STACK: Bio::SeqIO::tigrxml::start_element Bio/SeqIO/tigrxml.pm:206
STACK: try{} block /usr/share/perl5/XML/SAX/Base.pm:292
STACK: XML::SAX::Base::start_element /usr/share/perl5/XML/SAX/Base.pm:266
STACK: XML::SAX::Expat::_handle_start /usr/share/perl5/XML/SAX/Expat.pm:225
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::SAX::Expat::_parse_bytestream /usr/share/perl5/XML/SAX/Expat.pm:45
STACK: XML::SAX::Base::parse /usr/share/perl5/XML/SAX/Base.pm:2602
STACK: XML::SAX::Base::parse_file /usr/share/perl5/XML/SAX/Base.pm:2631
STACK: Bio::SeqIO::tigrxml::next_seq Bio/SeqIO/tigrxml.pm:116
STACK: /crypt/rob/test2.pl:10
-----------------------------------------------------------
Looking at the medicago XML and comparing it to the bioperl-live/t/data/test.tigrxml, the two look VERY different in structure. Lots of things that are attrs in test.tigrxml seem to be elements in the medicago XML, for example.
So I guess the question is: is the medicago TIGR XML malformed? Can tigrxml.pm be expected to parse it? What, if anything, should be done about this? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From hilgert at cshl.edu Thu Aug 6 19:36:36 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 15:36:36 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Hmmm, I fail to see how supplying raw sequence could be called "bad" input or a "problem". In our case, for example, not every user is a bioinformatics expert, and Cornel was suggesting that we account for that instead of trying to "train" the user to adhere to requirements that have little to do with what s/he is trying to accomplish. I don't really see data being modified; rather, the data format is being adapted to the needs of the software, which I would argue is something the software should be able to take care of. Uwe -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, August 06, 2009 12:50 PM To: Ghiban, Cornel Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List Subject: Re: [Bioperl-l] Bio::SeqIO issue Cornel, I'm failing to see how adding '>' would solve the problem. This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)? I think, in this circumstance, the former applies.
It is easy to add, and the use of an exception in this case is violently user-friendly, e.g. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing. chris On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > Hi, > > It doesn't matter what sequence we use. As Chris Fields's showed in > his test, not having > ">" as the 1st character on the first line is the problem. > We always assumed the sequence is in FASTA format and this seems to > be wrong. > > I think, the solution to our problem is to check whether the ">" > symbol is present or not. > If not present then it will be added. > > Thank you, > Cornel Ghiban > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Thursday, August 06, 2009 11:18 AM > To: Hilgert, Uwe > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe - could you send an actual data file (as an attachment) that > reproduces the problem, or is that not possible? > > -hilmar > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > >> I'm not sure what version we have. Cornel may have installed it a >> while ago from CVS: >> >> Module id = Bio::Root::Build >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >> INST_VERSION 1.006900 >> cpan> m Bio::Root::Version >> Module id = Bio::Root::Version >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >> INST_VERSION 1.006900 >> cpan> m Bio::SeqIO >> Module id = Bio::SeqIO >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >> INST_VERSION undef >> >> Cornel still has the checked-out "bioperl-live" directory and the >> last >> changes are from March this year. 
>> >> As per why he used "Fasta" instead of 'fasta" as the format parameter >> in Bio::SeqIO, it's because that what it says in the modules manual. >> He now tried 'fasta' instead and see no changes in behavior. Omitting >> the format parameter altogether, fasta-formatted sequence continues >> to >> be treated correctly, the first line being removed. However, raw >> sequence is being treated differently in that the first line is not >> being removed any more. Instead, the program returns the first line >> only. Which, in the example I am going to forward in my next message, >> will return 60 amino acids out of raw sequence of 300 aa. Can't win >> with raw sequence... >> >> >> The files may be created on different platforms, we didn't notice any >> difference between using files created on Windows or Linux. >> >> Thanks >> Uwe >> >> >> >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, August 05, 2009 6:54 PM >> To: Chris Fields >> Cc: Hilgert, Uwe; BioPerl List >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? >> What does your data look like?" I'd add to that, can you show us your >> full script, or a smaller code snippet that reproduces the problem. >> >> I suspect that either something in your script is swallowing the >> line, >> or that the line endings in your data file are from a different OS >> than the one you're running the script on. (Or that you are running a >> very old version of BioPerl, which is entirely possible if you >> installed through CPAN.) >> >> -hilmar >> >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >> >>> Uwe, >>> >>> Please keep replies on the list. 
>>> >>> It's very possible that's the issue; IIRC the fasta parser pulls out >>> the full sequence in chunks (based on local $/ = "\n>") and splits >>> the header off as the first line in that chunk. You could probably >>> try leaving the format out and letting SeqIO guess it, or passing >>> the >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>> better to go through the files and add a file extension that >>> corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. The files have no extension, but we indicate what >>>> format to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being treated >>>> as such (=remove first line) - regardless of whether they really >>>> are >>>> fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>>> Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences are being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. 
It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>> to guessing). In any case, it's always advisable to explicitly >>>> indicate the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>> i; >>>> ... >>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless >>>>> of whether the files were indeed fasta files or files that only >>>>> contained sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a fix >>>>> for this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. 
>>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at illinois.edu Thu Aug 6 20:09:22 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:09:22 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: <6729F9CC-ACF9-4BC4-9905-7EA24C1DCA61@illinois.edu> If one supplies raw sequence (no descriptor) to a FASTA parser (requires a descriptor), then it is bad input. One can't reasonably expect the parser to work correctly under those circumstance. Garbage in, garbage out. The simplest and (IMHO) best solution under such circumstances is for the parser to die meaningfully ("Sequence is not FASTA format; '>' descriptor line is missing" or similar). 
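The "die meaningfully" idea can be sketched in a few lines of plain Perl. This is only an illustration, not the Bio::SeqIO::fasta code: check_fasta_start is a hypothetical helper name, and skipping leading blank lines before looking for '>' is an assumption made here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: die unless the first
# non-blank line of the input starts with the '>' descriptor marker.
sub check_fasta_start {
    my ($text) = @_;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;      # assumption: leading blank lines are tolerated
        return 1 if $line =~ /^>/;     # found a descriptor line
        die "Sequence is not FASTA format; '>' descriptor line is missing\n";
    }
    die "Input is empty\n";
}

my $fasta = ">seq1 example\nACGTACGT\n";
my $raw   = "ACGTACGT\nGGTTAACC\n";

# eval traps the exception so both cases can be reported side by side.
my $fasta_ok = eval { check_fasta_start($fasta) } ? 1 : 0;
my $raw_ok   = eval { check_fasta_start($raw) }   ? 1 : 0;
my $raw_err  = $@;    # message left by the failed check on the raw input

print "fasta input accepted: $fasta_ok\n";
print "raw input accepted:   $raw_ok\n";
print "error for raw input:  $raw_err";
```

The point of the design is that the raw input is rejected up front with a message naming the actual problem, instead of having its first line consumed as if it were a descriptor.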
Tacking a '>' onto bad data doesn't make it magically work, it's just bad data with a '>' appended. To take this one step further, what if this were genbank data? Or XML? A well-formed exception, though initially inconvenient to the user, will indicate the problem right away. Silently trying to fix the problem by appending '>' to bad input data wouldn't work, and the resulting failure downstream (likely from validate_seq) would obscure the real problem, being the user is using the wrong format parser. chris On Aug 6, 2009, at 2:36 PM, Hilgert, Uwe wrote: > Hmmm, I fail to see how supplying raw sequence could be a called "bad" > input or a "problem". In our case, for example, not every user is a > bioinformatics expert and Cornel was suggesting to account for that > instead of trying to "train" the user to adhere to requirements that > have not much to do with what s/he tries to accomplish. I don't really > see data being modified, rather that the data format is being > adopted to > the needs of the software; which I would argue should be something the > software is being able to take care of. > > Uwe From cjfields at illinois.edu Thu Aug 6 20:25:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:25:45 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> Message-ID: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Michael, Are you using ClustalW 2? I'm not sure but I don't think the wrapper has been updated for the latest version (I think parsing still works, though).
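One thing that may be worth ruling out: in the error quoted in this thread the command is reported as "( align -infile=...", with nothing in front of "align", and the two "uninitialized value" warnings come from the lines just before the failed exec. That pattern would be consistent with no clustalw executable having been located, though this is a guess from the output and not a confirmed diagnosis. The sketch below is plain Perl and does not call the wrapper; find_program is a hypothetical helper, CLUSTALDIR is the environment variable named in the wrapper's documentation, and the program names tried are assumptions.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: look for an executable called $name in an
# optional directory first, then in each directory listed in PATH.
sub find_program {
    my ($name, $dir) = @_;
    my @dirs = grep { defined($_) && length($_) } ($dir, File::Spec->path);
    for my $d (@dirs) {
        my $candidate = File::Spec->catfile($d, $name);
        return $candidate if -f $candidate && -x _;   # regular file and executable
    }
    return;    # not found
}

# The program names below are assumptions; adjust to the local install.
my $found;
for my $name (qw(clustalw clustalw2)) {
    $found = find_program($name, $ENV{CLUSTALDIR});
    last if defined $found;
}

print defined $found
    ? "clustalw executable found at $found\n"
    : "no clustalw executable found in CLUSTALDIR or PATH\n";
```

If nothing is found, pointing CLUSTALDIR at the directory that holds the clustalw binary before running the script would be the first thing to try.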
chris

On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote:

> I'm a complete bioperl novice, trying to do Clustalw on some fasta
> files, and am running into trouble:
>
> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
> Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
> STACK: TestClust:22
> -----------------------------------------------------------
>
> Here's my code:
>
> #!/usr/bin/perl -w
>
> use Bio::Perl;
> use Bio::AlignIO;
> use Bio::Tools::Run::Alignment::Clustalw;
> use Bio::SimpleAlign;
> use Bio::Seq;
> use strict;
> use warnings;
>
> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>
> for (my $i = 0; $i < @seq_array; $i++){
>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>     $seq_array[$i]->seq($seq);
> }
>
> write_sequence(">test",'fasta',@seq_array);
>
> my $seq_array_ref = \@seq_array;
> my $aln = $factory->align($seq_array_ref);
>
> my @align_array = $aln->each_seq();
> write_sequence(">testfile",'fasta',@align_array);
>
>
> The
loop is just there to take out some gaps that were placed in a > blast previous to this. The write_sequence call confirms that > @seq_array is a valid array of Bio:Seq objects at the time align > calls it. Here's some output in "test": > > >A0220B0939one.1 FV584Q101DEWY9 > TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC > CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT > TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT > TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG > CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG > CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA > CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA > CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT > AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG > >A0220B0939one.2 FV584Q101A4DG7 > TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG > ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC > AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG > TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG > GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA > GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT > CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT > CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT > ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG > ... 
> > Thanks, > Mike From cjfields at illinois.edu Thu Aug 6 20:30:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:30:30 -0500 Subject: [Bioperl-l] tigrxml parsing In-Reply-To: <4A7B0F64.9070205@cornell.edu> References: <4A7B0F64.9070205@cornell.edu> Message-ID: Robert, This popped up recently (may be related): http://thread.gmane.org/gmane.comp.lang.perl.bio.general/19782 http://bugzilla.open-bio.org/show_bug.cgi?id=2868 It might be possible to map this into bioperl, but someone needs to take it up. chris From eigenrosen at gmail.com Thu Aug 6 20:39:09 2009 From: eigenrosen at gmail.com (Michael Rosen) Date: Thu, 6 Aug 2009 13:39:09 -0700 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Hi Chris, I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the top of the module being called. Mike On Aug 6, 2009, at 1:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the > wrapper has been updated for the latest version (I think parsing > still works, though).
>
> chris
>
> On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote:
>
>> I'm a complete bioperl novice, trying to do Clustalw on some fasta
>> files, and am running into trouble:
>>
>> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
>> Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
>> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
>> STACK: TestClust:22
>> -----------------------------------------------------------
>>
>> Here's my code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::Perl;
>> use Bio::AlignIO;
>> use Bio::Tools::Run::Alignment::Clustalw;
>> use Bio::SimpleAlign;
>> use Bio::Seq;
>> use strict;
>> use warnings;
>>
>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
>> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>>
>> for (my $i = 0; $i < @seq_array; $i++){
>>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>>     $seq_array[$i]->seq($seq);
>> }
>>
>> write_sequence(">test",'fasta',@seq_array);
>>
>> my $seq_array_ref = \@seq_array;
>> my $aln = $factory->align($seq_array_ref);
>>
>> my @align_array = $aln->each_seq();
write_sequence(">testfile",'fasta', at align_array); >> >> >> The loop is just there to take out some gaps that were placed in a >> blast previous to this. The write_sequence call confirms that >> @seq_array is a valid array of Bio:Seq objects at the time align >> calls it. Here's some output in "test": >> >> >A0220B0939one.1 FV584Q101DEWY9 >> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC >> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT >> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT >> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG >> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG >> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA >> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA >> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT >> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >> >A0220B0939one.2 FV584Q101A4DG7 >> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG >> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC >> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG >> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG >> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA >> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT >> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT >> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT >> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG >> ... 
>> >> Thanks, >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lsbrath at gmail.com Thu Aug 6 20:49:56 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 6 Aug 2009 16:49:56 -0400 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <69367b8f0908061349i48f4d2b1tcbccb00d5a3de5ca@mail.gmail.com> Hi Micheal, Have you considered calling clustalw from perl's "system" command and passing in the files for alignment? Mgavi On Thu, Aug 6, 2009 at 4:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the wrapper has > been updated for the latest version (I think parsing still works, though). > > chris > > On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote: > > I'm a complete bioperl novice, trying to do Clustalw on some fasta files, >> and am running into trouble: >> >> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta >> Use of uninitialized value in concatenation (.) or string at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 550. >> Use of uninitialized value in concatenation (.) or string at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 551. >> Can't exec "align": No such file or directory at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 555. 
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
>> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
>> STACK: TestClust:22
>> -----------------------------------------------------------
>>
>> Here's my code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::Perl;
>> use Bio::AlignIO;
>> use Bio::Tools::Run::Alignment::Clustalw;
>> use Bio::SimpleAlign;
>> use Bio::Seq;
>> use strict;
>> use warnings;
>>
>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
>> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>>
>> for (my $i = 0; $i < @seq_array; $i++){
>>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>>     $seq_array[$i]->seq($seq);
>> }
>>
>> write_sequence(">test",'fasta',@seq_array);
>>
>> my $seq_array_ref = \@seq_array;
>> my $aln = $factory->align($seq_array_ref);
>>
>> my @align_array = $aln->each_seq();
>> write_sequence(">testfile",'fasta',@align_array);
>>
>>
>> The loop is just there to take out some gaps that were placed in a blast
>> previous to this. The write_sequence call confirms that @seq_array is a
>> valid array of Bio::Seq objects at the time align calls it.
Here's some >> output in "test": >> >> >A0220B0939one.1 FV584Q101DEWY9 >> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC >> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT >> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT >> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG >> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG >> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA >> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA >> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT >> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >> >A0220B0939one.2 FV584Q101A4DG7 >> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG >> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC >> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG >> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG >> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA >> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT >> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT >> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT >> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG >> ... 
>>
>> Thanks,
>> Mike
>>
>

From cjfields at illinois.edu Thu Aug 6 21:00:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Aug 2009 16:00:37 -0500
Subject: [Bioperl-l] Trouble with Clustalw
In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com>
References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com>
Message-ID: <2C8DF4CB-40B0-41DB-882A-AAF346A008B2@illinois.edu>

Michael,

No, what I meant was: what version of clustalw (the actual executable) are you using? What you quoted is the svn version of the bioperl wrapper. What happens if you enter 'clustalw' on the command line? Do you get:

**************************************************************
******** CLUSTAL 2.0.11 Multiple Sequence Alignments ********
**************************************************************

I think the above version has problems with bioperl, though I can't recall exactly what the problems were.

chris

On Aug 6, 2009, at 3:39 PM, Michael Rosen wrote:

> Hi Chris,
> I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at
> the top of the module being called.
>
> Mike
> On Aug 6, 2009, at 1:25 PM, Chris Fields wrote:
>
>> Michael,
>>
>> Are you using ClustalW 2? I'm not sure but I don't think the
>> wrapper has been updated for the latest version (I think parsing
>> still works, though).
>>
>> chris
>>
>> On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote:
>>
>>> I'm a complete bioperl novice, trying to do Clustalw on some fasta
>>> files, and am running into trouble:
>>>
>>> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
>>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
>>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
>>> Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
>>>
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
>>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
>>> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
>>> STACK: TestClust:22
>>> -----------------------------------------------------------
>>>
>>> Here's my code:
>>>
>>> #!/usr/bin/perl -w
>>>
>>> use Bio::Perl;
>>> use Bio::AlignIO;
>>> use Bio::Tools::Run::Alignment::Clustalw;
>>> use Bio::SimpleAlign;
>>> use Bio::Seq;
>>> use strict;
>>> use warnings;
>>>
>>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
>>> my @seq_array = read_all_sequences($ARGV[0],'fasta');
>>>
>>> for (my $i = 0; $i < @seq_array; $i++){
>>>     (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
>>>     $seq_array[$i]->seq($seq);
>>> }
>>>
>>> write_sequence(">test",'fasta',@seq_array);
>>>
>>> my $seq_array_ref = \@seq_array;
>>> my $aln =
$factory->align($seq_array_ref); >>> >>> my @align_array = $aln->each_seq(); >>> write_sequence(">testfile",'fasta', at align_array); >>> >>> >>> The loop is just there to take out some gaps that were placed in a >>> blast previous to this. The write_sequence call confirms that >>> @seq_array is a valid array of Bio:Seq objects at the time align >>> calls it. Here's some output in "test": >>> >>> >A0220B0939one.1 FV584Q101DEWY9 >>> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC >>> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT >>> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT >>> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG >>> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG >>> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA >>> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA >>> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT >>> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >>> >A0220B0939one.2 FV584Q101A4DG7 >>> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG >>> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC >>> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG >>> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG >>> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA >>> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT >>> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT >>> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT >>> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG >>> ... 
>>> >>> Thanks, >>> Mike >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From bosborne11 at verizon.net Thu Aug 6 20:01:00 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 16:01:00 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Chris, Yes, I think so. By the way, this is related to an old bug: http://bugzilla.bioperl.org/show_bug.cgi?id=1508 Brian O. > This is a simple validation issue: should we throw an exception on > bad input (no '>') From bix at sendu.me.uk Thu Aug 6 21:18:02 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 06 Aug 2009 22:18:02 +0100 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Message-ID: <4A7B488A.2060600@sendu.me.uk> Michael Rosen wrote: > Hi Chris, > I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the > top of the module being called. I'm guessing your error is caused simply by not having clustalw installed. BioPerl run modules provide perl wrappers to external executables. They don't replace the need for those executables. 
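
One way to rule this out before calling align() is to look for the executable on PATH first. The following is only a minimal sketch using core modules: find_executable is a hypothetical helper (not part of BioPerl), and the binary names clustalw and clustalw2 are assumptions that may differ on your install.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable
# called $name found on PATH, or nothing if there is none.
sub find_executable {
    my ($name) = @_;
    # On Windows the file usually carries an extension.
    my @exts = $^O eq 'MSWin32' ? ('', '.exe', '.bat') : ('');
    for my $dir (File::Spec->path) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

# Assumed binary names; change these to match the local installation.
my ($found) = map { find_executable($_) } qw(clustalw clustalw2);

print defined $found
    ? "clustalw executable found at $found\n"
    : "no clustalw executable on PATH; install it before calling align()\n";
```

If nothing is found, the wrapper has no program to run, which would be consistent with the "Can't exec" message in the original report.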
From cjfields at illinois.edu Fri Aug 7 00:47:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 19:47:47 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: I added the exception and tests to svn (r15895), so I closed that bug out. Almost forgot about that one, thanks for pointing it out! chris On Aug 6, 2009, at 3:01 PM, Brian Osborne wrote: > Chris, > > Yes, I think so. > > By the way, this is related to an old bug: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1508 > > > Brian O. > > >> This is a simple validation issue: should we throw an exception on >> bad input (no '>') > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Aug 7 02:30:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 21:30:09 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A765A44.7030902@gmail.com> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> Message-ID: Jonathan, Just to make sure you aren't accidentally 'warnocked' by the core devs: Your code sounds quite nice! However, we will begin the process of massively restructuring bioperl pretty soon, so I don't think it's a good idea to gear your code towards fitting directly into core. The best alternative should be fairly obvious, which is to release it to CPAN listing BioPerl 1.6.0 as a dependency if it is required. 
Your modules may or may not need the Bio* namespace (that's up to you, actually); there are several non-bioperl modules that also share the Bio* namespace, and I believe there are modules that aren't Bio* that use BioPerl (Gbrowse comes to mind). If you're focusing on interaction with robotics, Robotics::Bio::X might be a better namespace for instance (b/c you could expand later into other possibly non-bio robotics interfaces). The cpan-discuss list is probably a good place to ask, or (after you register on PAUSE) you can register the module namespace and see if there are any objections to the request. chris On Aug 2, 2009, at 10:32 PM, Jonathan Cline wrote: > Smithies, Russell wrote: >> I "acquired" an old Biomek 1000 that I'm thinking of modernising. >> It was originally controlled by a monstrously large but slow pc >> (IBM Value Point Model 466DX2 computer with Microsoft Windows* >> Version 3.1) >> My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) >> and use software like mach3 www.machsupport.com along with G-code >> to control it. >> I come from an engineering background so it seemed like the easy >> way to me :-) >> >> Now I just need a bit of free time to get it working... >> >> --Russell >> >> >> > I agree, that's probably the best way to go. It's hard to know what > amount of s/w processing was done on the host PC vs. the embedded > controller. If you were able to connect directly to the robot > hardware > with serial port(s) or whatever it's using, it would be tough to find > out the comm protocol unless someone has already reverse engineered it > (which is doubtful). Also from what I have seen online, attempting > to > run the old software under virtual machine is unpredictable due to > timing differences in the serial port communication. So removal of > the > old electronics is probably the best bet. If it has one arm, then > it's > much easier. 
> > As for robots with working workstation software, it seems the > annoyance > factor is that while the scripting languages are powerful (for GUI > scripting that is), they are still relatively low level. Bio types > with > a bit of CS seem to immediately turn to visual basic, labview, or even > excel spreadsheets and macros, in order to provide a higher level > abstraction for the workstation software. To me, it seems natural > that > there should be a "protocol compiler" which takes biology protocols as > input, and gives robot instructions as output (google "protolexer"). > The huge bottleneck of course is that everyone's robotics work tables > and equipment are somewhat unique to their needs. > > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > > >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >>> Sent: Thursday, 30 July 2009 2:07 p.m. >>> To: bioperl-l at lists.open-bio.org >>> Cc: Jonathan Cline >>> Subject: [Bioperl-l] Bio::Robotics namespace discussion >>> >>> I am writing a module for communication with biology robotics, as >>> discussed recently on #bioperl, and I invite your comments. >>> >>> Currently this mode talks to a Tecan genesis workstation robot ( >>> http://images.google.com/images?q=tecan genesis ). Other vendors >>> are >>> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the >>> 'net with the exception of some visual basic and labview scripts >>> which I >>> have found. There are some computational biologists who program for >>> robots via high level s/w, but these scripts are not distributed >>> as OSS. >>> >>> With Tecan, there is a datapipe interface for hardware >>> communication, as >>> an added $$ option from the vendor. I haven't checked other >>> vendors to >>> see if they likewise have an open communication path for third party >>> software. 
By allowing third-party communication, then naturally the >>> next step is to create a socket client-server; especially as the >>> robot >>> vendor only support MS Win and using the local machine has typical >>> Microsoft issues (like losing real time communication with the >>> hardware >>> due to GUI animation, bad operating system stability, no unix except >>> cygwin, etc). >>> >>> >>> On Namespace: >>> >>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are >>> many >>> s/w modules already called 'robots' (web spider robots, chat bots, >>> www >>> automate, etc) so I chose the longer name "robotics" to >>> differentiate >>> this module as manipulating real hardware. Bio::Robotics is the >>> abstraction for generic robotics and Bio::Robotics::(vendor) is the >>> manufacturer-specific implementation. Robot control is made more >>> complex due to the very configurable nature of the work table >>> (placement >>> of equipment, type of equipment, type of attached arm, etc). The >>> abstraction has to be careful not to generalize or assume too >>> much. In >>> some cases, the Bio::Robotics modules may expand to arbitrary >>> equipment >>> such as thermocyclers, tray holders, imagers, etc - that could be a >>> future roadmap plan. >>> >>> Here is some theoretical example usage below, subject to change. At >>> this time I am deciding how much state to keep within the Perl >>> module. >>> By keeping state, some robot programming might be simplified >>> (avoiding >>> deadlock or tracking tip state). In general I am aiming for a more >>> "protocol friendly" method implementation. >>> >>> >>> To use this software with locally-connected robotics hardware: >>> >>> use Bio::Robotics; >>> >>> my $tecan = Bio::Robotics->new("Tecan") || die; >>> $tecan->attach() || die; >>> $tecan->home(); >>> $tecan->pipette(tips => "1", from => "rack1"); >>> $tecan->pipette(aspirate => "1", dispense => "1", from => >>> "sampleTray", to >>> => "DNATray"); >>> ... 
>>> >>> To use this software with remote robotics hardware over the network: >>> >>> # On the local machine, run: >>> use Bio::Robotics; >>> >>> my @connected_hardware = Bio::Robotics->query(); >>> my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in >>> @connected_hardware\n"; >>> $tecan->attach() || die; >>> $tecan->configure("my work table configuration file") || die; >>> # Run the server and process commands >>> while (1) { >>> $error = $tecan->server(passwordplaintext => "0xd290"); >>> if ($tecan->lastClientCommand() =~ /^shutdown/) { >>> last; >>> } >>> } >>> $tecan->detach(); >>> exit(0); >>> >>> # On the remote machine (the client), run: >>> use Bio::Robotics; >>> >>> my $server = "heavybio.dyndns.org:8080"; >>> my $password = "0xd290"; >>> my $tecan = Bio::Robotics->new("Tecan"); >>> $tecan->connect($server, $mypassword) || die; >>> $tecan->home(); >>> $tecan->pipette(tips => "1", from => "rack200"); >>> $tecan->pipette(aspirate => "1", dispense => "1", >>> from => "sampleTray A1", to => "DNATray A2", >>> volume => "45", liquid => "Buffer"); >>> $tecan->pipette(drop => "1"); >>> ... >>> $tecan->disconnect(); >>> exit(0); >>> >>> >>> >>> -- >>> >>> ## Jonathan Cline >>> ## jcline at ieee.org >>> ## Mobile: +1-805-617-0223 >>> ######################## >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. 
Any review, retransmission, dissemination or other use
>> of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by
>> AgResearch
>> Limited. If you have received this message in error, please notify
>> the
>> sender immediately.
>> =
>> =
>> =====================================================================
>>

From biopython at maubp.freeserve.co.uk Fri Aug 7 09:19:14 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Aug 2009 10:19:14 +0100
Subject: [Bioperl-l] Trouble with Clustalw
In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu>
References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu>
Message-ID: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com>

On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields wrote:
> Michael,
>
> Are you using ClustalW 2? I'm not sure but I don't think the wrapper has
> been updated for the latest version (I think parsing still works, though).
>
> chris

That shouldn't matter: according to Des Higgins, ClustalW 2 is intended to be completely compatible with ClustalW 1.83, including the command line options. They will be adding new stuff in ClustalW 3.

The only thing to worry about with ClustalW 2 is parsing the output, as the header line of the alignments has changed very slightly. I can tell you from personal experience that the Biopython command line wrappers for ClustalW work fine on both 1.83 and 2.0.10, for example, and I would expect the same to be true for BioPerl.
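
To illustrate the header point, a parser that only checks that the first line starts with CLUSTAL, rather than matching one exact version string, should accept output from either release. This is a rough sketch, not how any BioPerl or Biopython parser is actually written; is_clustal_header is a hypothetical helper and the two header lines are assumed examples of the old and new wording.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line looks like the first line of a
# ClustalW alignment file, whichever program version wrote it.
sub is_clustal_header {
    my ($line) = @_;
    return $line =~ /^CLUSTAL\b/ ? 1 : 0;
}

# Assumed example header lines, for illustration only.
my @headers = (
    'CLUSTAL W (1.83) multiple sequence alignment',
    'CLUSTAL 2.0.10 multiple sequence alignment',
);

for my $header (@headers) {
    printf "%-50s %s\n", $header,
        is_clustal_header($header) ? 'recognised' : 'not recognised';
}
```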
Peter From paola.bisignano at gmail.com Fri Aug 7 12:11:58 2009 From: paola.bisignano at gmail.com (Paola Bisignano via Scour) Date: Fri, 7 Aug 2009 05:11:58 -0700 Subject: [Bioperl-l] Scour Friend Invite Message-ID: <4a7c1a0e5b82d@gmail.com> Hey, Check out: http://scour.com/invite/paola82/ I'm using a new search engine called Scour.com. It shows Google/Yahoo/MSN results and user comments all on one page. Best of all we get rewarded for using it by collecting points with every search, comment and vote. The points are redeemable for Visa gift cards. Join through my invite link so we can be friends and search socially! I know you'll like it, - Paola Bisignano This message was sent to you as a friend referral to join scour.com, please feel free to review our http://scour.com/privacy page and our http://scour.com/communityguidelines/antispam page. If you prefer not to receive invitations from ANY scour members, please click here - http://www.scour.com/unsub/e/YmlvcGVybC1sQGxpc3RzLm9wZW4tYmlvLm9yZw== Write to us at: Scour, Inc., 15303 Ventura Blvd. Suite 220, Sherman Oaks, CA 91403, USA. campaignid: scour200908070001 Scour.com From hlapp at gmx.net Fri Aug 7 13:21:51 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 7 Aug 2009 09:21:51 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <4a7c1a0e5b82d@gmail.com> References: <4a7c1a0e5b82d@gmail.com> Message-ID: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> Just FYI, I am addressing this offline. Note to everyone: we don't tolerate this and it will get you removed from the list immediately (and banned for the second offense). This is a large list. You better spend the time and be very careful who you send this kind of stuff to before you waste everyone else's. 
-hilmar

From stefan.kirov at bms.com Fri Aug 7 14:25:52 2009
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 07 Aug 2009 10:25:52 -0400
Subject: [Bioperl-l] Scour Friend Invite
In-Reply-To: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net>
References: <4a7c1a0e5b82d@gmail.com> <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net>
Message-ID: <4A7C3970.10501@bms.com>

Hilmar Lapp wrote:
> Just FYI, I am addressing this offline. Note to everyone: we don't
> tolerate this and it will get you removed from the list immediately
> (and banned for the second offense). This is a large list. You better
> spend the time and be very careful who you send this kind of stuff to
> before you waste everyone else's.
>
> -hilmar

It is quite possible this guy has no idea scour is spamming people on his behalf. It seems to me there should be a spam filter trained to take care of these guys. As a reference:
http://forums.digitalpoint.com/showthread.php?t=955786
http://markmail.org/message/fzlutwd3mkforbsu

From jdalzell03 at qub.ac.uk Mon Aug 3 23:18:24 2009
From: jdalzell03 at qub.ac.uk (Johnathan Dalzell)
Date: Tue, 4 Aug 2009 00:18:24 +0100
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
Message-ID: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>

Hi,

I've been trying to install Bioperl 1.6.0 onto Strawberry Perl 5.10 and the ActivePerl equivalent. I'm working on Vista, and after multiple attempts this is the furthest I can get through installation:

Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively?
[a] a - will install all scripts Do you want to run tests that require connection to servers across the internet (likely to cause some failures)? y/n [n] y - will run internet-requiring tests Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/lib/Data/Dumper.pm lin e 190, line 9. Creating new 'Build' script for 'BioPerl' version '1.006000' ---- Unsatisfied dependencies detected during ---- ---- CJFIELDS/BioPerl-1.6.0.tar.gz ---- SOAP::Lite [requires] GraphViz [requires] Convert::Binary::C [requires] Algorithm::Munkres [requires] XML::Twig [requires] DB_File [requires] Set::Scalar [requires] XML::Parser::PerlSAX [requires] XML::Writer [requires] XML::SAX::Writer [requires] Clone [requires] XML::DOM::XPath [requires] PostScript::TextBlock [requires] Running Build test Delayed until after prerequisites Running Build install Delayed until after prerequisites Running install for module 'SOAP::Lite' Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP-Lite-0.710.08.tar.gz ok CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz We are about to install SOAP::Lite and for your convenience will provide you with list of modules and prerequisites, so you'll be able to choose only modules you need for your configuration. XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by default. Installed transports can be used for both SOAP::Lite and XMLRPC::Lite. Press to see the detailed list. Feature Prerequisites Install? 
----------------------------- ---------------------------- -------- Core Package [*] Scalar::Util always [*] Test::More [*] URI [*] MIME::Base64 [*] version [*] XML::Parser (v2.23) Client HTTP support [*] LWP::UserAgent always Client HTTPS support [ ] Crypt::SSLeay [ no ] Client SMTP/sendmail support [ ] MIME::Lite [ no ] Client FTP support [*] IO::File [ yes ] [*] Net::FTP Standalone HTTP server [*] HTTP::Daemon [ yes ] Apache/mod_perl server [ ] Apache [ no ] FastCGI server [ ] FCGI [ no ] POP3 server [ ] MIME::Parser [ no ] [*] Net::POP3 IO server [*] IO::File [ yes ] MQ transport support [ ] MQSeries [ no ] JABBER transport support [ ] Net::Jabber [ no ] MIME messages [ ] MIME::Parser [ no ] DIME messages [*] IO::Scalar (v2.105) [ no ] [ ] DIME::Tools (v0.03) [ ] Data::UUID (v0.11) SSL Support for TCP Transport [ ] IO::Socket::SSL [ no ] Compression support for HTTP [*] Compress::Zlib [ yes ] MIME interoperability w/ Axis [ ] MIME::Parser (v6.106) [ no ] --- An asterix '[*]' indicates if the module is currently installed. Do you want to proceed with this configuration? [yes] yes Checking if your kit is complete... 
Looks good Writing Makefile for SOAP::Lite cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport\TCP.pm cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport\POP3.pm cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema19 99.pm cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema20 
01.pm cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport\MQ.pm cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport\FTP.pm cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP\Transport\JABBER.pm cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite\Deserializer\XMLSchem aSOAP1_2.pm cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport\IO.pm cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite\Deserializer\XMLSchem aSOAP1_1.pm cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP\Transport\LOCAL.pm cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP\Transport\MAILTO.pm cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/SOAPsh.pl blib\script\S OAPsh.pl pl2bat.bat blib\script\SOAPsh.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/stubmaker.pl blib\scrip t\stubmaker.pl pl2bat.bat blib\script\stubmaker.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/XMLRPCsh.pl blib\script \XMLRPCsh.pl pl2bat.bat blib\script\XMLRPCsh.pl MKUTTER/SOAP-Lite-0.710.08.tar.gz C:\strawberry\c\bin\dmake.EXE -- OK Running make test C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib\lib' , 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/013-array-deserializati on.t 
t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03-server.t t/04-attach. t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08-schema.t t/096_characters.t t /097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t t/IO/SessionSet.t t/SO AP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/Deserializer/XMLSchema199 9.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t t /SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/SOAP/Transport/FTP.t t/S OAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t t/SOAP/Transport/MAILT O.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/HTTP/CGI.t t/XML/Parser /Lite.t t/XMLRPC/Lite.t t/01-core.t .................................. ok t/010-serializer.t ........................... ok t/012-cloneable.t ............................ ok t/013-array-deserialization.t ................ ok t/014_UNIVERSAL_use.t ........................ ok t/015_UNIVERSAL_can.t ........................ ok t/02-payload.t ............................... ok t/03-server.t ................................ ok t/04-attach.t ................................ skipped: Could not find MIME::Parser - is M IME::Tools installed? Aborting. t/05-customxml.t ............................. ok t/06-modules.t ............................... ok t/07-xmlrpc_payload.t ........................ ok t/08-schema.t ................................ ok t/096_characters.t ........................... skipped: (no reason given) t/097_kwalitee.t ............................. skipped: (no reason given) t/098_pod.t .................................. skipped: (no reason given) t/099_pod_coverage.t ......................... skipped: (no reason given) t/IO/SessionData.t ........................... ok t/IO/SessionSet.t ............................ ok t/SOAP/Data.t ................................ ok t/SOAP/Lite/Deserializer/XMLSchema1999.t ..... 
ok t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok t/SOAP/Lite/Packager.t ....................... ok t/SOAP/Schema/WSDL.t ......................... ok t/SOAP/Serializer.t .......................... 1/12 Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. t/SOAP/Serializer.t .......................... ok t/SOAP/Transport/FTP.t ....................... 1/7 Use of uninitialized value in split at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 55. substr outside of string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 56. Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/perl/lib/IO/Socket/INET.pm line 117. Use of uninitialized value $server in concatenation (.) or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60. t/SOAP/Transport/FTP.t ....................... ok t/SOAP/Transport/HTTP.t ...................... ok t/SOAP/Transport/HTTP/CGI.t .................. Every time I get to CGI.t at the end here, the installation stops moving! Any suggestions would be greatly appreciated; I've been trying to force it through for literally 5 hours now.... 
cheers, jonny From ghiban at cshl.edu Thu Aug 6 16:04:38 2009 From: ghiban at cshl.edu (Ghiban, Cornel) Date: Thu, 6 Aug 2009 12:04:38 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> Message-ID: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Hi, It doesn't matter what sequence we use. As Chris Fields showed in his test, the problem is that ">" is not the first character on the first line. We always assumed the sequence was in FASTA format, and this seems to be wrong. I think the solution to our problem is to check whether the ">" symbol is present; if it is not, we will add it. Thank you, Cornel Ghiban -----Original Message----- From: Hilmar Lapp [mailto:hlapp at gmx.net] Sent: Thursday, August 06, 2009 11:18 AM To: Hilgert, Uwe Cc: Chris Fields; BioPerl List; Ghiban, Cornel Subject: Re: [Bioperl-l] Bio::SeqIO issue Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible? -hilmar On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > I'm not sure what version we have. 
Cornel may have installed it a > while ago from CVS: > > Module id = Bio::Root::Build > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > INST_VERSION 1.006900 > cpan> m Bio::Root::Version > Module id = Bio::Root::Version > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > INST_VERSION 1.006900 > cpan> m Bio::SeqIO > Module id = Bio::SeqIO > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > INST_VERSION undef > > Cornel still has the checked-out "bioperl-live" directory and the last > changes are from March this year. > > As per why he used "Fasta" instead of 'fasta" as the format parameter > in Bio::SeqIO, it's because that what it says in the modules manual. > He now tried 'fasta' instead and see no changes in behavior. Omitting > the format parameter altogether, fasta-formatted sequence continues to > be treated correctly, the first line being removed. However, raw > sequence is being treated differently in that the first line is not > being removed any more. Instead, the program returns the first line > only. Which, in the example I am going to forward in my next message, > will return 60 amino acids out of raw sequence of 300 aa. Can't win > with raw sequence... > > > The files may be created on different platforms, we didn't notice any > difference between using files created on Windows or Linux. > > Thanks > Uwe > > > > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, August 05, 2009 6:54 PM > To: Chris Fields > Cc: Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > I don't think that can be the problem. If anything, providing the > format ought to be better in terms of result than not providing it? 
> > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, etc? > What does your data look like?" I'd add to that, can you show us your > full script, or a smaller code snippet that reproduces the problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing the >> file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>> Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences are being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>> to guessing). In any case, it's always advisable to explicitly >>> indicate the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, regardless >>>> of whether the files were indeed fasta files or files that only >>>> contained sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 8 12:38:46 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 8 Aug 2009 08:38:46 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <4A7C3970.10501@bms.com> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> Message-ID: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> Thanks Stefan--this makes a lot more sense to me than supposing a priori that a previous legitimate user of this list is spamming bioperl-l intentionally. I would prefer to initially give the benefit of the doubt to the intelligence of the users, rather than scare people off who are likely to be already mortified that their emails have been commandeered like this. I would definitely support an spam filter that works. 
MAJ ----- Original Message ----- From: "Stefan Kirov" To: "Hilmar Lapp" Cc: "BioPerl List" Sent: Friday, August 07, 2009 10:25 AM Subject: Re: [Bioperl-l] Scour Friend Invite > Hilmar Lapp wrote: >> Just FYI, I am addressing this offline. Note to everyone: we don't >> tolerate this and it will get you removed from the list immediately >> (and banned for the second offense). This is a large list. You better >> spend the time and be very careful who you send this kind of stuff to >> before you waste everyone else's. >> >> -hilmar >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > It is quite possible this guy has no idea scour is spamming people on > his behalf. It seems to me there should be spam-filter trained to take > care of these guys. > As a reference: > http://forums.digitalpoint.com/showthread.php?t=955786 > http://markmail.org/message/fzlutwd3mkforbsu > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Sat Aug 8 14:18:59 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 08 Aug 2009 10:18:59 -0400 Subject: [Bioperl-l] SeqIO documentation Message-ID: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> Chris, Since we've been discussing formats I just wanted to mention that I've changed this documentation from SeqIO.pm: If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then Fasta format is assumed. 
To: If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then SeqIO will throw a fatal error. The code is clear, if SeqIO can't figure out what the format is then it dies, "fasta" is not the default format. Brian O. From cjfields at illinois.edu Sat Aug 8 16:23:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:23:44 -0500 Subject: [Bioperl-l] SeqIO documentation In-Reply-To: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> References: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> Message-ID: Brian, That fits current behavior, so yes that makes sense. chris On Aug 8, 2009, at 9:18 AM, Brian Osborne wrote: > Chris, > > Since we've been discussing formats I just wanted to mention that > I've changed this documentation from SeqIO.pm: > > If no format is specified and a filename is given then the module > will attempt to deduce the format from the filename suffix. If there > is no suffix that Bioperl understands then it will attempt to guess > the format based on file content. If this is unsuccessful then Fasta > format is assumed. > > To: > > If no format is specified and a filename is given then the module > will attempt to deduce the format from the filename suffix. If there > is no suffix that Bioperl understands then it will attempt to guess > the format based on file content. If this is unsuccessful then SeqIO > will throw a fatal error. > > The code is clear, if SeqIO can't figure out what the format is then > it dies, "fasta" is not the default format. > > > Brian O. 
> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 8 16:24:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:24:48 -0500 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> Message-ID: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu> I believe there are spam filters in place (Jason and Chris D. could probably indicate more on this). chris On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote: > Thanks Stefan--this makes a lot more sense to me than supposing > a priori that a previous legitimate user of this list is spamming > bioperl-l > intentionally. I would prefer to initially give the benefit of the > doubt > to the intelligence of the users, rather than scare people off who are > likely to be already mortified that their emails have been > commandeered > like this. I would definitely support an spam filter that works. > MAJ > ----- Original Message ----- From: "Stefan Kirov" > > To: "Hilmar Lapp" > Cc: "BioPerl List" > Sent: Friday, August 07, 2009 10:25 AM > Subject: Re: [Bioperl-l] Scour Friend Invite > > >> Hilmar Lapp wrote: >>> Just FYI, I am addressing this offline. Note to everyone: we don't >>> tolerate this and it will get you removed from the list immediately >>> (and banned for the second offense). This is a large list. You >>> better >>> spend the time and be very careful who you send this kind of stuff >>> to >>> before you waste everyone else's. 
>>> >>> -hilmar >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> It is quite possible this guy has no idea scour is spamming people on >> his behalf. It seems to me there should be spam-filter trained to >> take >> care of these guys. >> As a reference: >> http://forums.digitalpoint.com/showthread.php?t=955786 >> http://markmail.org/message/fzlutwd3mkforbsu >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 8 16:26:55 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:26:55 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> Message-ID: <0A43205F-828F-4CC9-ADC3-EBCE92690765@illinois.edu> On Aug 7, 2009, at 4:19 AM, Peter wrote: > On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields > wrote: >> Michael, >> >> Are you using ClustalW 2? I'm not sure but I don't think the >> wrapper has >> been updated for the latest version (I think parsing still works, >> though). >> >> chris > > That shouldn't matter, according to Des Higgins ClustalW 2 is intended > to be completely compatible with ClustalW 1.83, including the command > line options. They will be adding new stuff in ClustalW 3. 
The only > think to worry about with ClustalW 2 is parsing the output, as the > header line of the alignments has changed very slightly. > > I can tell you from personal experience that the Biopython command > line wrappers for ClustalW work fine on both 1.83 and 2.0.10 for > example, and would expect the same to be true for BioPerl. > > Peter I would think so as well, but I encountered some issues on my OS using ClustalW 2 with the last release: http://bugzilla.open-bio.org/show_bug.cgi?id=2728 I think it's something small, like something hard-coded in (version maybe) that's causing the problem, just didn't have time to check. chris From cjfields at illinois.edu Sat Aug 8 16:26:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:26:38 -0500 Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10 In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> Message-ID: <0963ED84-359B-465B-9BA2-956A0AB23587@illinois.edu> Have you tried installing SOAP::Lite directly? That seems to be the hanging point. The funny thing is this is somehow assigning everything as a requirement (SOAP::Lite is a 'recommends'). Worth investigating, but I don't have access to a Windows box (either for XP, Vista, or Win7). Hopefully we'll get a PPM up soon; it's in the roadmap for 1.6.1. In the meantime, (as a strictly temporary measure) have you tried setting PERL5LIB to point to a local copy of bioperl-1.6? chris On Aug 3, 2009, at 6:18 PM, Johnathan Dalzell wrote: > Hi, I've been trying to install Bioperl 1.6.0 onto strawberry perl > 5.10 and the activePerl equivalent. I'm wrking through vista, and > ovver multiple times, this is the furthest I can get through > installation.... > > > Install [a]ll Bioperl scripts, [n]one, or choose groups > [i]nteractively? 
[a] a > - will install all scripts > Do you want to run tests that require connection to servers across > the internet > (likely to cause some failures)? y/n [n] y > - will run internet-requiring tests > Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/ > lib/Data/Dumper.pm lin > e 190, line 9. > Creating new 'Build' script for 'BioPerl' version '1.006000' > ---- Unsatisfied dependencies detected during ---- > ---- CJFIELDS/BioPerl-1.6.0.tar.gz ---- > SOAP::Lite [requires] > GraphViz [requires] > Convert::Binary::C [requires] > Algorithm::Munkres [requires] > XML::Twig [requires] > DB_File [requires] > Set::Scalar [requires] > XML::Parser::PerlSAX [requires] > XML::Writer [requires] > XML::SAX::Writer [requires] > Clone [requires] > XML::DOM::XPath [requires] > PostScript::TextBlock [requires] > Running Build test > Delayed until after prerequisites > Running Build install > Delayed until after prerequisites > Running install for module 'SOAP::Lite' > Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP- > Lite-0.710.08.tar.gz > ok > CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > We are about to install SOAP::Lite and for your convenience will > provide > you with list of modules and prerequisites, so you'll be able to > choose > only modules you need for your configuration. > XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by > default. > Installed transports can be used for both SOAP::Lite and XMLRPC::Lite. > Press to see the detailed list. > Feature Prerequisites Install? 
> ----------------------------- ---------------------------- -------- > Core Package [*] Scalar::Util always > [*] Test::More > [*] URI > [*] MIME::Base64 > [*] version > [*] XML::Parser (v2.23) > Client HTTP support [*] LWP::UserAgent always > Client HTTPS support [ ] Crypt::SSLeay [ no ] > Client SMTP/sendmail support [ ] MIME::Lite [ no ] > Client FTP support [*] IO::File [ yes ] > [*] Net::FTP > Standalone HTTP server [*] HTTP::Daemon [ yes ] > Apache/mod_perl server [ ] Apache [ no ] > FastCGI server [ ] FCGI [ no ] > POP3 server [ ] MIME::Parser [ no ] > [*] Net::POP3 > IO server [*] IO::File [ yes ] > MQ transport support [ ] MQSeries [ no ] > JABBER transport support [ ] Net::Jabber [ no ] > MIME messages [ ] MIME::Parser [ no ] > DIME messages [*] IO::Scalar (v2.105) [ no ] > [ ] DIME::Tools (v0.03) > [ ] Data::UUID (v0.11) > SSL Support for TCP Transport [ ] IO::Socket::SSL [ no ] > Compression support for HTTP [*] Compress::Zlib [ yes ] > MIME interoperability w/ Axis [ ] MIME::Parser (v6.106) [ no ] > --- An asterix '[*]' indicates if the module is currently installed. > Do you want to proceed with this configuration? [yes] yes > Checking if your kit is complete... 
> Looks good > Writing Makefile for SOAP::Lite > cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod > cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm > cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm > cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm > cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm > cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm > cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm > cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport > \TCP.pm > cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm > cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport > \POP3.pm > cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm > cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod > cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm > cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm > cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm > cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm > cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm > cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod > cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod > cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod > cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm > cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm > cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod > cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm > cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm > cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod > cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm > cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema19 > 99.pm > cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm > cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm > cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod > cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport > \HTTP.pm > cp 
lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema20 > 01.pm > cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod > cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm > cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm > cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport > \MQ.pm > cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport > \FTP.pm > cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP > \Transport\JABBER.pm > cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm > cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod > cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm > cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_2.pm > cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport > \IO.pm > cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_1.pm > cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP > \Transport\LOCAL.pm > cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm > cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod > cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm > cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP > \Transport\MAILTO.pm > cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > SOAPsh.pl blib\script\S > OAPsh.pl > pl2bat.bat blib\script\SOAPsh.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > stubmaker.pl blib\scrip > t\stubmaker.pl > pl2bat.bat blib\script\stubmaker.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > XMLRPCsh.pl blib\script > \XMLRPCsh.pl > pl2bat.bat blib\script\XMLRPCsh.pl > MKUTTER/SOAP-Lite-0.710.08.tar.gz > C:\strawberry\c\bin\dmake.EXE -- OK > Running 
make test > C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib\lib' > , 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/ > 013-array-deserializati > on.t t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03- > server.t t/04-attach. > t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08- > schema.t t/096_characters.t t > /097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t > t/IO/SessionSet.t t/SO > AP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/ > Deserializer/XMLSchema199 > 9.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/ > Deserializer/XMLSchemaSOAP1_1.t t > /SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/ > SOAP/Transport/FTP.t t/S > OAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t > t/SOAP/Transport/MAILT > O.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/ > HTTP/CGI.t t/XML/Parser > /Lite.t t/XMLRPC/Lite.t > t/01-core.t .................................. ok > t/010-serializer.t ........................... ok > t/012-cloneable.t ............................ ok > t/013-array-deserialization.t ................ ok > t/014_UNIVERSAL_use.t ........................ ok > t/015_UNIVERSAL_can.t ........................ ok > t/02-payload.t ............................... ok > t/03-server.t ................................ ok > t/04-attach.t ................................ skipped: Could not > find MIME::Parser - is M > IME::Tools installed? Aborting. > t/05-customxml.t ............................. ok > t/06-modules.t ............................... ok > t/07-xmlrpc_payload.t ........................ ok > t/08-schema.t ................................ ok > t/096_characters.t ........................... skipped: (no reason > given) > t/097_kwalitee.t ............................. skipped: (no reason > given) > t/098_pod.t .................................. 
skipped: (no reason > given) > t/099_pod_coverage.t ......................... skipped: (no reason > given) > t/IO/SessionData.t ........................... ok > t/IO/SessionSet.t ............................ ok > t/SOAP/Data.t ................................ ok > t/SOAP/Lite/Deserializer/XMLSchema1999.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok > t/SOAP/Lite/Packager.t ....................... ok > t/SOAP/Schema/WSDL.t ......................... ok > t/SOAP/Serializer.t .......................... 1/12 Use of > uninitialized value $values[0] > in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08- > wfOzhM\blib\lib/SOAP/Lite > .pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > t/SOAP/Serializer.t .......................... ok > t/SOAP/Transport/FTP.t ....................... 1/7 Use of > uninitialized value in split at > C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/ > Transport/FTP.pm line 55. > substr outside of string at C:\strawberry\cpan\build\SOAP- > Lite-0.710.08-wfOzhM\blib\lib/SO > AP/Transport/FTP.pm line 56. > Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/ > perl/lib/IO/Socket/INET. > pm line 117. > Use of uninitialized value $server in concatenation (.) or string at > C:\strawberry\cpan\bu > ild\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60. > t/SOAP/Transport/FTP.t ....................... ok > t/SOAP/Transport/HTTP.t ...................... 
ok
> t/SOAP/Transport/HTTP/CGI.t ..................
>
> Every time I get to the CGI.t at the end here the installation won't
> move! Any suggestions would be greatly appreciated, I've been
> trying to force it through, literally for 5 hours now....
>
> cheers,
> jonny
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bosborne11 at verizon.net Sat Aug 8 16:42:12 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 08 Aug 2009 12:42:12 -0400
Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10
In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk>
Message-ID: <979637B9-F2EC-47A0-9283-440AA2558481@verizon.net>

Jonathan,

It looks like you're not the only one having problems with SOAP::Lite
on Windows. For a possible workaround:

http://objectmix.com/perl/638075-how-install-soap-lite-windows.html

Brian O.

On Aug 3, 2009, at 7:18 PM, Johnathan Dalzell wrote:

> SOAP/Transport/HTTP/CGI

From stefan.kirov at bms.com Sat Aug 8 20:45:32 2009
From: stefan.kirov at bms.com (Kirov, Stefan)
Date: Sat, 8 Aug 2009 16:45:32 -0400
Subject: [Bioperl-l] Scour Friend Invite
In-Reply-To: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu>
References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife>, <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu>
Message-ID:

There is indeed; actually my mail with the same header was held for a while.
In any case I think these pay-to-search, invite-colleagues and
spam-whole-address-book sites should be banned even if they are formally not
spam, since the user is at least partially aware of the effect. I am not sure
if this is a good solution, I am just frustrated, because these companies are
quite unethical.
Maybe not as unethical as others (few come to my mind, but will not name them :-)), but still... On the other hand they have not been a real problem before. As long as this is not a frequent thing I guess the filter is doing a great job. Stefan ________________________________________ From: Chris Fields [cjfields at illinois.edu] Sent: Saturday, August 08, 2009 12:24 PM To: Mark A. Jensen Cc: Kirov, Stefan; Hilmar Lapp; BioPerl List Subject: Re: [Bioperl-l] Scour Friend Invite I believe there are spam filters in place (Jason and Chris D. could probably indicate more on this). chris On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote: > Thanks Stefan--this makes a lot more sense to me than supposing > a priori that a previous legitimate user of this list is spamming > bioperl-l > intentionally. I would prefer to initially give the benefit of the > doubt > to the intelligence of the users, rather than scare people off who are > likely to be already mortified that their emails have been > commandeered > like this. I would definitely support an spam filter that works. > MAJ > ----- Original Message ----- From: "Stefan Kirov" > > To: "Hilmar Lapp" > Cc: "BioPerl List" > Sent: Friday, August 07, 2009 10:25 AM > Subject: Re: [Bioperl-l] Scour Friend Invite This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited. 
From j_martin at lbl.gov Sun Aug 9 02:41:53 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Sat, 8 Aug 2009 19:41:53 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu>
Message-ID: <20090809024152.GA26943@eniac.jgi-psf.org>

Hello,

It sounds like you want a layer to figure out what they're giving
your program before you open it. You could use Bio::Tools::GuessSeqFormat
and spare your user the pain of knowledge. It seems reasonable that
coddling happens only when requested.

    use strict;
    use warnings;

    use IO::String;
    use Bio::SeqIO;
    use Bio::Tools::GuessSeqFormat;

    # File names as given in the original post.
    my @files = ( 'NC_000913.fasta', '.gb' );

    for my $file ( @files ) {
        # Write raw sequence into an in-memory string instead of a file.
        my $string = '';
        my $strio  = IO::String->new( $string );
        my $out    = Bio::SeqIO->new( -fh => $strio, -format => 'raw' );

        # Guess the input format from the file contents, then parse with it.
        my $guesser = Bio::Tools::GuessSeqFormat->new( -file => $file );
        my $in      = Bio::SeqIO->new( -format => $guesser->guess,
                                       -file   => $file );

        while ( my $seq = $in->next_seq() ) {
            $out->write_seq( $seq );
            # Show the first 30 characters of what has been written so far.
            print substr( $string, 0, 30 ), "\n";
        }
    }

Joel

On Thu, Aug 06, 2009 at 03:36:36PM -0400, Hilgert, Uwe wrote:
> Hmmm, I fail to see how supplying raw sequence could be called "bad"
> input or a "problem". In our case, for example, not every user is a
> bioinformatics expert and Cornel was suggesting to account for that
> instead of trying to "train" the user to adhere to requirements that
> have not much to do with what s/he tries to accomplish. I don't really
> see data being modified, rather that the data format is being adapted to
> the needs of the software; which I would argue should be something the
> software is being able to take care of.
> > Uwe > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, August 06, 2009 12:50 PM > To: Ghiban, Cornel > Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Cornel, > > I'm failing to see how adding '>' would solve the problem. > > This is a simple validation issue: should we throw an exception on bad > input (no '>'), or just argue GIGO based on user error (the assumption > that the SeqIO parser will read raw sequence correctly when set to > 'fasta' is wrong)? > > I think, in this circumstance, the former applies. It is easy to add, > and the use of an exception in this case is violently user-friendly, > e.g. it will stop cold and immediately point out the problem. > Otherwise data is (silently) being modified, which is always a bad > thing. > > chris > > On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > > > Hi, > > > > It doesn't matter what sequence we use. As Chris Fields's showed in > > his test, not having > > ">" as the 1st character on the first line is the problem. > > We always assumed the sequence is in FASTA format and this seems to > > be wrong. > > > > I think, the solution to our problem is to check whether the ">" > > symbol is present or not. > > If not present then it will be added. > > > > Thank you, > > Cornel Ghiban > > > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > Sent: Thursday, August 06, 2009 11:18 AM > > To: Hilgert, Uwe > > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > > > Uwe - could you send an actual data file (as an attachment) that > > reproduces the problem, or is that not possible? > > > > -hilmar > > > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > > > >> I'm not sure what version we have. 
Cornel may have installed it a > >> while ago from CVS: > >> > >> Module id = Bio::Root::Build > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::Root::Version > >> Module id = Bio::Root::Version > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::SeqIO > >> Module id = Bio::SeqIO > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > >> INST_VERSION undef > >> > >> Cornel still has the checked-out "bioperl-live" directory and the > >> last > >> changes are from March this year. > >> > >> As per why he used "Fasta" instead of 'fasta" as the format parameter > >> in Bio::SeqIO, it's because that what it says in the modules manual. > >> He now tried 'fasta' instead and see no changes in behavior. Omitting > >> the format parameter altogether, fasta-formatted sequence continues > >> to > >> be treated correctly, the first line being removed. However, raw > >> sequence is being treated differently in that the first line is not > >> being removed any more. Instead, the program returns the first line > >> only. Which, in the example I am going to forward in my next message, > >> will return 60 amino acids out of raw sequence of 300 aa. Can't win > >> with raw sequence... > >> > >> > >> The files may be created on different platforms, we didn't notice any > >> difference between using files created on Windows or Linux. 
> >> > >> Thanks > >> Uwe > >> > >> > >> > >> > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Wednesday, August 05, 2009 6:54 PM > >> To: Chris Fields > >> Cc: Hilgert, Uwe; BioPerl List > >> Subject: Re: [Bioperl-l] Bio::SeqIO issue > >> > >> I don't think that can be the problem. If anything, providing the > >> format ought to be better in terms of result than not providing it? > >> > >> Uwe - I'd like you to go back to Chris' initial questions that you > >> haven't answered yet: "What version of bioperl are you using, OS, > >> etc? > >> What does your data look like?" I'd add to that, can you show us your > >> full script, or a smaller code snippet that reproduces the problem. > >> > >> I suspect that either something in your script is swallowing the > >> line, > >> or that the line endings in your data file are from a different OS > >> than the one you're running the script on. (Or that you are running a > >> very old version of BioPerl, which is entirely possible if you > >> installed through CPAN.) > >> > >> -hilmar > >> > >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> > >>> Uwe, > >>> > >>> Please keep replies on the list. > >>> > >>> It's very possible that's the issue; IIRC the fasta parser pulls out > >>> the full sequence in chunks (based on local $/ = "\n>") and splits > >>> the header off as the first line in that chunk. You could probably > >>> try leaving the format out and letting SeqIO guess it, or passing > >>> the > >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably > >>> better to go through the files and add a file extension that > >>> corresponds to the format. > >>> > >>> chris > >>> > >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >>> > >>>> Thanks, Chris. 
The files have no extension, but we indicate what > >>>> format to use, like in the manual: > >>>> > >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > >>>> > >>>> I wonder now whether this could exactly cause the problem: as we > >>>> are > >>>> telling that input files are in fasta format they are being treated > >>>> as such (=remove first line) - regardless of whether they really > >>>> are > >>>> fasta? > >>>> > >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe > >>>> Hilgert, Ph.D. > >>>> Dolan DNA Learning Center > >>>> Cold Spring Harbor Laboratory > >>>> > >>>> C: (516) 857-1693 > >>>> V: (516) 367-5185 > >>>> E: hilgert at cshl.edu > >>>> F: (516) 367-5182 > >>>> W: http://www.dnalc.org > >>>> > >>>> -----Original Message----- > >>>> From: Chris Fields [mailto:cjfields at illinois.edu] > >>>> Sent: Wednesday, August 05, 2009 5:04 PM > >>>> To: Hilgert, Uwe > >>>> Cc: bioperl-l at lists.open-bio.org > >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue > >>>> > >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > >>>> > >>>>> Is my impression correct that Bio::SeqIO just assumes that > >>>>> sequences are being submitted in FASTA format? > >>>> > >>>> No. See: > >>>> > >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO > >>>> > >>>> SeqIO tries to guess at the format using the file extension, and if > >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > >>>> possible that the extension is causing the problem, or that > >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced > >>>> to guessing). In any case, it's always advisable to explicitly > >>>> indicate the format when possible. > >>>> > >>>> Relevant lines: > >>>> > >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > >>>> i; > >>>> ... 
> >>>> return 'raw' if /\.(txt)$/i; > >>>> > >>>>> In our experience, implementing > >>>>> Bio::SeqIO led to the first line of files being cut off, > >>>>> regardless > >>>>> of whether the files were indeed fasta files or files that only > >>>>> contained sequence. > >>>> > >>>> Files that only contain sequence are 'raw'. Ones in FASTA are > >>>> 'fasta'. > >>>> > >>>>> Which, in the latter, led to sequence submissions that had the > >>>>> first line of nucleotides removed. Has anyone tried to write a fix > >>>>> for this? > >>>> > >>>> This sounds like a bug, but we have very little to go on beyond > >>>> your > >>>> description. What version of bioperl are you using, OS, etc? What > >>>> does your data look like? File extension? > >>>> > >>>> chris > >>>> > >>>>> Thanks, > >>>>> > >>>>> Uwe > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > >>>>> > >>>>> Uwe Hilgert, Ph.D. > >>>>> > >>>>> Dolan DNA Learning Center > >>>>> > >>>>> Cold Spring Harbor Laboratory > >>>>> > >>>>> > >>>>> > >>>>> V: (516) 367-5185 > >>>>> > >>>>> E: hilgert at cshl.edu > >>>>> > >>>>> F: (516) 367-5182 > >>>>> > >>>>> W: http://www.dnalc.org > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > 
=========================================================== > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Sun Aug 9 10:38:30 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 11:38:30 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EA726.60303@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > OK, I propose to look into these. Almost certainly I'll be doing "convert > run/db/network to Module::Build". I'll try to resolve the bugs you've > mentioned. 
> > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. Chris already started on "convert run/db/network to Module::Build" for some reason, but his attempt doesn't actually result in any modules getting installed (setting pm_files() like that isn't enough). The easiest, cleanest and most standard solution is to create a lib directory and svn move Bio into it. Does anyone have an objection to me doing this for the network, db and run packages? It will only affect developers currently working on code in those packages, and they just need to be aware that an svn update will be rather dramatic after my change. From cjfields at illinois.edu Sun Aug 9 13:05:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:05:17 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7EA726.60303@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> Message-ID: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> ... > > Chris already started on "convert run/db/network to Module::Build" > for some reason, but his attempt doesn't actually result in any > modules getting installed (setting pm_files() like that isn't enough). > > The easiest, cleanest and most standard solution is to create a lib > directory and svn move Bio into it. Does anyone have an objection to > me doing this for the network, db and run packages? 
It will only > affect developers currently working on code in those packages, and > they just need to be aware that an svn update will be rather > dramatic after my change. If it stimulates you into doing this then I'm all for it, but I've waited on getting this fixed long enough I decided to take it on myself to work on it, using the simplest ones. You had mentioned several times you would do this and I hadn't seen any progress. The point: I would really like to get another point release out before we work on splitting things up. Simple as that. From what I have seen (with my few tests) everything (modules, scripts) gets copied into blib just fine and the temp folder for script generation gets cleaned up; I haven't progressed beyond to the installation step, but there isn't anything to me that indicates it wouldn't work. I won't be available until Wed. at the earliest for additional comment (out of town, no internet connection). chris From bix at sendu.me.uk Sun Aug 9 13:15:07 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 14:15:07 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> Message-ID: <4A7ECBDB.9030505@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> The easiest, cleanest and most standard solution is to create a lib >> directory and svn move Bio into it. Does anyone have an objection to >> me doing this for the network, db and run packages? 
It will only >> affect developers currently working on code in those packages, and >> they just need to be aware that an svn update will be rather dramatic >> after my change. > > From what I have seen (with my few tests) everything (modules, scripts) > gets copied into blib just fine and the temp folder for script > generation gets cleaned up; I haven't progressed beyond to the > installation step, but there isn't anything to me that indicates it > wouldn't work. ./Build testinstall will show you it doesn't work as-is. If you're in a rush I'll just do the svn moves and we can revert later if anyone complains. From cjfields at illinois.edu Sun Aug 9 13:19:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:19:30 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <2790F9A5-43E8-47E5-B5AA-98239B95EF04@illinois.edu> On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. > > If you're in a rush I'll just do the svn moves and we can revert > later if anyone complains. Works for me. The sooner it gets done the better (next week, would be nice, but two is fine so we don't rush it too much). I'll be working on several other bits, including FASTQ, when I get back Wed, then I'll merge over and work on the next point release. chris From cjfields at illinois.edu Sun Aug 9 13:34:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:34:07 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. Sorry, I'll be leaving in the next hour, but for the above, did you mean './Build fakeinstall'? As long as you're moving everything into /lib (which I fully support), we should consider hard_coding scripts into bp_foo.PLS syntax seeing as we're going through additional trouble of converting them over. That is, unless there is a specific purpose to keeping them without the 'bp_'. chris From bix at sendu.me.uk Sun Aug 9 14:00:18 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 15:00:18 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <4A7ED672.20701@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>>> The easiest, cleanest and most standard solution is to create a lib >>>> directory and svn move Bio into it. Does anyone have an objection to >>>> me doing this for the network, db and run packages? 
It will only >>>> affect developers currently working on code in those packages, and >>>> they just need to be aware that an svn update will be rather >>>> dramatic after my change. >>> >>> From what I have seen (with my few tests) everything (modules, >>> scripts) gets copied into blib just fine and the temp folder for >>> script generation gets cleaned up; I haven't progressed beyond to the >>> installation step, but there isn't anything to me that indicates it >>> wouldn't work. >> >> ./Build testinstall will show you it doesn't work as-is. > > Sorry, I'll be leaving in the next hour, but for the above, did you mean > './Build fakeinstall'? Yes, sorry. > As long as you're moving everything into /lib (which I fully support), > we should consider hard_coding scripts into bp_foo.PLS syntax seeing as > we're going through additional trouble of converting them over. That > is, unless there is a specific purpose to keeping them without the 'bp_'. (The final suffix is supposed to be .pl - we convert from PLS to pl in core, no conversion needed in db) Yes, for only a handful of scripts, it actually makes sense to flatten them all into a new bin directory, which is the default script location for Module::Build. So for example I'd do: svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl etc. 
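
The flattening described above (every script moved into a single bin directory and given a bp_ prefix) can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: the function name plan_move is hypothetical and not part of BioPerl or Subversion, the rule that names already starting with bp_ are left alone is an assumption, and the script only prints the svn mv commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: print the `svn mv` command that would flatten one script
# into bin/ with a bp_ prefix. plan_move is a hypothetical helper name,
# not part of BioPerl or Subversion.
plan_move() {
    src=$1
    base=$(basename "$src")
    # Assumption: scripts already named bp_* keep their name.
    case $base in
        bp_*) dest="bin/$base" ;;
        *)    dest="bin/bp_$base" ;;
    esac
    printf 'svn mv %s %s\n' "$src" "$dest"
}

# Example: plan the moves for every .pl file under scripts/, if it exists.
if [ -d scripts ]; then
    find scripts -type f -name '*.pl' | sort | while IFS= read -r f; do
        plan_move "$f"
    done
fi
```

Printing the commands first makes it easy to spot two scripts in different subdirectories that would collide on the same name in bin/ before anything is moved in the working copy.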
From bix at sendu.me.uk Sun Aug 9 16:13:03 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 17:13:03 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EF58F.9000909@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. These issues should now be resolved. I'll note that for future cases similar to 3), if a user chooses to install an optional dependency using CPAN/CPANPLUS and the installation of that external module causes an infinite loop, it's an issue of that module or CPAN/CPANPLUS, not BioPerl. 
The solution from our end is to tell the user to choose not to install that dependency or ask on the CPAN mailing list if they really need it. (I've often got stuck in infinite loops just trying to install Bundle::CPAN! CPAN itself will detect infinite loops after a while and kill itself.) From jdalzell03 at qub.ac.uk Sun Aug 9 09:06:26 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Sun, 9 Aug 2009 02:06:26 -0700 (PDT) Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10 In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> Message-ID: <24885345.post@talk.nabble.com> Thanks for the replies. I emailed Chris and Brian individually, but I guess it would be helpful if I threw my solution to "the dogs". In the end I found that after downloading Subversion (you need to sign up to collabnet for a user account first) and following the installation instructions on the relevant Subversion page of the bioperl site (http://www.bioperl.org/wiki/Using_Subversion), it downloaded fine the first time. No need for CPAN, or a PPM, just copy and paste 'svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live' into your command line, and it auto installs in under 30 seconds...definitely the way to go for anyone else out there trying to bust-a-move on a Win machine. At time of writing, I have also installed BioPerl-db (same as above, copy and paste 'svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db' into the command line), and BioPerl-run (I typed in 'svn co svn://code.open-bio.org/bioperl/bioperl-run/trunk bio', I THINK), and it worked fine. The relevant installation instructions don't give an explicit command for BP-run installation, but I think that matches the branches and trunk in the subversion repository (if not, sorry, but you can cross ref its position in there easily by following the links).
Both have worked without problem on Strawberry Perl 5.10 through WinVista, so far. Jonny -- View this message in context: http://www.nabble.com/bioperl-1.6-installation-on-vista-with-perl-5.10-tp24875623p24885345.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From mwhagen85 at gmail.com Mon Aug 10 18:54:53 2009 From: mwhagen85 at gmail.com (OjoLoco) Date: Mon, 10 Aug 2009 11:54:53 -0700 (PDT) Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits Message-ID: <24905417.post@talk.nabble.com> Hello all, I have found matching sequences between two genomes and I would now like to create a graphic that contains a heat map-like track that will show areas of the genome that were found more often than others. For every nt I have the number of times it was found, so if it was found very often it would be a darker color than say a nt that wasn't found at all. Is there any way to achieve this using built in BioPerl graphics? Thank you for your time. -- View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Mon Aug 10 19:22:36 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Mon, 10 Aug 2009 15:22:36 -0400 Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits In-Reply-To: <24905417.post@talk.nabble.com> References: <24905417.post@talk.nabble.com> Message-ID: Hi, You should be able to do that with wiggle_density and wiggle_xyplot glyphs. See http://gmod.org/wiki/GBrowse/Uploading_Wiggle_Tracks for instructions on constructing wiggle plots. 
After you have a wiggle plot, you'll need the wiggle2gff3.pl script (which is part of GBrowse, but should run fine on its own), which you can get from GMOD's CVS: http://gmod.cvs.sourceforge.net/viewvc/*checkout*/gmod/Generic-Genome-Browser/bin/wiggle2gff3.pl It will convert the wig file to a binary file. Then you can create Bio::SeqFeatureI objects that will work with Bio::Graphics to draw the density or xyplot. Note as well that Bio::Graphics is no longer part of the main BioPerl distribution, so you'll need to get the most recent version from CPAN. Also, fair warning: I've never actually done this; I've only used wiggle plots in the context of GBrowse, but it should work pretty much as described. Scott On Aug 10, 2009, at 2:54 PM, OjoLoco wrote: > > Hello all, > I have found matching sequences between two genomes and I would > now like > to create a graphic that contains a heat map-like track that will > show areas > of the genome that were found more often than others. For every nt > I have > the number of times it was found, so if it was found very often it > would be > a darker color than say a nt that wasn't found at all. Is there any > way to > achieve this using built in BioPerl graphics? Thank you for your time. > -- > View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From jdalzell03 at qub.ac.uk Tue Aug 11 15:07:52 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:07:52 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? Message-ID: <24919498.post@talk.nabble.com> Hi, I'm trying to run the example given for Bio::Tools::HMM on the Bioperl site, and when I run it, I get this at the command line... "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package BEGIN failed--compilation aborted at C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140. Compilation failed in require at HMM.txt line 4. BEGIN failed--compilation aborted at HMM.txt line 4." I have installed the entire bioperl-ext package through subversion, and it looks like all the relevant folders are in perl/site/lib/Bio/Tools, but it won't work. Am I missing something? I'm under the impression that the C-compiler comes with bioperl-ext (which installed with no reported problems)? I concede that I am extremely new to both Perl in general and Bioperl more specifically, but I have followed the instructions which I can find. I have the bioperl core installed in addition to bioperl-db and bioperl-run. I'm using Strawberry Perl on WinVista. I appreciate that most work through Linux systems...I am at times sorely tempted myself. Any suggestions would be welcomed gratefully, cheers, Jonny ps. this is the partial script I was trying to run...
#!/usr/bin/perl -w

use strict;
use Bio::Tools::HMM;
use Bio::SeqIO;
use Bio::Matrix::Scoring;

# Create an HMM object.
# ACGT are the bases; N and C mean non-coding and coding.
my $hmm = Bio::Tools::HMM->new('-symbols' => "ACGT", '-states' => "NC");

# Initialise some training observation sequences
my $seq1 = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
my $seq2 = Bio::SeqIO->new(-file => $ARGV[1], -format => 'fasta');
my @seqs = ($seq1, $seq2);

# Train the HMM with the observation sequences
$hmm->baum_welch_training(\@seqs);

# Get parameters
my $init    = $hmm->init_prob;        # Returns an array reference
my $matrix1 = $hmm->transition_prob;  # Returns Bio::Matrix::Scoring
my $matrix2 = $hmm->emission_prob;    # Returns Bio::Matrix::Scoring

I realise that this is incomplete. -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919498.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From shameer at ncbs.res.in Tue Aug 11 17:07:20 2009 From: shameer at ncbs.res.in (K. Shameer) Date: Tue, 11 Aug 2009 22:37:20 +0530 (IST) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> Hello Jonny, Are you sure that you have a compiled version of HMMER installed on your machine? -- K. Shameer > Hi, > > trying to run the example given for Bio::Tools::HMM on the Bioperl site, > and > when I try to run it, I get this in the command line... > > "The C-compiled engine for Hidden Markov Model (HMM) has not been > installed. > Please read the install the bioperl-ext package > > BEGIN failed--compilation aborted at > C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140. > Compilation failed in require at HMM.txt line 4. > BEGIN failed--compilation aborted at HMM.txt line 4."
> > I have installed the entire bioperl-ext package through subversion, and it > looks like all the relevant folders are in perl/site/lib/Bio/Tools, but it > won't work. Am I missing something? I'm under the impression that the > C-compiler comes with bioperl-ext (which installed with no reported > problems)? I concede that I am extrememly new to both Perl in general and > Bioperl more specifically, but I have followed the instructions which I > can > find. I have the bioperl core installed in addition to bioperl-db and > bioperl-run. I'm using Strawberry Perl on WinVista. I appreciate that > most > work through Linux systems...I am at times sorely tempted myself. > > Any suggestions would be welcomed gratefully, > cheers, > Jonny > > ps. this is the partial script I was trying to run... > > #!/usr/bin/perl -w > > usr strict; > use Bio::Tools::HMM; > use Bio::SeqIO; > use Bio::Matrix::Scoring; > > #Create a HMM object > #ACGT are the bases NC mean non-coding and coding > $hmm = new Bio::Tools::HMM ('-symbols' => "ACGT", '-states' => "NC"); > > #Initialise some training observation sequences > $Seq1 = new Bio::SeqIO(-file => $ARGV[0], -format => 'fasta'); > $seq2 = new Bio::SeqIO(-file => $ARGV[1], -format => 'fasta'); > @seqs = ($seq1, $seq2); > > #Train the HMM with the observation sequences > $hmm ->baum_welch_training(\@seqs); > > #Get parameters > $init = $hmm->init_prob; #Returns an array reference > $matrix1 = $hmm->transition_prob; #Returns Bio::Matrix::Scoring > $matrix2 = $hmm->emission_prob; #Returns Bio::Matrix::Scoring > > I realise that this is incomplete. > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919498.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jdalzell03 at qub.ac.uk Tue Aug 11 15:14:59 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:14:59 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24919603.post@talk.nabble.com> I should point out perhaps that CPAN is not an option on a Win setup...it has never worked for anything I have tried to install. Although I'm using Strawberry Perl now, I had no success getting bioperl or any of its components through the activestate PPM either (One of the reasons I ended up going to Strawberry). The only option I have for installation is the subversion server. Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919603.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 15:42:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:42:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24920117.post@talk.nabble.com> I realise that this looks like there is a problem with Bio::Tools::HMM when looking at the source code, but I've even tried replacing the HMM.pm file I had with the HMM.pm script at http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, and now I'm getting... "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: C:/strawberry/perl/lib C:/strawberry/perl/site/ lib .) at HMM.txt line 5. BEGIN failed--compilation aborted at HMM.txt line 5." ?? 
jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24920117.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 18:52:21 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 11:52:21 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> Message-ID: <24923606.post@talk.nabble.com> Hi, I'm as sure as I can be. I look in the HHMER folder and it contains "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something to do with @INC, but I put "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at the top of my script, which definately encompasses the directory it should be in, and I still get... "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib C:/strawberry/perl/site/lib/ Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt line 5. BEGIN failed--compilation aborted at HMM.txt line 5." I'm out of ideas. Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From rmb32 at cornell.edu Tue Aug 11 19:23:56 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 11 Aug 2009 12:23:56 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24920117.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> Message-ID: <4A81C54C.5020905@cornell.edu> Jonny, For quicker help you might want to try #bioperl on freenode. 
That said, the problem here is that when you get code from subversion, you are not really 'installing' it, you are just copying it to your machine. Part of the installation process is compiling these things, and for that you need a working C compiler. I don't know anything about using BioPerl on Windows, but as a general recommendation I would say go back to the CPAN and/or ppm directions and getting those working. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu Jonny Dalzell wrote: > I realise that this looks like there is a problem with Bio::Tools::HMM when > looking at the source code, but I've even tried replacing the HMM.pm file I > had with the HMM.pm script at > http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, > and now I'm getting... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: > C:/strawberry/perl/lib C:/strawberry/perl/site/ > lib .) at HMM.txt line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > ?? > > jonny From maj at fortinbras.us Tue Aug 11 19:22:42 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 15:22:42 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: <7C7654A8A64E49158F6761EE09C9F297@NewLife> Jonny, You need the HMMER application, which is not part of BioPerl. See http://hmmer.janelia.org/ for download options. MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 2:52 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > I'm as sure as I can be. 
I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > the top of my script, which definately encompasses the directory it should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Tue Aug 11 19:48:11 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 11 Aug 2009 12:48:11 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81C54C.5020905@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> Message-ID: <4A81CAFB.5050903@cornell.edu> Elaborating more, the 'C-compiled engine' error comes because Bio::Ext::HMM is not installed, because bioperl-ext is not installed (correctly), because Bio::Ext::HMM is an XS extension written in C. Which needs to be compiled. With a C compiler. As part of some kind of installation process, not just copying the files to a machine with subversion. Rob Robert Buels wrote: > Jonny, > > For quicker help you might want to try #bioperl on freenode. > > That said, the problem here is that when you get code from subversion, > you are not really 'installing' it, you are just copying it to your > machine. 
Part of the installation process is compiling these things, > and for that you need a working C compiler. > > I don't know anything about using BioPerl on Windows, but as a general > recommendation I would say go back to the CPAN and/or ppm directions and > getting those working. > > Rob > > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From bix at sendu.me.uk Tue Aug 11 20:11:43 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 11 Aug 2009 21:11:43 +0100 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: <4A81D07F.6000703@sendu.me.uk> Jonny Dalzell wrote: > Hi, > > I'm as sure as I can be. I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > the top of my script, which definately encompasses the directory it should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. lib (or at least one entry in your PERL5LIB) needs to point to the directory that contains the Bio directory, and each directory has to be its own list element rather than part of one space-separated string. So: use lib "C:/strawberry/perl/site/lib"; Now it will be able to locate Bio::Tools::HMM. You'll still get your original error because you don't have HMMER installed. See Mark's reply.
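To make the @INC rule above concrete, here is a minimal sketch of the lookup that 'use' performs: the module name is turned into a relative path (Bio/Tools/HMM.pm) and that path is tried under each listed directory in turn. The helper names module_to_relpath and find_module_dir are invented for this illustration, and the Strawberry path is simply the one quoted in this thread, so treat it as an assumption about the local install.

```perl
use strict;
use warnings;

# Hypothetical helper: turn Bio::Tools::HMM into Bio/Tools/HMM.pm, the
# relative path perl looks for under every @INC directory.
sub module_to_relpath {
    my ($module) = @_;
    return join('/', split /::/, $module) . '.pm';
}

# Hypothetical helper: return the first directory that holds the module
# file, or undef if none does.
sub find_module_dir {
    my ($module, @dirs) = @_;
    my $rel = module_to_relpath($module);
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also contain code-ref hooks
        return $dir if -f "$dir/$rel";
    }
    return;
}

# Assumed location, taken from the error messages in this thread.
my $site_lib = 'C:/strawberry/perl/site/lib';
my $found = find_module_dir('Bio::Tools::HMM', $site_lib, @INC);
print defined $found
    ? "Bio::Tools::HMM found under $found\n"
    : "Bio::Tools::HMM not found; no listed directory contains Bio/Tools/HMM.pm\n";
```

Pointing at .../site/lib/Bio/Tools/ fails under this rule because perl would then look for .../site/lib/Bio/Tools/Bio/Tools/HMM.pm, which does not exist.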
From jdalzell03 at qub.ac.uk Tue Aug 11 20:29:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:29:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81D07F.6000703@sendu.me.uk> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> Message-ID: <24925178.post@talk.nabble.com> Hi, thanks. I did install HHMER from the site Mark suggested, and it is within the directories that perl recognizes when reading the script...still I get "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package" Is it possible that this module simply won't run through windows? jonny Sendu Bala-2 wrote: > > Jonny Dalzell wrote: >> Hi, >> >> I'm as sure as I can be. I look in the HHMER folder and it contains >> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >> something >> to do with @INC, but I put >> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >> the top of my script, which definately encompasses the directory it >> should >> be in, and I still get... >> >> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >> C:/strawberry/perl/site/lib/ >> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >> HMM.txt >> line 5. >> BEGIN failed--compilation aborted at HMM.txt line 5." >> >> I'm out of ideas. > > lib (or at least one entry in your PERL5LIB) needs to point to the > directory that contains the Bio directory. So: > > use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > > Now it will be able to locate Bio::Tools::Hmm. You'll still get your > original error because you don't have Hmmer installed. See Mark's reply. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 20:31:36 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:31:36 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81CAFB.5050903@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> Message-ID: <24925211.post@talk.nabble.com> OK, so is there any particular C-compiler which I should use? Thanks, jonny Robert Buels wrote: > > Elaborating more, the 'C-compiled engine' error comes because > Bio::Ext::HMM is not installed, because bioperl-ext is not installed > (correctly), because Bio::Ext::HMM is an XS extension written in C. > Which needs to be compiled. With a C compiler. As part of some kind of > installation process, not just copying the files to a machine with > subversion. > > Rob > > Robert Buels wrote: >> Jonny, >> >> For quicker help you might want to try #bioperl on freenode. >> >> That said, the problem here is that when you get code from subversion, >> you are not really 'installing' it, you are just copying it to your >> machine. Part of the installation process is compiling these things, >> and for that you need a working C compiler. >> >> I don't know anything about using BioPerl on Windows, but as a general >> recommendation I would say go back to the CPAN and/or ppm directions and >> getting those working. 
>> >> Rob >> >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Tue Aug 11 21:05:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 17:05:10 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925178.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: Jonny, It will run in Win/Vis but there are some caveats. The BioPerl package has some plain C components, as Rob pointed out. These need to be compiled, and the objects/libraries put in the right place. CPAN will cause this to happen when you have a compiler available; ActiveState .ppm will download the binaries directly from the repository (my understanding, anyway). CPAN is always available by doing > perl -MCPAN -e shell but you may not have a C compiler around. This is a little tricky. You can either explore Visual C/C++ options from MS here http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, and install Cygwin (www.cygwin.com), which creates a linux-like environment with GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful as the real thing, I grant. 
Which brings me to a third possibility that I haven't tried: an Ubuntu box running in a VM under Windows, or as a dual-boot system (https://help.ubuntu.com/community/WindowsDualBoot). MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 4:29 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > thanks. I did install HHMER from the site Mark suggested, and it is within > the directories that perl recognizes when reading the script...still I get > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > Please read the install the bioperl-ext package" > > Is it possible that this module simply won't run through windows? > > jonny > > > > Sendu Bala-2 wrote: >> >> Jonny Dalzell wrote: >>> Hi, >>> >>> I'm as sure as I can be. I look in the HHMER folder and it contains >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >>> something >>> to do with @INC, but I put >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >>> the top of my script, which definately encompasses the directory it >>> should >>> be in, and I still get... >>> >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >>> C:/strawberry/perl/site/lib/ >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >>> HMM.txt >>> line 5. >>> BEGIN failed--compilation aborted at HMM.txt line 5." >>> >>> I'm out of ideas. >> >> lib (or at least one entry in your PERL5LIB) needs to point to the >> directory that contains the Bio directory. So: >> >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; >> >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your >> original error because you don't have Hmmer installed. See Mark's reply.
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Tue Aug 11 21:39:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 12 Aug 2009 09:39:30 +1200 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB6F93AA@exchsth.agresearch.co.nz> Dev-C++ http://www.bloodshed.net/devcpp.html is a good (i.e. free under GPL) Windows compiler I've used before. Might save having to install Cygwin. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Wednesday, 12 August 2009 9:05 a.m. > To: Jonny Dalzell; Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Jonny, > It will run in Win/Vis but there are some caveats. The BioPerl package has > some > plain C components, as Rob pointed out. These need to be compiled, and the > objects/libraries put in the right place. CPAN will cause this to happen when > you have a compiler available; ActiveState .ppm will download the binaries > directly from the repository (my understanding, anyway). 
CPAN is always > available by doing > > > perl -MCPAN -e shell > > but you may not have a C compiler around. This is a little tricky. You can > either explore Visual C/C++ options from MS here > http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, > and install Cygwin (www.cygwin.com), which creates a linux-like environment > with > GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful > as > the real thing, I grant. Which bring me to a third possibility, that I haven't > tried, which is an Ubuntu box running in a VM under Windows, or as a dual-boot > system (https://help.ubuntu.com/community/WindowsDualBoot). > MAJ > ----- Original Message ----- > From: "Jonny Dalzell" > To: > Sent: Tuesday, August 11, 2009 4:29 PM > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > > > > > Hi, > > > > thanks. I did install HHMER from the site Mark suggested, and it is within > > the directories that perl recognizes when reading the script...still I get > > > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > > Please read the install the bioperl-ext package" > > > > Is it possible that this module simply won't run through windows? > > > > jonny > > > > > > > > Sendu Bala-2 wrote: > >> > >> Jonny Dalzell wrote: > >>> Hi, > >>> > >>> I'm as sure as I can be. I look in the HHMER folder and it contains > >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > >>> something > >>> to do with @INC, but I put > >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > >>> the top of my script, which definately encompasses the directory it > >>> should > >>> be in, and I still get... > >>> > >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > >>> C:/strawberry/perl/site/lib/ > >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > >>> HMM.txt > >>> line 5. 
> >>> BEGIN failed--compilation aborted at HMM.txt line 5." > >>> > >>> I'm out of ideas. > >> > >> lib (or at least one entry in your PERL5LIB) needs to point to the > >> directory that contains the Bio directory. So: > >> > >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > >> > >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your > >> original error because you don't have Hmmer installed. See Mark's reply. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > -- > > View this message in context: > > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista-- > tp24919498p24925178.html > > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From cjfields at illinois.edu Tue Aug 11 23:44:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:44:23 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: Bio::Tools::HMM doesn't use HMMER; it uses a C-based extension in bioperl-ext that generates HMMs (XS-based bindings I think). I have managed to compile it successfully on Ubuntu and Mac OS X, but WinVista is a whole different bag-o-worms altogether (untested AFAIK). For the record, I do not recommend using it; I'm unsure about its maintenance status, so it may be released separately. It would be best to use something better supported, such as the HMMER wrapper in bioperl-run and the hmmer parsers in bioperl-core. We may also have wrappers for similar code available in biolib at some future point. chris On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote: > > Hi, > > I'm as sure as I can be. I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/ > Tools/";" at > the top of my script, which definately encompasses the directory it > should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/ > per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 11 23:48:08 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:48:08 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925211.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> <24925211.post@talk.nabble.com> Message-ID: <3A5CA958-3B03-4252-B78F-07BBFF1FA355@illinois.edu> Any C-based code should use the same compiler used from whatever perl version you are running. ActiveState supports both VC/C++ (as Mark indicates) or mingw/gcc. I think Strawberry supports mainly the latter. Though you can use CygWin, I think a native Win module is the best way to go if possible. It will likely be a tricky road, so keep us updated and we'll attempt to help out the best we can. chris On Aug 11, 2009, at 3:31 PM, Jonny Dalzell wrote: > > OK, > > so is there any particular C-compiler which I should use? > > Thanks, > jonny > > > > Robert Buels wrote: >> >> Elaborating more, the 'C-compiled engine' error comes because >> Bio::Ext::HMM is not installed, because bioperl-ext is not installed >> (correctly), because Bio::Ext::HMM is an XS extension written in C. >> Which needs to be compiled. With a C compiler. As part of some >> kind of >> installation process, not just copying the files to a machine with >> subversion. >> >> Rob >> >> Robert Buels wrote: >>> Jonny, >>> >>> For quicker help you might want to try #bioperl on freenode. >>> >>> That said, the problem here is that when you get code from >>> subversion, >>> you are not really 'installing' it, you are just copying it to your >>> machine. Part of the installation process is compiling these >>> things, >>> and for that you need a working C compiler. 
>>> >>> I don't know anything about using BioPerl on Windows, but as a >>> general >>> recommendation I would say go back to the CPAN and/or ppm >>> directions and >>> getting those working. >>> >>> Rob >>> >>> >> >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Aug 12 00:09:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 20:09:01 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> Message-ID: <69BDE54FD5C943669BCD41A9A607634A@NewLife> [OOps. Sorry about that. The compiler ideas still apply however.] ----- Original Message ----- From: "Chris Fields" To: "Jonny Dalzell" Cc: Sent: Tuesday, August 11, 2009 7:44 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > Bio::Tools::Hmm doesn't use HMMER, it uses a C-based extension in bioperl-ext > that generates HMM's (XS-based bindings I think). I have managed to compile > it successfully on Ubuntu and Mac OS X, but WinVista is a whole different > bag-o-worms altogether (untested AFAIK). 
> > For the record, I do not recommend using it; I'm unsure about it's > maintenance status, so it may be released separately. It would be best to > use something better supported, such as the HMMER wrapper in bioperl-run and > the hmmer parsers in bioperl-core. We may also have wrappers for similar > code available in biolib at some future point. > > chris > > On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote: > >> >> Hi, >> >> I'm as sure as I can be. I look in the HHMER folder and it contains >> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something >> to do with @INC, but I put >> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/ Tools/";" at >> the top of my script, which definately encompasses the directory it should >> be in, and I still get... >> >> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/ per/lib >> C:/strawberry/perl/site/lib/ >> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt >> line 5. >> BEGIN failed--compilation aborted at HMM.txt line 5." >> >> I'm out of ideas. >> >> Jonny >> -- >> View this message in context: >> http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed Aug 12 16:44:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 Aug 2009 11:44:37 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ED672.20701@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> <4A7ED672.20701@sendu.me.uk> Message-ID: <1F099DCC-073E-470E-873A-608E674375C1@illinois.edu> On Aug 9, 2009, at 9:00 AM, Sendu Bala wrote: > Chris Fields wrote: > ... >> As long as you're moving everything into /lib (which I fully >> support), we should consider hard_coding scripts into bp_foo.PLS >> syntax seeing as we're going through additional trouble of >> converting them over. That is, unless there is a specific purpose >> to keeping them without the 'bp_'. > > (The final suffix is supposed to be .pl - we convert from PLS to pl > in core, no conversion needed in db) Yes, had that reversed in my commit. Thanks. > Yes, for only a handful of scripts, it actually makes sense to > flatten them all into a new bin directory, which is the default > script location for Module::Build. > > So for example I'd do: > svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl > etc. Yes, exactly. 
It seems we're going out of our way to keep things as they were previously when using ExtUtils::MakeMaker/Makefile.PL. I'm not quite sure why we've bent over backwards to work around these issues when it is much easier to stick to simple standards that 99% of CPAN uses: scripts in bin (or whatever dir is passed to script_files), modules in lib. I'm not complaining, just haven't heard an explanation about that one way or the other. chris From rmb32 at cornell.edu Thu Aug 13 18:59:00 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 Aug 2009 11:59:00 -0700 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <4A79A52E.7000104@cornell.edu> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> <4A79A52E.7000104@cornell.edu> Message-ID: <4A846274.4000600@cornell.edu> OK, commit 15927 adds some more info about -db options for Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide, nuccore, nucgss, nucest, and unigene, and including a link to an (XML) page from NCBI that lists inputs that NCBI accepts. Could somebody who knows more about eUtils than me also review this patch and make corrections if necessary? Rob Robert Buels wrote: > I think you're looking for the -db => 'nucgss' option. > > I'll add a better listing of these (undocumented) options to the > Bio::DB::Query::GenBank docs. > > Rob > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From jdalzell03 at qub.ac.uk Thu Aug 13 19:27:14 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Thu, 13 Aug 2009 12:27:14 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24957222.post@talk.nabble.com> Fellows, thanks very much for the input.
However, today I saw fit to dual-boot with ubuntu. I've installed everything, but I still get the same "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package " message! Is it ridiculous of me to expect ubuntu to take care of this for me? How do I go about compiling the HMM? Thanks in advance, Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24957222.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jonathanmflowers at gmail.com Thu Aug 13 19:41:21 2009 From: jonathanmflowers at gmail.com (Jonathan Flowers) Date: Thu, 13 Aug 2009 12:41:21 -0700 Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO Message-ID: Hi, I am trying to parse BLAST reports written in XML using Bio::SearchIO. When running the following code on a set of reports (multiple query results in a single file), I only get one ResultI object. I tried running the same code on a file in 'blast' format and obtained the expected results (ie one ResultI object for each query), suggesting that the issue is with blastxml. I found an old thread on this listserv where someone had had a similar problem, but could not find how it was resolved. I am using Bioperl 1.5.2 and the XML reports were generated using blastall with the -m7 option.

my $in = new Bio::SearchIO(-format => 'blastxml',
                           -file   => 'blastreport.xml' );
while( my $result = $in->next_result ) {
    print $result->query_name,"\n";
    while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            #do something with hsp
        }
    }
}

Thanks Jonathan From rmb32 at cornell.edu Thu Aug 13 21:37:21 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 Aug 2009 14:37:21 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24957222.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com> Message-ID: <4A848791.4010402@cornell.edu> Jonny Dalzell wrote: > Is it ridiculous of me to expect ubuntu to take care of this for me? How do > I go about compiling the HMM? Yes. This is a very specialized thing that you're doing, and Ubuntu does not have the resources to package every single thing. Unfortunately, it looks like the bioperl-ext package is not installable under Ubuntu 9.04 anyway, which is what I'm running. For others on this list, if somebody is interested in maintaining it, I'd be happy to help out by testing on Debian-based Linux platforms. We need to clarify this package's maintenance status: if there is nobody interested in maintaining it, I would recommend that bioperl-ext be removed from distribution. It's not in anybody's interest to have unmaintained software out there causing confusion. So Jonny, in short, I would say "do not use bioperl-ext". Step back. What are you trying to accomplish? Chris already recommended some alternative methods in his email of 8/11 on this subject. Perhaps we can guide you to some software that is actively maintained and will meet your needs. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Thu Aug 13 22:06:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:06:29 -0500 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <4A846274.4000600@cornell.edu> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> <4A79A52E.7000104@cornell.edu> <4A846274.4000600@cornell.edu> Message-ID: <916D0E26-EBB5-4E28-99AD-F689639BB93A@illinois.edu> It looks fine.
As for the databases, you can always get the latest databases using a script from bioperl-live, which uses Bio::DB::EUtilities to access them directly (scripts/DB_EUtilities/einfo.PLS, which should install as bp_einfo.pl). (looking at the below, what is blastdbinfo?)

cjfields4:DB_EUtilities cjfields$ perl einfo.PLS
pubmed
protein
nucleotide
nuccore
nucgss
nucest
structure
genome
biosystems
blastdbinfo
books
cancerchromosomes
cdd
gap
domains
gene
genomeprj
gensat
geo
gds
homologene
journals
mesh
ncbisearch
nlmcatalog
omia
omim
pepdome
pmc
popset
probe
proteinclusters
pcassay
pccompound
pcsubstance
snp
sra
taxonomy
toolkit
unigene

chris On Aug 13, 2009, at 1:59 PM, Robert Buels wrote: > OK, commit 15927 adds some more info about -db options for > Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide, > nuccore, nucgss, nucest, and unigene, and including a link to an > (XML) page from NCBI that lists inputs that NCBI accepts. > > Could somebody who knows more about eUtils than me also review this > patch and make corrections if necessary? > > Rob > > Robert Buels wrote: >> I think you're looking for the -db => 'nucgss' option. >> I'll add a better listing of this (undocumented) options to the >> Bio::DB::Query::GenBank docs.
>> Rob > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 13 22:08:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:08:37 -0500 Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO In-Reply-To: References: Message-ID: <65CC2787-7F0A-43C1-A840-554A2E4FD76A@illinois.edu> You should update to bioperl 1.6; I believe I fixed this issue after the 1.5.2 release. chris On Aug 13, 2009, at 2:41 PM, Jonathan Flowers wrote: > Hi, > > I am trying to parse BLAST reports written in XML using > Bio::SearchIO. When > running the following code on a set of reports (multiple query > results in a > single file), I only get one ResultI object. I tried running the > same code > on a file in 'blast' format and obtained the expected results (ie one > ResultI object for each query), suggesting that the issue is with > blastxml. > I found an old thread on this listserv where someone had had a similar > problem, but could not find how it was resolved. > > I am using Bioperl 1.5.2 and the XML reports were generated using > blastall > with the -m7 option. 
> > my $in = new Bio::SearchIO(-format => 'blastxml', -file => > 'blastreport.xml' ); > while( my $result = $in->next_result ) { > print $result->query_name,"\n"; > while( my $hit = $result->next_hit ) { > while( my $hsp = $hit->next_hsp ) { > #do something with hsp > } > } > } > > Thanks > > Jonathan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 13 22:18:57 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:18:57 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A848791.4010402@cornell.edu> References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com> <4A848791.4010402@cornell.edu> Message-ID: On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > Jonny Dalzell wrote: >> Is it ridiculous of me to expect ubuntu to take care of this for >> me? How do >> I go about compiling the HMM? > Yes. This is a very specialized thing that you're doing, and Ubuntu > does not have the resources to package every single thing. > > Unfortunately, it looks like bioperl-ext package is not installable > under Ubuntu 9.04 anyway, which is what I'm running. For others on > this list, if somebody is interested in doing maintaining it, I'd be > happy to help out by testing on Debian-based Linux platforms. We > need to clarify this package's maintenance status: if there is > nobody interested in maintaining it, I would recommend that bioperl- > ext be removed from distribution. It's not in anybody's interest to > have unmaintained software out there causing confusion. I have cc'd Yee Man Chan for this. If there isn't a response or the message bounces, we do one of two things: 1) consider it deprecated (probably safest). 2) spin it out into a separate module. 
Just tried to compile it myself and am getting errors (using 64-bit perl 5.10), so I think, unless someone wants to take this on, option #1 is best. > So Jonny, in short, I would say "do not use bioperl-ext". In general, that's a safe bet. We're moving most of our C/C++ bindings to BioLib. > Step back. What are you trying to accomplish? Chris already > recommended some alternative methods in his email of 8/11 on this > subject. Perhaps we can guide you to some software that is actively > maintained and will meet your needs. > > Rob Exactly. Lots of other (better supported!) options out there. HMMER, SeqAn, and others. chris From cjfields at illinois.edu Fri Aug 14 00:31:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 19:31:49 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <650586.94518.qm@web30407.mail.mud.yahoo.com> References: <650586.94518.qm@web30407.mail.mud.yahoo.com> Message-ID: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu> (just to point out to everyone, Yee Man's contact information was in the POD) Yee Man, I have the output at the link below: http://gist.github.com/167542 There are similar problems popping up on 32- and 64-bit perl 5.10.0, Mac OS X 10.5. Haven't had time to debug it unfortunately. I think we should seriously consider spinning this code off into its own distribution for CPAN. It's unfortunately bit-rotting away in bioperl-ext. If you want to continue supporting it I can help set that up. chris On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > Hi > > So is this an HMM only problem? Or does it apply to other bioperl- > ext modules? > > What exactly are the compilation errors for HMM? I believe my > implementation is just a simple one based on Rabiner's paper.
> > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F > ~murphyk%2FBayes > %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner > +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > > I don't think I did anything fancy that makes it machine > dependent or non-ANSI C. > > Yee Man > > --- On Thu, 8/13/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "Jonny Dalzell" , "BioPerl List" > >, "Yee Man Chan" >> Date: Thursday, August 13, 2009, 3:18 PM >> >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: >> >>> Jonny Dalzell wrote: >>>> Is it ridiculous of me to expect ubuntu to take >> care of this for me? How do >>>> I go about compiling the HMM? >>> Yes. This is a very specialized thing that >> you're doing, and Ubuntu does not have the resources to >> package every single thing. >>> >>> Unfortunately, it looks like bioperl-ext package is >> not installable under Ubuntu 9.04 anyway, which is what I'm >> running. For others on this list, if somebody is >> interested in doing maintaining it, I'd be happy to help out >> by testing on Debian-based Linux platforms. We need to >> clarify this package's maintenance status: if there is >> nobody interested in maintaining it, I would recommend that >> bioperl-ext be removed from distribution. It's not in >> anybody's interest to have unmaintained software out there >> causing confusion. >> >> I have cc'd Yee Man Chan for this. If there isn't a >> response or the message bounces, we do one of two things: >> >> 1) consider it deprecated (probably safest). >> 2) spin it out into a separate module. >> >> Just tried to comile it myself and am getting errors (using >> 64bit perl 5.10), so I think, unless someone wants to take >> this on, option #1 is best. >> >>> So Jonny, in short, I would say "do not use >> bioperl-ext". >> >> In general, that's a safe bet. 
We're moving most of >> our C/C++ bindings to BioLib. >> >>> Step back. What are you trying to >> accomplish? Chris already recommended some alternative >> methods in his email of 8/11 on this subject. Perhaps >> we can guide you to some software that is actively >> maintained and will meet your needs. >>> >>> Rob >> >> Exactly. Lots of other (better supported!) options >> out there. HMMER, SeqAn, and others. >> >> chris >> > > > From ymc at yahoo.com Thu Aug 13 23:58:28 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Thu, 13 Aug 2009 16:58:28 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <650586.94518.qm@web30407.mail.mud.yahoo.com> Hi So is this an HMM only problem? Or does it apply to other bioperl-ext modules? What exactly are the compilation errors for HMM? I believe my implementation is just a simple one based on Rabiner's paper. http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg I don't think I did anything fancy that makes it machine dependent or non-ANSI C. Yee Man --- On Thu, 8/13/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "Jonny Dalzell" , "BioPerl List" , "Yee Man Chan" > Date: Thursday, August 13, 2009, 3:18 PM > > On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > > > Jonny Dalzell wrote: > >> Is it ridiculous of me to expect ubuntu to take > care of this for me? How do > >> I go about compiling the HMM? > > Yes. This is a very specialized thing that > you're doing, and Ubuntu does not have the resources to > package every single thing. > > > > Unfortunately, it looks like bioperl-ext package is > not installable under Ubuntu 9.04 anyway, which is what I'm > running.
For others on this list, if somebody is > interested in doing maintaining it, I'd be happy to help out > by testing on Debian-based Linux platforms. We need to > clarify this package's maintenance status: if there is > nobody interested in maintaining it, I would recommend that > bioperl-ext be removed from distribution. It's not in > anybody's interest to have unmaintained software out there > causing confusion. > > I have cc'd Yee Man Chan for this. If there isn't a > response or the message bounces, we do one of two things: > > 1) consider it deprecated (probably safest). > 2) spin it out into a separate module. > > Just tried to comile it myself and am getting errors (using > 64bit perl 5.10), so I think, unless someone wants to take > this on, option #1 is best. > > > So Jonny, in short, I would say "do not use > bioperl-ext". > > In general, that's a safe bet. We're moving most of > our C/C++ bindings to BioLib. > > > Step back. What are you trying to > accomplish? Chris already recommended some alternative > methods in his email of 8/11 on this subject. Perhaps > we can guide you to some software that is actively > maintained and will meet your needs. > > > > Rob > > Exactly. Lots of other (better supported!) options > out there. HMMER, SeqAn, and others. > > chris > From agulyaskov at mail.rockefeller.edu Fri Aug 14 00:40:22 2009 From: agulyaskov at mail.rockefeller.edu (Attila Gulyas-Kovacs) Date: Thu, 13 Aug 2009 20:40:22 -0400 Subject: [Bioperl-l] bus error when indexing large file Message-ID: <4A84B276.2040706@mail.rockefeller.edu> Dear all, I can index the SwissProt database without problem but I get a bus error when I try to index the much larger TrEMBL database. Indexing failed with both the swissprot and fasta formats (using Bio::Index::Swissprot or Bio::Index::Fasta, respectively). I broke up TrEMBL into multiple files ('chunks'), each about the size of the SwissProt database. Then I could create separate indices for each chunk.
But I got a bus error when I passed all chunks simultaneously to my script (below) to create a single index. Perl v5.10.0; Bioperl 1.6.0; Mac OS X 10.5.8; MacPro 10 GB RAM. What do you suggest? Attila

#! /usr/bin/perl
use warnings;
use strict;
use Bio::Index::Swissprot;

my $index_file_name = shift;
my $inx = Bio::Index::Swissprot->new(
    -filename   => $index_file_name,
    -write_flag => 1);
$inx->make_index(@ARGV);

-- Attila Gulyas-Kovacs Postdoctoral Associate Rockefeller University Gadsby Lab (Cardiac/Membrane Physiology) D.W. Bronk Building, Room 307 1230 York Avenue New York, NY, 10065 Tel: (212)327-8617 Fax: (212)327-7589 From ymc at yahoo.com Fri Aug 14 04:15:41 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Thu, 13 Aug 2009 21:15:41 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu> Message-ID: <528790.13637.qm@web30404.mail.mud.yahoo.com> Hi all Based on my understanding of the warning messages, the problem seems to come from the "typemap" file when I cast the return from SvIV from an integer to a pointer. I suppose this might cause problems in 64-bit machines. But when I look at perlguts and perlxs, it does seem to me that the way I did it in typemap is the suggested way to do it because the IV type is "guaranteed to be big enough to hold a pointer". Nevertheless, I modified my typemap file to look exactly like what's in perlxs. (See PS) Does anyone know how to deal with this problem? Or can any of you give me access to a 64-bit machine to sort this out? Thank you! Yee Man PS This is a typemap file using exactly the same lines suggested by perlxs. It works in my 32-bit machine. Can someone try it on a 64-bit machine?
Thanks

================================================
TYPEMAP
HMM *    T_HMM

INPUT
T_HMM
    if (sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG))
        $var = ($type)SvIV((SV*)SvRV( $arg ));
    else{
        warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
        XSRETURN_UNDEF;
    }

OUTPUT
T_HMM
    sv_setref_pv($arg, "Bio::Ext::HMM::HMM", (void*) $var);
========================================================

--- On Thu, 8/13/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" > Date: Thursday, August 13, 2009, 5:31 PM > (just to point out to everyone, Yee > Man's contact information was in the POD) > > Yee Man, > > I have the output in the below link: > > http://gist.github.com/167542 > > There are similar problems popping up on 32- and 64-bit > perl 5.10.0, Mac OS X 10.5. Haven't had time to debug > it unfortunately. > > I think we should seriously consider spinning this code off > into it's own distribution for CPAN. It's > unfortunately bit-rotting away in bioperl-ext. If you > want to continue supporting it I can help set that up. > > chris > > On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > > > Hi > > > > So is this an HMM only problem? Or does > it apply to other bioperl-ext modules? > > > > What exactly are the compilation errors > for HMM? I believe my implementation is just a simple one > based on Rabiner's paper. > > > > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > > > > I don't think I did anything fancy that > makes it machine dependent or non-ANSI C. > > > > Yee Man > > > > --- On Thu, 8/13/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista?
> >> To: "Robert Buels" > >> Cc: "Jonny Dalzell" , > "BioPerl List" , > "Yee Man Chan" > >> Date: Thursday, August 13, 2009, 3:18 PM > >> > >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > >> > >>> Jonny Dalzell wrote: > >>>> Is it ridiculous of me to expect ubuntu to > take > >> care of this for me? How do > >>>> I go about compiling the HMM? > >>> Yes. This is a very specialized thing > that > >> you're doing, and Ubuntu does not have the > resources to > >> package every single thing. > >>> > >>> Unfortunately, it looks like bioperl-ext > package is > >> not installable under Ubuntu 9.04 anyway, which is > what I'm > >> running. For others on this list, if > somebody is > >> interested in doing maintaining it, I'd be happy > to help out > >> by testing on Debian-based Linux platforms. > We need to > >> clarify this package's maintenance status: if > there is > >> nobody interested in maintaining it, I would > recommend that > >> bioperl-ext be removed from distribution. > It's not in > >> anybody's interest to have unmaintained software > out there > >> causing confusion. > >> > >> I have cc'd Yee Man Chan for this. If there > isn't a > >> response or the message bounces, we do one of two > things: > >> > >> 1) consider it deprecated (probably safest). > >> 2) spin it out into a separate module. > >> > >> Just tried to comile it myself and am getting > errors (using > >> 64bit perl 5.10), so I think, unless someone wants > to take > >> this on, option #1 is best. > >> > >>> So Jonny, in short, I would say "do not use > >> bioperl-ext". > >> > >> In general, that's a safe bet. We're moving > most of > >> our C/C++ bindings to BioLib. > >> > >>> Step back. What are you trying to > >> accomplish? Chris already recommended some > alternative > >> methods in his email of 8/11 on this > subject. Perhaps > >> we can guide you to some software that is > actively > >> maintained and will meet your needs. > >>> > >>> Rob > >> > >> Exactly.
Lots of other (better supported!) > options > >> out there.? HMMER, SeqAn, and others. > >> > >> chris > >> > > > > > > > > From ymc at yahoo.com Fri Aug 14 08:27:11 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Fri, 14 Aug 2009 01:27:11 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? Message-ID: <168012.97676.qm@web30405.mail.mud.yahoo.com> Ah.. I find that the typemap can become as simple as this ===================== TYPEMAP HMM * T_PTROBJ ===================== Then the generated HMM.c will have a function called INT2PTR to do the pointer conversion. I believe this should solve the warnings. Attached are the updated HMM.xs and typemap. Can someone with a 64-bit machine give it a try? Thank you Yee Man --- On Thu, 8/13/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" > Date: Thursday, August 13, 2009, 5:31 PM > (just to point out to everyone, Yee > Man's contact information was in the POD) > > Yee Man, > > I have the output in the below link: > > http://gist.github.com/167542 > > There are similar problems popping up on 32- and 64-bit > perl 5.10.0, Mac OS X 10.5.? Haven't had time to debug > it unfortunately. > > I think we should seriously consider spinning this code off > into it's own distribution for CPAN.? It's > unfortunately bit-rotting away in bioperl-ext.? If you > want to continue supporting it I can help set that up. > > chris > > On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > > > Hi > > > >? ? So is this an HMM only problem? Or does > it apply to other bioperl-ext modules? > > > >? ? What exactly are the compilation errors > for HMM? I believe my implementation is just a simple one > based on Rabiner's paper. 
> >
> > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg
> >
> >    I don't think I did anything fancy that makes it machine dependent or non-ANSI C.
> >
> > Yee Man
> >
> > --- On Thu, 8/13/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Robert Buels"
> >> Cc: "Jonny Dalzell", "BioPerl List", "Yee Man Chan"
> >> Date: Thursday, August 13, 2009, 3:18 PM
> >>
> >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote:
> >>
> >>> Jonny Dalzell wrote:
> >>>> Is it ridiculous of me to expect ubuntu to take care of this for me? How do
> >>>> I go about compiling the HMM?
> >>> Yes. This is a very specialized thing that you're doing, and Ubuntu does not have the resources to package every single thing.
> >>>
> >>> Unfortunately, it looks like bioperl-ext package is not installable under Ubuntu 9.04 anyway, which is what I'm running. For others on this list, if somebody is interested in doing maintaining it, I'd be happy to help out by testing on Debian-based Linux platforms. We need to clarify this package's maintenance status: if there is nobody interested in maintaining it, I would recommend that bioperl-ext be removed from distribution. It's not in anybody's interest to have unmaintained software out there causing confusion.
> >>
> >> I have cc'd Yee Man Chan for this. If there isn't a response or the message bounces, we do one of two things:
> >>
> >> 1) consider it deprecated (probably safest).
> >> 2) spin it out into a separate module.
> >>
> >> Just tried to comile it myself and am getting errors (using 64bit perl 5.10), so I think, unless someone wants to take this on, option #1 is best.
> >>
> >>> So Jonny, in short, I would say "do not use bioperl-ext".
> >>
> >> In general, that's a safe bet. We're moving most of our C/C++ bindings to BioLib.
> >>
> >>> Step back. What are you trying to accomplish? Chris already recommended some alternative methods in his email of 8/11 on this subject. Perhaps we can guide you to some software that is actively maintained and will meet your needs.
> >>>
> >>> Rob
> >>
> >> Exactly. Lots of other (better supported!) options out there. HMMER, SeqAn, and others.
> >>
> >> chris
> >>
> >
> >
> >

-------------- next part --------------
A non-text attachment was scrubbed...
Name: HMM.xs
Type: application/octet-stream
Size: 5588 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: typemap
Type: application/octet-stream
Size: 26 bytes
Desc: not available
URL:

From cjfields at illinois.edu Fri Aug 14 14:20:21 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 14 Aug 2009 09:20:21 -0500
Subject: [Bioperl-l] bus error when indexing large file
In-Reply-To: <4A84B276.2040706@mail.rockefeller.edu>
References: <4A84B276.2040706@mail.rockefeller.edu>
Message-ID:

I can attempt to reproduce this (I have very similar specs). I'm wondering if it has something to do with large file support. Have you tried the perl packaged with Mac OS X? I think it's perl 5.8.8.

chris

On Aug 13, 2009, at 7:40 PM, Attila Gulyas-Kovacs wrote:

> Dear all,
>
> I can index the SwissProt database without problem but I get bus error when I try to index the much larger TrEMBL database. Indexing failed with both the swissprot and fasta format (using Bio::Index::Swissprot or Bio::Index::Fasta, respectively). I broke up TrEMBL into multiple files ('chunks'), about the size of the SwissProt database. Then I could could create separate indeces for each chunk. But I got bus error when I passed all chunks simultaneously to my script (below) to create a single index.
> Perl v5.10.0; Bioperl 1.6.0; Mac OS X 10.5.8; MacPro 10 GB RAM.
>
> What do you suggest?
>
> Attila
>
>
> #! /usr/bin/perl
> use warnings;
> use strict;
> use Bio::Index::Swissprot;
> my $index_file_name = shift;
> my $inx = Bio::Index::Swissprot->new(
>     -filename => $index_file_name,
>     -write_flag => 1);
> $inx->make_index(@ARGV);
>
> --
> Attila Gulyas-Kovacs
> Postdoctoral Associate
>
> Rockefeller University
> Gadsby Lab (Cardiac/Membrane Physiology)
> D.W. Bronk Building, Room 307 1230 York Avenue
> New York, NY, 10065
> Tel: (212)327-8617
> Fax: (212)327-7589
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From David.Messina at sbc.su.se Fri Aug 14 14:10:33 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 14 Aug 2009 16:10:33 +0200
Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence
Message-ID: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com>

Hi everyone,
I'm using Bio::AlignIO to read in a series of multiple alignments. Occasionally, an alignment will have a sequence which consists entirely of gaps (these are actually trimmed sub-alignments; that's why).

Each time I read in such an alignment, an error will be raised when the Bio::LocatableSeq object is created for the all-gap sequence (actually, the error comes from the superclass Bio::PrimarySeq).

To my way of thinking, an alignment is not invalid if it contains such all-gap sequences, so there shouldn't be an error. This could be done by having Bio::AlignIO::* passing the -nowarnonempty flag when creating the sequence objects.

Any thoughts on this?
Is there a better way to suppress the warning than changing the behavior of all the AlignIO modules? Dave From cjfields at illinois.edu Fri Aug 14 14:42:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 09:42:51 -0500 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Dave, Is this using bioperl-live? I recall this being a problem but I thought it was addressed in svn (and soon in the next point release). chris On Aug 14, 2009, at 9:10 AM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists > entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when > the > Bio::LocatableSeq object is created for the all-gap sequence > (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be > done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating > the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning > than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Fri Aug 14 14:44:42 2009 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 14 Aug 2009 16:44:42 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <716af09c0908140744i4447dffg205ec07daeaaa571@mail.gmail.com> Hi Dave, I have observed the same (with bioperl 1.52) for the same reason. It would be nice not to have these errors as also in my view an all-gaps sequence is a sequence. I also found that sometimes parsing such alignments fails when the all-gaps sequence is the last in the alignment (bug 2744, in Bio::LocatableSeq). Regards, Bernd On Fri, Aug 14, 2009 at 4:10 PM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when the > Bio::LocatableSeq object is created for the all-gap sequence (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Aug 14 15:12:35 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 14 Aug 2009 17:12:35 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Message-ID: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> > > Is this using bioperl-live? Sorry, should've said before. Yes, it's bioperl-live (r15927). I recall this being a problem but I thought it was addressed in svn (and > soon in the next point release). Hmm, the only recent somewhat related change I see (in Bio::AlignIO::*, anyway) is: ------------------------------------------------------------------------ r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 lines deprecate no_sequences/no_residues in main trunk (we can switch the version to 1.7 if deemed necessary) ------------------------------------------------------------------------ Perhaps this is what you were thinking of? Dave From cjfields at illinois.edu Fri Aug 14 15:31:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 10:31:49 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <168012.97676.qm@web30405.mail.mud.yahoo.com> References: <168012.97676.qm@web30405.mail.mud.yahoo.com> Message-ID: Yee Man, I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 64-bit) and on dev.open-bio.org (which is perl 5.8.8, appears to be 32-bit). The patch results in cleaning up warnings for 5.10.0 but results in similar warnings for 5.8.8 (linux or OS X). 
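
One way to compare builds like these is to print each perl's integer and pointer widths alongside its version. The following is only a minimal sketch using the core Config module; the helper name perl_widths is made up for illustration, and a size mismatch is one plausible factor in pointer-cast warnings, not a confirmed diagnosis of the HMM failures:

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: collect the widths (in bytes) that matter when
# XS code stores a C pointer in a Perl integer (IV).
sub perl_widths {
    return (
        ivsize  => $Config{ivsize},   # size of perl's integer type
        ptrsize => $Config{ptrsize},  # size of a C pointer
    );
}

my %w = perl_widths();
printf "perl %vd on %s: ivsize=%d ptrsize=%d\n",
    $^V, $Config{archname}, $w{ivsize}, $w{ptrsize};

# When the two sizes differ, a plain cast between IV and pointer can
# draw compiler warnings; INT2PTR exists to handle that case.
print "IV and pointer sizes differ on this build\n"
    if $w{ivsize} != $w{ptrsize};
```

Running this under each perl executable being tested gives a quick side-by-side of the builds.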
On OS X perl 5.8.8, this sometimes passes (note the first attempt fails, the second succeeds), so it's not entirely a 32-bit issue: http://gist.github.com/167860 OS X and perl 5.10.0, this always fails as the previous gist shows, but demonstrates similar behavior (multiple attempts to test get different responses): http://gist.github.com/167542 On linux, everything passes with or w/o the patched files (patched files have warnings as indicated above): Specs for all three perl executables (they vary a bit): http://gist.github.com/167883 chris On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: > Ah.. I find that the typemap can become as simple as this > ===================== > TYPEMAP > HMM * T_PTROBJ > ===================== > > Then the generated HMM.c will have a function called INT2PTR to do > the pointer conversion. I believe this should solve the warnings. > > Attached are the updated HMM.xs and typemap. Can someone with a 64- > bit machine give it a try? > > Thank you > Yee Man > --- On Thu, 8/13/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" > >, "BioPerl List" >> Date: Thursday, August 13, 2009, 5:31 PM >> (just to point out to everyone, Yee >> Man's contact information was in the POD) >> >> Yee Man, >> >> I have the output in the below link: >> >> http://gist.github.com/167542 >> >> There are similar problems popping up on 32- and 64-bit >> perl 5.10.0, Mac OS X 10.5. Haven't had time to debug >> it unfortunately. >> >> I think we should seriously consider spinning this code off >> into it's own distribution for CPAN. It's >> unfortunately bit-rotting away in bioperl-ext. If you >> want to continue supporting it I can help set that up. >> >> chris >> >> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >> >>> Hi >>> >>> So is this an HMM only problem? Or does >> it apply to other bioperl-ext modules? 
>>> >>> What exactly are the compilation errors >> for HMM? I believe my implementation is just a simple one >> based on Rabiner's paper. >>> >>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>> ~murphyk%2FBayes >>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>> >>> I don't think I did anything fancy that >> makes it machine dependent or non-ANSI C. >>> >>> Yee Man >>> >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Robert Buels" >>>> Cc: "Jonny Dalzell" , >> "BioPerl List" , >> "Yee Man Chan" >>>> Date: Thursday, August 13, 2009, 3:18 PM >>>> >>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: >>>> >>>>> Jonny Dalzell wrote: >>>>>> Is it ridiculous of me to expect ubuntu to >> take >>>> care of this for me? How do >>>>>> I go about compiling the HMM? >>>>> Yes. This is a very specialized thing >> that >>>> you're doing, and Ubuntu does not have the >> resources to >>>> package every single thing. >>>>> >>>>> Unfortunately, it looks like bioperl-ext >> package is >>>> not installable under Ubuntu 9.04 anyway, which is >> what I'm >>>> running. For others on this list, if >> somebody is >>>> interested in doing maintaining it, I'd be happy >> to help out >>>> by testing on Debian-based Linux platforms. >> We need to >>>> clarify this package's maintenance status: if >> there is >>>> nobody interested in maintaining it, I would >> recommend that >>>> bioperl-ext be removed from distribution. >> It's not in >>>> anybody's interest to have unmaintained software >> out there >>>> causing confusion. >>>> >>>> I have cc'd Yee Man Chan for this. If there >> isn't a >>>> response or the message bounces, we do one of two >> things: >>>> >>>> 1) consider it deprecated (probably safest). >>>> 2) spin it out into a separate module. 
>>>> >>>> Just tried to comile it myself and am getting >> errors (using >>>> 64bit perl 5.10), so I think, unless someone wants >> to take >>>> this on, option #1 is best. >>>> >>>>> So Jonny, in short, I would say "do not use >>>> bioperl-ext". >>>> >>>> In general, that's a safe bet. We're moving >> most of >>>> our C/C++ bindings to BioLib. >>>> >>>>> Step back. What are you trying to >>>> accomplish? Chris already recommended some >> alternative >>>> methods in his email of 8/11 on this >> subject. Perhaps >>>> we can guide you to some software that is >> actively >>>> maintained and will meet your needs. >>>>> >>>>> Rob >>>> >>>> Exactly. Lots of other (better supported!) >> options >>>> out there. HMMER, SeqAn, and others. >>>> >>>> chris >>>> >>> >>> >>> >> >> > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Aug 14 15:53:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 10:53:51 -0500 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> Message-ID: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> On Aug 14, 2009, at 10:12 AM, Dave Messina wrote: > Is this using bioperl-live? > > Sorry, should've said before. Yes, it's bioperl-live (r15927). > > > I recall this being a problem but I thought it was addressed in svn > (and soon in the next point release). 
>
> Hmm, the only recent somewhat related change I see (in Bio::AlignIO::*, anyway) is:
>
> ------------------------------------------------------------------------
> r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 lines
>
> deprecate no_sequences/no_residues in main trunk (we can switch the version to 1.7 if deemed necessary)
> ------------------------------------------------------------------------
>
>
> Perhaps this is what you were thinking of?
>
> Dave

Maybe not, then (for some reason I thought this was fixed within LocatableSeq). I know that it is possible to have an all-gap LocatableSeq; this works, but the default start/end/length aren't correct, which is part of Bernd's bug:

use Modern::Perl;
use Bio::LocatableSeq;

my $seq = Bio::LocatableSeq->new(
    -seq      => '-------------',
    -alphabet => 'dna',
);

say $seq->start;   # 1
say $seq->end;     # undef (?)
say $seq->length;  # 13, counts the gaps

The problem is, to fix all this relies on a whole slew of refactors for LocatableSeq and SimpleAlign. Some of this touches root components as well, so it'll need to be tried on a branch and will very likely result in some API changes (and thus may not be included in 1.6).

I'll start a branch to get the process started.

chris

From jncline at gmail.com Fri Aug 14 19:41:21 2009
From: jncline at gmail.com (Jonathan Cline)
Date: Fri, 14 Aug 2009 14:41:21 -0500
Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl
In-Reply-To: <99E27D08408340B9B0611751A17DF266@NewLife>
References: <99E27D08408340B9B0611751A17DF266@NewLife>
Message-ID: <4A85BDE1.5020002@gmail.com>

Mark A. Jensen wrote:
> Sorry, I cut off the last script. The entire thing follows:
>

This is exactly what I was looking for - thanks. A method to modify Makefile.PL, install in Activestate, etc is great. Perhaps your method could also be improved for portability by using `cygpath` although few cygwin installs modify this beyond the default (to get rid of hardcoded "/cygdrive/x/").
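
A minimal sketch of the drive-letter part of that idea, assuming the default "/cygdrive" prefix: instead of hardcoding drive c as the sed script's subst does, the prefix can be matched for any single drive letter. The helper name decyg_path is hypothetical and is not part of conv-ASMake.sh; on an install with a remapped prefix, shelling out to `cygpath -m` under cygwin would be the more robust route.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a default-style cygwin path such as
# /cygdrive/d/foo into d:/foo, for any single drive letter.
sub decyg_path {
    my ($path) = @_;
    # Only rewrite when the letter is a whole path component, so that
    # something like /cygdrive/data/x is left untouched.
    (my $native = $path) =~ s{^/cygdrive/([A-Za-z])(?=/|$)}{${1}:};
    return $native;
}

print decyg_path('/cygdrive/c/Perl/bin/perl'), "\n";      # c:/Perl/bin/perl
print decyg_path('/usr/local/bin/conv-ASMake.sh'), "\n";  # left unchanged
```

The same substitution could replace the fixed "s[/cygdrive/c][]" in the MOD_INSTALL line, though that has not been tried here.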
I will definitely save your code for later. I've implemented another workaround, which is to use Win32::Pipe and other Win32:: methods. This has problems of it's own (support is not 100%) and error-free implementation not as easy as requiring Activestate Perl, however it should work with both Activestate and cygwin-perl (and Unix). ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## > ----- Original Message ----- From: "Jonathan Cline" > To: > Cc: > Sent: Friday, July 31, 2009 11:24 PM > Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl > > >> I recently mentioned working on Bio::Robotics for Tecan. Vendors >> being MS-Win specific, the vendor software allows third-party software >> communication through a named pipe (the literal filename is >> "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific >> and this pseudo-pipe is opened with sysopen() ). This is broken under >> cygwin-perl due to cygwin's method of handling paths -- the sysopen >> fails. However it works under ActiveState Perl and communication >> through the named pipe (to the robot hardware) is OK. The standard >> workaround is usually to use cygwin bash, and force the PATH to use >> ActiveState perl. (Typical MS Windows incompatibility problem.) The >> issue is: Perl module libraries for CPAN work under cygwin-perl >> (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN >> module use, or "make test", result in a bad list of incompatibility >> problems. Yet ActiveState Perl is required for communicating to the >> vendor application (unless there is some workaround to raw filesystem >> access in cygwin-perl that I haven't found in 2 days of working this). >> The stand-alone scripts I have work fine to access the named pipe >> (using ActiveState Perl) since the standalone scripts have no module >> INC dependencies, no CPAN module test harness, etc etc. 
>> >> This isn't specifically a Bio:: issue, though if anyone has >> suggestions please email. I could try msys and see if it handles the >> named-pipe-special-file better, if msys has an msys-perl distribution. >> >> -- >> ## Jonathan Cline >> ## jcline at ieee.org >> ## Mobile: +1-805-617-0223 >> ######################## >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From cjfields at illinois.edu Fri Aug 14 23:29:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 18:29:43 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring Message-ID: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> As we have pretty much everything in place for another point release (which I will start merging over this weekend into the 1.6 branch), I have gone ahead and made two branches for refactoring some of the more important pieces of bioperl code. Both refactors may require API changes; if so these will be part of a 1.7 release. 1) GFF - entail refactoring bioperl code to better handle GFF2/3. This is a large section of code, so small incremental changes may be merged to trunk over time (and thus may involve several branches). Included is refactoring of feature typing to be more consistent and lightweight, and will initially involve Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be deprecated in the process). See the following for additional details: http://www.bioperl.org/wiki/GFF_Refactor 2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI (SimpleAlign) and LocatableSeq. This is primarily to address significant bugs but will also entail cleaning up SimpleAlign methods (factoring out more utility-like methods into Bio::Align::AlignUtils or similar). This also may involve several branches. 
See the following for additional details: http://www.bioperl.org/wiki/Align_Refactor Any help/suggestions for the above two would be greatly appreciated! Robert Buels may be heading up the initial FeatureIO work; I will likely start on LocatableSeq/Align (Mark, wanna help?). chris From maj at fortinbras.us Fri Aug 14 23:45:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 19:45:01 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Hey Chris et al, I'm there on LocatableSeq, definitely. I do have one project to finish this weekend before I move to that: I'm planning to move Chase Miller's excellent NeXML read/write implementation into the trunk, complete with tests. If we can get it to pass the test suite, is there room in the point release for it? MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Friday, August 14, 2009 7:29 PM Subject: [Bioperl-l] GFF and LocatableSeq refactoring > As we have pretty much everything in place for another point release > (which I will start merging over this weekend into the 1.6 branch), I > have gone ahead and made two branches for refactoring some of the more > important pieces of bioperl code. Both refactors may require API > changes; if so these will be part of a 1.7 release. > > 1) GFF - entail refactoring bioperl code to better handle GFF2/3. > > This is a large section of code, so small incremental changes may be > merged to trunk over time (and thus may involve several branches). > Included is refactoring of feature typing to be more consistent and > lightweight, and will initially involve Bio::FeatureIO and > Bio::SeqFeature::Annotated (which may be deprecated in the process). 
> See the following for additional details: > > http://www.bioperl.org/wiki/GFF_Refactor > > 2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI > (SimpleAlign) and LocatableSeq. This is primarily to address > significant bugs but will also entail cleaning up SimpleAlign methods > (factoring out more utility-like methods into Bio::Align::AlignUtils > or similar). This also may involve several branches. See the > following for additional details: > > http://www.bioperl.org/wiki/Align_Refactor > > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will > likely start on LocatableSeq/Align (Mark, wanna help?). > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Fri Aug 14 23:50:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 Aug 2009 16:50:18 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: <4A85F83A.30800@cornell.edu> Chris Fields wrote: > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will likely > start on LocatableSeq/Align (Mark, wanna help?). Sure, I'll head up the gff_refactor branch work. If you're interested in what changes are being planned for Bio::SeqFeature::*, Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the implementation plan Chris and I developed just now on IRC, which is at http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan Now soliciting suggestions, comments, and assistance. 
Rob From cjfields at illinois.edu Sat Aug 15 01:03:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 20:03:41 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Mark, re: NeXML, yes, of course. There'll be an alpha release or two prior to core 1.6.1 (I need to test the Build.PL/Bio::Root::Build changes Sendu added in). chris On Aug 14, 2009, at 6:45 PM, Mark A. Jensen wrote: > Hey Chris et al, I'm there on LocatableSeq, definitely. I do have > one project to finish this weekend before I move to that: I'm > planning to move Chase Miller's > excellent NeXML read/write implementation into the trunk, complete > with tests. If we can get it to pass the test suite, is there room > in the point release for it? > MAJ > ----- Original Message ----- From: "Chris Fields" > > To: "BioPerl List" > Sent: Friday, August 14, 2009 7:29 PM > Subject: [Bioperl-l] GFF and LocatableSeq refactoring > > >> As we have pretty much everything in place for another point >> release (which I will start merging over this weekend into the 1.6 >> branch), I have gone ahead and made two branches for refactoring >> some of the more important pieces of bioperl code. Both refactors >> may require API changes; if so these will be part of a 1.7 release. >> 1) GFF - entail refactoring bioperl code to better handle GFF2/3. >> This is a large section of code, so small incremental changes may >> be merged to trunk over time (and thus may involve several >> branches). Included is refactoring of feature typing to be more >> consistent and lightweight, and will initially involve >> Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be >> deprecated in the process). See the following for additional >> details: >> http://www.bioperl.org/wiki/GFF_Refactor >> 2) Align/LocatableSeq - dealing with inconsistencies in >> Bio::AlignI (SimpleAlign) and LocatableSeq. 
This is primarily to >> address significant bugs but will also entail cleaning up >> SimpleAlign methods (factoring out more utility-like methods into >> Bio::Align::AlignUtils or similar). This also may involve several >> branches. See the following for additional details: >> http://www.bioperl.org/wiki/Align_Refactor >> Any help/suggestions for the above two would be greatly >> appreciated! Robert Buels may be heading up the initial FeatureIO >> work; I will likely start on LocatableSeq/Align (Mark, wanna help?). >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From maj at fortinbras.us Sat Aug 15 02:32:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 22:32:01 -0400 Subject: [Bioperl-l] on BP documentation Message-ID: <1F899AA92F94415186CB0B25306F1114@NewLife> Hi All -- Off-list, an old colleague of mine had this insightful, if damning, comment: >I guess that from my perspective, after doing this stuff for >about 10 years, I personally would prefer to see a "summer of >documentation" for the bio* languages (or at least bioperl, as that is >the only one I ever look at). From my own experiences, and from those >of many colleagues, the documentation for bioperl has gone from >mediocre to quite poor in the last few years. I largely think the >wikification of the docs are to blame for this. Even SeqIO is hard >to figure out now--it took me an hour the other day to figure out that >"desc" returns the full Fasta header, and I had to get that from the >module code + trial-and-error, instead of the online docs. There is >far too much inside baseball going on in the documentation scheme. >So I worry more about the constant adding of features at the expense >of documenting what is already there. This is just my 2 cents, and it >is disappointing to see a downward trend for bioperl in this regard. 
I would be really interested in all responses from the list users. I must agree that BP docs are rather a rat's nest and of varying quality, but taken in toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount of useful and sophisticated information available. I think there are approaches we can take to reorganize and standardize the accession of it to make it more useful and inviting. I disagree with my pal about the wikification, but I wager that the power of the wiki could be leveraged to greater advantage (right, Dan?). I think that what we all as developers love is to code, and detest is to document. Since BP is all-volunteer, and volunteers tend to do what they like -- the beauty of open source, btw -- documentation reorg and cleanup probably must devolve to the Core. I am willing to lead such an effort, which will take some time, and more time the fewer volunteers there are. First let's hear some thoughts, and 'let it all hang out', as they said in my mom's era. cheers Mark From cjfields at illinois.edu Sat Aug 15 03:41:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 22:41:10 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> On Aug 14, 2009, at 9:32 PM, Mark A. Jensen wrote: > Hi All -- > > Off-list, an old colleague of mine had this insightful, if damning, > comment: > >> I guess that from my perspective, after doing this stuff for >> about 10 years, I personally would prefer to see a "summer of >> documentation" for the bio* languages (or at least bioperl, as that >> is >> the only one I ever look at). From my own experiences, and from >> those >> of many colleagues, the documentation for bioperl has gone from >> mediocre to quite poor in the last few years. I largely think the >> wikification of the docs are to blame for this. 
Even SeqIO is hard >> to figure out now--it took me an hour the other day to figure out >> that >> "desc" returns the full Fasta header, and I had to get that from the >> module code + trial-and-error, instead of the online docs. There is >> far too much inside baseball going on in the documentation scheme. > >> So I worry more about the constant adding of features at the expense >> of documenting what is already there. This is just my 2 cents, and >> it >> is disappointing to see a downward trend for bioperl in this regard. > > I would be really interested in all responses from the list users. I > must agree > that BP docs are rather a rat's nest and of varying quality, but > taken in > toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount > of useful and sophisticated information available. I think there are > approaches we can take to reorganize and standardize the accession > of it to make it more useful and inviting. I disagree with my pal > about the > wikification, but I wager that the power of the wiki could be > leveraged > to greater advantage (right, Dan?). To me good documentation should be a combination of both wiki docs (HOWTOs, scraps, cookbook-y code) and inline POD. We can't forsake one for the other. If I had a preference, I would take more up-to-date POD over wiki (maybe adding a Status: for the methods), but a good HOWTO goes a long way in helping. It's just too hard to cover every use case. It's unfortunate that documentation is very poor for many modules, but at the same time it's also exceptionally hard to write documentation for modules one has had no part in developing. I think this is the main reason the docs are in the state they are in (not to point the finger of blame at anyone, I'm just as much to blame). > I think that what we all as developers love is to code, and detest > is to > document.
Since BP is all-volunteer, and volunteers tend to do what > they like -- the beauty of open source, btw -- documentation reorg > and cleanup probably must devolve to the Core. I am willing to lead > such an effort, which will take some time, and more time the fewer > volunteers there are. First let's hear some thoughts, and 'let it > all hang out', > as they said in my mom's era. > > cheers > Mark Two things: 1) Take advantage of the proposed restructuring effort (as well as some of the refactoring are doing) to add decent documentation where possible. This means updating method docs and updating the HOWTO's as needed, or adding new HOWTO's (Jason has indicated this in the past). 2) Pinpoint areas where docs are desperately needed first. Other wiki docs could also use updating. As an example, the above author's question on FASTA and desc() is actually answered in the FAQ, but the question doesn't make it easy to find: http://www.bioperl.org/wiki/FAQ#I_would_like_to_make_my_own_custom_fasta_header_-_how_do_I_do_this.3F chris From David.Messina at sbc.su.se Sat Aug 15 07:49:59 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 Aug 2009 09:49:59 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: <628aabb70908150049h64f83b8ewb30d916f0534e40d@mail.gmail.com> > > To me good documentation should be a combination of both wiki docs (HOWTOs, > scraps, cookbook-y code) and inline POD. We can't forsake one for the > other. > I think this notion is already kinda there de facto (inside baseball? :)), but perhaps we should make clear the idea that: - POD is the reference manual, with each method's capabilities described comprehensively and in detail. 
- The wiki is tutorials (bptutorial, Jason's slides), use cases (HOWTOs and Scrapbook), and FAQ And actually all the POD is accessible online from the wiki at doc.bioperl.org, too (although maybe a little hard to find -- it's under Developer--API Docs). > If I had a preference, I would take more up-to-date POD over wiki (maybe > adding a Status: for the methods), but a good HOWTO goes a long way in > helping. It's just too hard to cover every use case. > I'd agree with this, too, partly because I think the HOWTOs are in pretty good shape, covering the most common stuff pretty well, and partly because I think the reference manual has to be complete, both for a user coming to find out how to use it and for authors ensuring that their internal model of how the code works actually hangs together. Mark, one attack point for a documentation improvement effort would be to take a survey of the PODs and see how well they are fulfilling the role of a reference manual. But part of a good reference manual is knowing how to find what you're looking for, and indeed I think that's maybe the main overall problem with trying to document anything as big and complicated as BioPerl. So for me, the organization of our copious docs might benefit from some attention. The goal of providing a way to find information is better handled by the wiki, which does searching and crossreferencing much better than POD. To take your friend's FASTA header example, I might expect to be able to search for 'FASTA' or 'FASTA header' on the wiki and find something which guides me to the answer. A search for 'FASTA' gives a list of pointers, including the 'FASTA sequence format' page. That page almost gives the right answer (see the Note section), but perhaps it might be a nice place to say that in BioPerl, a FASTA sequence is a Bio::Seq, and that the header is $seq->desc and the seq is $seq->seq.
And there could be an equivalent page for the other common formats, breaking down how the format maps to an object. [...] it's also exceptionally hard to write documentation for modules one > has had no part in developing. I think this is the main reason the docs are > in the state they are in (not to point the finger of blame at anyone, I'm > just as much to blame). Absolutely, and maybe a first step would be to contact the authors of a module with out-of-date docs and ask for them to fix it, in the same way one would go to the author with a bug in their code. Core+volunteers will certainly be needed for organizing the effort and assessing the state of BioPerl documentation as a whole, but give authors the opportunity to take care of their code, too. Two things: > > 1) Take advantage of the proposed restructuring effort (as well as some of > the refactoring are doing) to add decent documentation where possible. This > means updating method docs and updating the HOWTO's as needed, or adding new > HOWTO's (Jason has indicated this in the past). > This is a great idea. > 2) Pinpoint areas where docs are desperately needed first. > > Other wiki docs could also use updating. As an example, the above author's > question on FASTA and desc() is actually answered in the FAQ, Absolutely. Maybe some of the FAQs could actually be added back to the relevant PODs? 
Dave From David.Messina at sbc.su.se Sat Aug 15 08:00:50 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 Aug 2009 10:00:50 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> Message-ID: <628aabb70908150100ka8c21aahe2bf7d636fa94112@mail.gmail.com> > > I know that it is possible to have an all-gap LocatableSeq You can, but to avoid the "can't guess alphabet" error I'm getting you have to set the alphabet manually (which AlignIO does not). I'll start a branch to get the process started. Terrific! In the meantime, then, I'll just use the -nowarnonempty workaround in my local copy of AlignIO. Dave From bernd.web at gmail.com Sat Aug 15 11:17:44 2009 From: bernd.web at gmail.com (Bernd Web) Date: Sat, 15 Aug 2009 13:17:44 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> Hi >> Even SeqIO is hard >>to figure out now--it took me an hour the other day to figure out that >>"desc" returns the full Fasta header, and I had to get that from the >>module code + trial-and-error, instead of the online docs. I was a bit surprised about $seq->desc retrieving the entire FASTA header line. Actually, in Bioperl 1.52 at least, $seq->desc returns the description only, so without the ID. Thus, to get the entire FASTA header line, $seq->id . " " . $seq->desc would be needed. For the modules I use (mainly related to sequences, such as SeqIO, SimpleAlign), I'd be happy to contribute on docs, checking docs, or examples.
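A minimal sketch of the id/desc split described above, in plain Perl with no BioPerl dependency. It assumes, as stated in this message, that the first whitespace-delimited token of the header is the ID (what $seq->id returns) and the remainder is the description (what $seq->desc returns). The helper names split_fasta_header and join_fasta_header and the sample header are hypothetical, for illustration only; this is not BioPerl's own parsing code.

```perl
use strict;
use warnings;

# Split a FASTA header line into (id, description) at the first run of
# whitespace. Illustration of the convention only, not BioPerl code.
sub split_fasta_header {
    my ($line) = @_;
    my ($id, $desc) = $line =~ /^>\s*(\S+)\s*(.*?)\s*$/
        or die "not a FASTA header: $line";
    return ($id, $desc);
}

# Rebuild the full header line; with a sequence object this corresponds
# to $seq->id . " " . $seq->desc
sub join_fasta_header {
    my ($id, $desc) = @_;
    return length($desc) ? ">$id $desc" : ">$id";
}

my ($id, $desc) =
    split_fasta_header(">gi|12345 hypothetical protein [Example]\n");
print "id:   $id\n";      # id:   gi|12345
print "desc: $desc\n";    # desc: hypothetical protein [Example]
print join_fasta_header($id, $desc), "\n";
```

A header with no description (just ">seq1") yields an empty description here, so the rebuilt header carries no trailing space.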
Regards, Bernd From sanjaysingh765 at gmail.com Sat Aug 15 13:38:18 2009 From: sanjaysingh765 at gmail.com (sanjay singh) Date: Sat, 15 Aug 2009 19:08:18 +0530 Subject: [Bioperl-l] BLINK PARSER Message-ID: Hi, I want to submit query to NCBI'S BLINK and parsed the result for the best hit. is there anyone have script to do so.i would be very grateful if someone would like to share it with me. regards sanjay -- Happy moments , praise God. Difficult moments, seek God. Quiet moments, worship God. Painful moments, trust God. Every moment, thank God Sanjay Kumar Singh Bose Institute 93\1,A.P.C.Road Kolkata-700 009 West Bengal India From jimhu at tamu.edu Sat Aug 15 15:01:15 2009 From: jimhu at tamu.edu (Jim Hu) Date: Sat, 15 Aug 2009 10:01:15 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? Message-ID: Over on the Gbrowse list, Don Gilbert explained to me why genbank2gff3.pl is having problems with prokaryotic genomes. Has anyone written an alternative? Jim Hu ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From cjfields at illinois.edu Sat Aug 15 15:27:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 10:27:01 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? In-Reply-To: References: Message-ID: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> We (bioperl devs and users) would be very interested to have something like this included. I ran into a similar problem with genbank2gff3 a year ago with some of our work here on Archaea. I managed to get enough data out to get gbrowse up-and-running, but it required quite a bit of hand-editing. In fact, seeing as we're refactoring GFF and other aspects of Features in bioperl, this may be the best time to add something in. 
chris On Aug 15, 2009, at 10:01 AM, Jim Hu wrote: > Over on the Gbrowse list, Don Gilbert explained to me why > genbank2gff3.pl is having problems with prokaryotic genomes. Has > anyone written an alternative? > > Jim Hu > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 15 15:55:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 10:55:44 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> Message-ID: On Aug 15, 2009, at 6:17 AM, Bernd Web wrote: > Hi > >>> Even SeqIO is hard >>> to figure out now--it took me an hour the other day to figure out >>> that >>> "desc" returns the full Fasta header, and I had to get that from the >>> module code + trial-and-error, instead of the online docs. > I was a bit surprised about $seq->desc retrieving the entire FASTA > header line > Actually, in Bioperl 1.52 at least $seq->desc returns the description > only, so without the ID. Thus, to get the entire FASTA header line > $seq->id . " " $seq->desc would be needed. Odd, not seeing where a change was made that would cause this behavior. Can you post an example? > For the modules I use (mainly related to sequences, such as SeqIO, > SimpleAlign), I'd be happy to contribute on docs, checking docs, or > examples. > > Regards, > Bernd Would be nice to have an Align/SimpleAlign HOWTO, but seeing as we want to refactor large chunks of that code, it might be slightly premature. 
That is, unless we want to document what behavior we expect to see as a sort of ROADMAP (maybe as part of the refactoring page). That could then be converted over to a HOWTO. Feel free to chip in on this in any way possible. The more documentation the better. chris From rmb32 at cornell.edu Sat Aug 15 16:44:03 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 09:44:03 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <85143.35343.qm@web30404.mail.mud.yahoo.com> References: <85143.35343.qm@web30404.mail.mud.yahoo.com> Message-ID: <4A86E5D3.3030906@cornell.edu> The usual procedure for developing code is to exchange code via commits to a version control system. Yee, do you know how to use Subversion? Does Yee need a commit bit? Rob Yee Man Chan wrote: > Hi Chris > > I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..) > > Please let me know if it works for you. > > Sorry for the bug... > Yee Man > > --- On Fri, 8/14/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" >> Date: Friday, August 14, 2009, 8:31 AM >> Yee Man, >> >> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >> appears to be 32-bit). The patch results in cleaning >> up warnings for 5.10.0 but results in similar warnings for >> 5.8.8 (linux or OS X). 
>> >> On OS X perl 5.8.8, this sometimes passes (note the first >> attempt fails, the second succeeds), so it's not entirely a >> 32-bit issue: >> >> http://gist.github.com/167860 >> >> OS X and perl 5.10.0, this always fails as the previous >> gist shows, but demonstrates similar behavior (multiple >> attempts to test get different responses): >> >> http://gist.github.com/167542 >> >> On linux, everything passes with or w/o the patched files >> (patched files have warnings as indicated above): >> >> Specs for all three perl executables (they vary a bit): >> >> http://gist.github.com/167883 >> >> chris >> >> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >> >>> Ah.. I find that the typemap can become as simple as >> this >>> ===================== >>> TYPEMAP >>> HMM * T_PTROBJ >>> ===================== >>> >>> Then the generated HMM.c will have a function called >> INT2PTR to do the pointer conversion. I believe this should >> solve the warnings. >>> Attached are the updated HMM.xs and typemap. Can >> someone with a 64-bit machine give it a try? >>> Thank you >>> Yee Man >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Yee Man Chan" >>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>> Date: Thursday, August 13, 2009, 5:31 PM >>>> (just to point out to everyone, Yee >>>> Man's contact information was in the POD) >>>> >>>> Yee Man, >>>> >>>> I have the output in the below link: >>>> >>>> http://gist.github.com/167542 >>>> >>>> There are similar problems popping up on 32- and >> 64-bit >>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >> to debug >>>> it unfortunately. >>>> >>>> I think we should seriously consider spinning this >> code off >>>> into it's own distribution for CPAN. It's >>>> unfortunately bit-rotting away in >> bioperl-ext. If you >>>> want to continue supporting it I can help set that >> up. 
>>>> chris >>>> >>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>> >>>>> Hi >>>>> >>>>> So is this an HMM only >> problem? Or does >>>> it apply to other bioperl-ext modules? >>>>> What exactly are the >> compilation errors >>>> for HMM? I believe my implementation is just a >> simple one >>>> based on Rabiner's paper. >>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>> >>>>> I don't think I did >> anything fancy that >>>> makes it machine dependent or non-ANSI C. >>>>> Yee Man >>>>> >>>>> --- On Thu, 8/13/09, Chris Fields >>>> wrote: >>>>>> From: Chris Fields >>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>> package on WinVista? >>>>>> To: "Robert Buels" >>>>>> Cc: "Jonny Dalzell" , >>>> "BioPerl List" , >>>> "Yee Man Chan" >>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>> >>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >> wrote: >>>>>>> Jonny Dalzell wrote: >>>>>>>> Is it ridiculous of me to expect >> ubuntu to >>>> take >>>>>> care of this for me? How do >>>>>>>> I go about compiling the HMM? >>>>>>> Yes. This is a very specialized >> thing >>>> that >>>>>> you're doing, and Ubuntu does not have >> the >>>> resources to >>>>>> package every single thing. >>>>>>> Unfortunately, it looks like >> bioperl-ext >>>> package is >>>>>> not installable under Ubuntu 9.04 anyway, >> which is >>>> what I'm >>>>>> running. For others on this list, >> if >>>> somebody is >>>>>> interested in doing maintaining it, I'd be >> happy >>>> to help out >>>>>> by testing on Debian-based Linux >> platforms. >>>> We need to >>>>>> clarify this package's maintenance status: >> if >>>> there is >>>>>> nobody interested in maintaining it, I >> would >>>> recommend that >>>>>> bioperl-ext be removed from distribution. 
>>>> It's not in >>>>>> anybody's interest to have unmaintained >> software >>>> out there >>>>>> causing confusion. >>>>>> >>>>>> I have cc'd Yee Man Chan for this. >> If there >>>> isn't a >>>>>> response or the message bounces, we do one >> of two >>>> things: >>>>>> 1) consider it deprecated (probably >> safest). >>>>>> 2) spin it out into a separate module. >>>>>> >>>>>> Just tried to comile it myself and am >> getting >>>> errors (using >>>>>> 64bit perl 5.10), so I think, unless >> someone wants >>>> to take >>>>>> this on, option #1 is best. >>>>>> >>>>>>> So Jonny, in short, I would say "do >> not use >>>>>> bioperl-ext". >>>>>> >>>>>> In general, that's a safe bet. We're >> moving >>>> most of >>>>>> our C/C++ bindings to BioLib. >>>>>> >>>>>>> Step back. What are you trying >> to >>>>>> accomplish? Chris already >> recommended some >>>> alternative >>>>>> methods in his email of 8/11 on this >>>> subject. Perhaps >>>>>> we can guide you to some software that is >>>> actively >>>>>> maintained and will meet your needs. >>>>>>> Rob >>>>>> Exactly. Lots of other (better >> supported!) >>>> options >>>>>> out there. HMMER, SeqAn, and >> others. >>>>>> chris >>>>>> >>>>> >>>>> >>>> >>> __________________________________________________ >>> Do You Yahoo!? >>> Tired of spam? Yahoo! Mail has the best spam >> protection around >>> http://mail.yahoo.com >> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From maj at fortinbras.us Sat Aug 15 17:40:26 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 15 Aug 2009 13:40:26 -0400 Subject: [Bioperl-l] BLINK PARSER In-Reply-To: References: Message-ID: <34DBCBEA5E2D49A892E5077AA780BA4E@NewLife> Hi Sanjay- I'm not sure BioPerl has an interface specifically for BLINK (I will be corrected if I'm wrong, so stay tuned). If you can obtain the "raw" blast output for the protein you're interested in ( doing [BLINK] then [Other Views: BLAST] then [Format:Show: Alignment as Plain text] ) that text can be parsed using the Bio::SearchIO tools, and you can use Bio::Search::Tiling to obtain the 'best' hsps. This may not be too helpful, I'm afraid, but it is where I would start. Mark ----- Original Message ----- From: "sanjay singh" To: Sent: Saturday, August 15, 2009 9:38 AM Subject: [Bioperl-l] BLINK PARSER > Hi, > I want to submit query to NCBI'S BLINK and parsed the result for the best > hit. is there anyone have script to do so.i would be very grateful if > someone would like to share it with me. > regards > sanjay > > -- > Happy moments , praise God. > Difficult moments, seek God. > Quiet moments, worship God. > Painful moments, trust God. > Every moment, thank God > > Sanjay Kumar Singh > Bose Institute > 93\1,A.P.C.Road > Kolkata-700 009 > West Bengal > India > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sat Aug 15 19:11:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 14:11:48 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A86E5D3.3030906@cornell.edu> References: <85143.35343.qm@web30404.mail.mud.yahoo.com> <4A86E5D3.3030906@cornell.edu> Message-ID: <8B7B3664-A0E2-4E66-82D6-982096F4C75E@illinois.edu> I'm not sure, but it makes more sense to commit these changes directly. Yee, need us to set you up with a commit bit? 
If so, fill out the information on this page: http://www.bioperl.org/wiki/SVN_Account_Request and forward it to support at open-bio.org. I'll sponsor you. chris On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: > The usual procedure for developing code is to exchange code via > commits to a version control system. Yee, do you know how to use > Subversion? Does Yee need a commit bit? > > Rob > > Yee Man Chan wrote: >> Hi Chris >> I find that there is a memory access bug in my code. Attached is >> the fixed HMM.xs. This file together with the simpler typemap >> should fix all problems. (I hope..) >> Please let me know if it works for you. >> Sorry for the bug... >> Yee Man >> --- On Fri, 8/14/09, Chris Fields wrote: >>> From: Chris Fields >>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >>> WinVista? >>> To: "Yee Man Chan" >>> Cc: "Robert Buels" , "Jonny Dalzell" >> >, "BioPerl List" >>> Date: Friday, August 14, 2009, 8:31 AM >>> Yee Man, >>> >>> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >>> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >>> appears to be 32-bit). The patch results in cleaning >>> up warnings for 5.10.0 but results in similar warnings for >>> 5.8.8 (linux or OS X). >>> >>> On OS X perl 5.8.8, this sometimes passes (note the first >>> attempt fails, the second succeeds), so it's not entirely a >>> 32-bit issue: >>> >>> http://gist.github.com/167860 >>> >>> OS X and perl 5.10.0, this always fails as the previous >>> gist shows, but demonstrates similar behavior (multiple >>> attempts to test get different responses): >>> >>> http://gist.github.com/167542 >>> >>> On linux, everything passes with or w/o the patched files >>> (patched files have warnings as indicated above): >>> >>> Specs for all three perl executables (they vary a bit): >>> >>> http://gist.github.com/167883 >>> >>> chris >>> >>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >>> >>>> Ah.. 
I find that the typemap can become as simple as >>> this >>>> ===================== >>>> TYPEMAP >>>> HMM * T_PTROBJ >>>> ===================== >>>> >>>> Then the generated HMM.c will have a function called >>> INT2PTR to do the pointer conversion. I believe this should >>> solve the warnings. >>>> Attached are the updated HMM.xs and typemap. Can >>> someone with a 64-bit machine give it a try? >>>> Thank you >>>> Yee Man >>>> --- On Thu, 8/13/09, Chris Fields >>> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >>> package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >>> "Jonny Dalzell" , >>> "BioPerl List" >>>>> Date: Thursday, August 13, 2009, 5:31 PM >>>>> (just to point out to everyone, Yee >>>>> Man's contact information was in the POD) >>>>> >>>>> Yee Man, >>>>> >>>>> I have the output in the below link: >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> There are similar problems popping up on 32- and >>> 64-bit >>>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >>> to debug >>>>> it unfortunately. >>>>> >>>>> I think we should seriously consider spinning this >>> code off >>>>> into it's own distribution for CPAN. It's >>>>> unfortunately bit-rotting away in >>> bioperl-ext. If you >>>>> want to continue supporting it I can help set that >>> up. >>>>> chris >>>>> >>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>>> >>>>>> Hi >>>>>> >>>>>> So is this an HMM only >>> problem? Or does >>>>> it apply to other bioperl-ext modules? >>>>>> What exactly are the >>> compilation errors >>>>> for HMM? I believe my implementation is just a >>> simple one >>>>> based on Rabiner's paper. 
>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>> ~murphyk%2FBayes >>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>> >>>>>> I don't think I did >>> anything fancy that >>>>> makes it machine dependent or non-ANSI C. >>>>>> Yee Man >>>>>> >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >>> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Robert Buels" >>>>>>> Cc: "Jonny Dalzell" , >>>>> "BioPerl List" , >>>>> "Yee Man Chan" >>>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>>> >>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >>> wrote: >>>>>>>> Jonny Dalzell wrote: >>>>>>>>> Is it ridiculous of me to expect >>> ubuntu to >>>>> take >>>>>>> care of this for me? How do >>>>>>>>> I go about compiling the HMM? >>>>>>>> Yes. This is a very specialized >>> thing >>>>> that >>>>>>> you're doing, and Ubuntu does not have >>> the >>>>> resources to >>>>>>> package every single thing. >>>>>>>> Unfortunately, it looks like >>> bioperl-ext >>>>> package is >>>>>>> not installable under Ubuntu 9.04 anyway, >>> which is >>>>> what I'm >>>>>>> running. For others on this list, >>> if >>>>> somebody is >>>>>>> interested in doing maintaining it, I'd be >>> happy >>>>> to help out >>>>>>> by testing on Debian-based Linux >>> platforms. >>>>> We need to >>>>>>> clarify this package's maintenance status: >>> if >>>>> there is >>>>>>> nobody interested in maintaining it, I >>> would >>>>> recommend that >>>>>>> bioperl-ext be removed from distribution. >>>>> It's not in >>>>>>> anybody's interest to have unmaintained >>> software >>>>> out there >>>>>>> causing confusion. >>>>>>> >>>>>>> I have cc'd Yee Man Chan for this. >>> If there >>>>> isn't a >>>>>>> response or the message bounces, we do one >>> of two >>>>> things: >>>>>>> 1) consider it deprecated (probably >>> safest). 
>>>>>>> 2) spin it out into a separate module. >>>>>>> >>>>>>> Just tried to comile it myself and am >>> getting >>>>> errors (using >>>>>>> 64bit perl 5.10), so I think, unless >>> someone wants >>>>> to take >>>>>>> this on, option #1 is best. >>>>>>> >>>>>>>> So Jonny, in short, I would say "do >>> not use >>>>>>> bioperl-ext". >>>>>>> >>>>>>> In general, that's a safe bet. We're >>> moving >>>>> most of >>>>>>> our C/C++ bindings to BioLib. >>>>>>> >>>>>>>> Step back. What are you trying >>> to >>>>>>> accomplish? Chris already >>> recommended some >>>>> alternative >>>>>>> methods in his email of 8/11 on this >>>>> subject. Perhaps >>>>>>> we can guide you to some software that is >>>>> actively >>>>>>> maintained and will meet your needs. >>>>>>>> Rob >>>>>>> Exactly. Lots of other (better >>> supported!) >>>>> options >>>>>>> out there. HMMER, SeqAn, and >>> others. >>>>>>> chris >>>>>>> >>>>>> >>>>>> >>>>> >>>> __________________________________________________ >>>> Do You Yahoo!? >>>> Tired of spam? Yahoo! 
Mail has the best spam >>> protection around >>>> http://mail.yahoo.com >>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu From hlapp at gmx.net Sat Aug 15 19:41:56 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 15:41:56 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: On Aug 14, 2009, at 11:41 PM, Chris Fields wrote: > I would take more up-to-date POD over wiki (maybe adding a Status: > for the methods), but a good HOWTO goes a long way in helping. It's > just too hard to cover every use case. I'd very much second this. An API documentation should arguably be written by the developer(s) and hence I would expect to find in the PODs. Use-cases, however, and how to solve those in BioPerl can and should be contributed by everyone, and the wiki is just way better at facilitating this. As for the FASTA example, I can understand - I've heard repeatedly from people that one of the things that they are missing is documentation for every SeqIO format we support (such as GenBank, UniProt, FASTA, etc) about where to find a particular piece of the format in the object model. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 15 19:53:31 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 15 Aug 2009 15:53:31 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: ----- Original Message ----- From: "Hilmar Lapp" ... > As for the FASTA example, I can understand - I've heard repeatedly > from people that one of the things that they are missing is > documentation for every SeqIO format we support (such as GenBank, > UniProt, FASTA, etc) about where to find a particular piece of the > format in the object model. .... This is the right thread for list lurkers to contribute their betes noires such as this one. I encourage ALL to post these issues and help create our list of action items. MAJ From hlapp at gmx.net Sat Aug 15 20:09:14 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:09:14 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote: > I'm planning to move Chase Miller's excellent NeXML read/write > implementation into the trunk, complete with tests. If we can get it > to pass the test suite, is there room in the point release for it? We've in the past stayed away from adding new features to stable branches with the exception of new methods in existing classes and that didn't do anything complicated. I'm not sure I remember everything but I think the NeXML support does exceed that level, doesn't it? Can it be rolled into its own pre- release that is a drop-in to an existing 1.6.x installation for those who want to go there? 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 15 20:12:35 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:12:35 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A85F83A.30800@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> Message-ID: Great! Two suggestions: > ? deprecate the get_Annotations(Str) method in favor of > get_annotation(Str), which adheres better to standard perl method > naming Yes, but also is then inconsistent with existing BioPerl naming, with the method name indicating what type of object you get back (Bio::AnnotationI in this case; see also e.g., get_SeqFeatures() in Bio::SeqI). > ? finally, split Bio::FeatureIO modules off into their own CPAN > distribution Wouldn't one start with this? -hilmar On Aug 14, 2009, at 7:50 PM, Robert Buels wrote: > Chris Fields wrote: >> Any help/suggestions for the above two would be greatly >> appreciated! Robert Buels may be heading up the initial FeatureIO >> work; I will likely start on LocatableSeq/Align (Mark, wanna help?). > > Sure, I'll head up the gff_refactor branch work. If you're > interested in what changes are being planned for Bio::SeqFeature::*, > Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the > implementation plan Chris and I developed just now on IRC, which is at > > http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan > > Now soliciting suggestions, comments, and assistance. 
> > Rob > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From rmb32 at cornell.edu Sat Aug 15 20:24:35 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 13:24:35 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> Message-ID: <4A871983.4010702@cornell.edu> Hilmar Lapp wrote: > I'm not sure I remember everything but I think the NeXML support does > exceed that level, doesn't it? Can it be rolled into its own pre-release > that is a drop-in to an existing 1.6.x installation for those who want > to go there? So split it out into its own CPAN dist. Rob From maj at fortinbras.us Sat Aug 15 20:36:47 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 15 Aug 2009 16:36:47 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> Message-ID: <307089ED92AD46539EEF45EE2D8F5A81@NewLife> Yes, I'd say the Nexml support exceeds the 'complicated' test. There are no modifications to existing modules (except for the addition of annotation attributes to members of the Bio::PopGen model, which are don't-cares to anything out there currently). 
The manifest of a NeXML drop-in would look like

  Bio/NexmlIO.pm
  Bio/Nexml/Factory.pm
  Bio/SeqIO/nexml.pm
  Bio/AlignIO/nexml.pm
  Bio/TreeIO/nexml.pm

and, if I get it completed, support for arbitrary characters via Bio::PopGen

  Bio/PopGen/IO/nexml.pm

(all based on hacks of Chase's code, btw; we thought it would round out the
package nicely...)

Of course, the big dependency that not everyone will need or want is
Rutger's Bio::Phylo, so the Nexml support will have to be optional even in
1.7, I think. I am adding run-time checks for Bio::Phylo in the modules so
they die relatively gracefully and informatively, rather than just barf.
Also, the tests will have appropriate skip blocks.

I do want to get the code into bioperl-live, however, unless there's a
gotcha there I'm not seeing--
cheers MAJ

----- Original Message -----
From: "Hilmar Lapp"
To: "Mark A. Jensen"
Cc: "Chris Fields" ; "BioPerl List"
Sent: Saturday, August 15, 2009 4:09 PM
Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring

>
> On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote:
>
>> I'm planning to move Chase Miller's excellent NeXML read/write
>> implementation into the trunk, complete with tests. If we can get it to pass
>> the test suite, is there room in the point release for it?
>
>
> We've in the past stayed away from adding new features to stable branches
> with the exception of new methods in existing classes and that didn't do
> anything complicated.
>
> I'm not sure I remember everything but I think the NeXML support does exceed
> that level, doesn't it? Can it be rolled into its own pre-release that is a
> drop-in to an existing 1.6.x installation for those who want to go there?
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From hlapp at gmx.net Sat Aug 15 20:49:22 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:49:22 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <307089ED92AD46539EEF45EE2D8F5A81@NewLife> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> Message-ID: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote: > I do want to get the code into bioperl-live, however, unless there's > a gotcha there I'm not seeing-- That sounds great to me, though it may make some of Chris' hair stand on end if he wants this to go into a separate module from the start :) Maybe a phylogenetics module can be carved out that this would become part of? Though I recall someone saying recently that Bio::Species and by extension Bio::SeqIO is dependent on Bio::Tree::Node, so maybe that's not realistic to split out. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 15 21:07:30 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 15 Aug 2009 17:07:30 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> Message-ID: <659CA35CE3AD464AA516D18B313311BE@NewLife> I'm all for an attempt to split out phylogenetic stuff, it seems natural, and think in terms of a phylo package dependent upon a sequence package, and if necessary vice versa -- although if the Bio::Species - Bio::Tree::Node connection is relatively loose, perhaps we can refactor to make some attributes/methods optional features that carp when the phylo package is not installed. (Roles, anyone?) However, probably 1.6.x doesn't sound like the place to do that! I myself wouldn't have any problem waiting till 1.7 for 'official' Nexml support--but I hope Chase will chime in on that. What does Chris think? MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Mark A. Jensen" Cc: "Chris Fields" ; "BioPerl List" Sent: Saturday, August 15, 2009 4:49 PM Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring > > On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote: > >> I do want to get the code into bioperl-live, however, unless there's a >> gotcha there I'm not seeing-- > > > That sounds great to me, though it may make some of Chris' hair stand on end > if he wants this to go into a separate module from the start :) Maybe a > phylogenetics module can be carved out that this would become part of? Though > I recall someone saying recently that Bio::Species and by extension > Bio::SeqIO is dependent on Bio::Tree::Node, so maybe that's not realistic to > split out. 
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From rmb32 at cornell.edu Sat Aug 15 21:23:40 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 14:23:40 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> Message-ID: <4A87275C.5040300@cornell.edu> Hilmar Lapp wrote: >> ? deprecate the get_Annotations(Str) method in favor of >> get_annotation(Str), which adheres better to standard perl method naming > > Yes, but also is then inconsistent with existing BioPerl naming, with > the method name indicating what type of object you get back > (Bio::AnnotationI in this case; see also e.g., get_SeqFeatures() in > Bio::SeqI). Blech. OK never mind about the method rename then. > >> ? finally, split Bio::FeatureIO modules off into their own CPAN >> distribution > > Wouldn't one start with this? Yeah....I've kind of been vacillating back and forth about whether it would be best to *start* with this, or to end with this. Probably makes more sense to start with it, since it gives more freedom to add dependencies on more CPAN stuff without worrying too much. Like...oh...I don't know...Moose? Thoughts on this? Rob From rmb32 at cornell.edu Sat Aug 15 21:25:51 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 14:25:51 -0700 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? In-Reply-To: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> Message-ID: <4A8727DF.7000204@cornell.edu> Chris Fields wrote: > In fact, seeing as we're refactoring GFF and other aspects of Features > in bioperl, this may be the best time to add something in. 
Reading that thread, it sounds like most of the issues revolve around when and how to use the unflattener. Perhaps just adding another command line switch or two to the script would be appropriate? Editorializing a bit, it's really disheartening that Genbank stores features in such a lossy way. Rob From cjfields at illinois.edu Sun Aug 16 02:05:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 21:05:41 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <241652.96493.qm@web30404.mail.mud.yahoo.com> References: <241652.96493.qm@web30404.mail.mud.yahoo.com> Message-ID: I'm still seeing the same errors on Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as well as perl 5.8.8 on dev.open-bio.org). I'm wondering if this is a problem with my local perl build. I'm very tempted to push the HMM-related code into a separate distribution (bioperl-hmm) and make a CPAN release out of it so it gets wider testing via CPAN testers; it would just require a minimum bioperl 1.6 installation for Bio::Tools::HMM and any related modules. Yee, would that be okay with you? chris On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: > > I just committed HMM.xs and typemap to SVN. Can you test it to > confirm it works in 64-bit machines? > > Thanks > Yee Man > > --- On Sat, 8/15/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "Yee Man Chan" , "BioPerl List" > > >> Date: Saturday, August 15, 2009, 12:11 PM >> I'm not sure, but it makes more sense >> to commit these changes directly. Yee, need us to set >> you up with a commit bit? If so, fill out the >> information on this page: >> >> http://www.bioperl.org/wiki/SVN_Account_Request >> >> and forward it to support at open-bio.org. >> I'll sponsor you. 
>> >> chris >> >> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: >> >>> The usual procedure for developing code is to exchange >> code via commits to a version control system. Yee, do >> you know how to use Subversion? Does Yee need a commit bit? >>> >>> Rob >>> >>> Yee Man Chan wrote: >>>> Hi Chris >>>> I find that there is a memory >> access bug in my code. Attached is the fixed HMM.xs. This >> file together with the simpler typemap should fix all >> problems. (I hope..) >>>> Please let me know if it works >> for you. >>>> Sorry for the bug... >>>> Yee Man >>>> --- On Fri, 8/14/09, Chris Fields >> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>>> Date: Friday, August 14, 2009, 8:31 AM >>>>> Yee Man, >>>>> >>>>> I tested this out locally (perl 5.8.8 32-bit, >> perl 5.10.0 >>>>> 64-bit) and on dev.open-bio.org (which is perl >> 5.8.8, >>>>> appears to be 32-bit). The patch results >> in cleaning >>>>> up warnings for 5.10.0 but results in similar >> warnings for >>>>> 5.8.8 (linux or OS X). >>>>> >>>>> On OS X perl 5.8.8, this sometimes passes >> (note the first >>>>> attempt fails, the second succeeds), so it's >> not entirely a >>>>> 32-bit issue: >>>>> >>>>> http://gist.github.com/167860 >>>>> >>>>> OS X and perl 5.10.0, this always fails as the >> previous >>>>> gist shows, but demonstrates similar behavior >> (multiple >>>>> attempts to test get different responses): >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> On linux, everything passes with or w/o the >> patched files >>>>> (patched files have warnings as indicated >> above): >>>>> >>>>> Specs for all three perl executables (they >> vary a bit): >>>>> >>>>> http://gist.github.com/167883 >>>>> >>>>> chris >>>>> >>>>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan >> wrote: >>>>> >>>>>> Ah.. 
I find that the typemap can become as >> simple as >>>>> this >>>>>> ===================== >>>>>> TYPEMAP >>>>>> HMM * T_PTROBJ >>>>>> ===================== >>>>>> >>>>>> Then the generated HMM.c will have a >> function called >>>>> INT2PTR to do the pointer conversion. I >> believe this should >>>>> solve the warnings. >>>>>> Attached are the updated HMM.xs and >> typemap. Can >>>>> someone with a 64-bit machine give it a try? >>>>>> Thank you >>>>>> Yee Man >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Yee Man Chan" >>>>>>> Cc: "Robert Buels" , >>>>> "Jonny Dalzell" , >>>>> "BioPerl List" >>>>>>> Date: Thursday, August 13, 2009, 5:31 >> PM >>>>>>> (just to point out to everyone, Yee >>>>>>> Man's contact information was in the >> POD) >>>>>>> >>>>>>> Yee Man, >>>>>>> >>>>>>> I have the output in the below link: >>>>>>> >>>>>>> http://gist.github.com/167542 >>>>>>> >>>>>>> There are similar problems popping up >> on 32- and >>>>> 64-bit >>>>>>> perl 5.10.0, Mac OS X 10.5. >> Haven't had time >>>>> to debug >>>>>>> it unfortunately. >>>>>>> >>>>>>> I think we should seriously consider >> spinning this >>>>> code off >>>>>>> into it's own distribution for >> CPAN. It's >>>>>>> unfortunately bit-rotting away in >>>>> bioperl-ext. If you >>>>>>> want to continue supporting it I can >> help set that >>>>> up. >>>>>>> chris >>>>>>> >>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man >> Chan wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> >>>>>>>> So is this >> an HMM only >>>>> problem? Or does >>>>>>> it apply to other bioperl-ext >> modules? >>>>>>>> What >> exactly are the >>>>> compilation errors >>>>>>> for HMM? I believe my implementation >> is just a >>>>> simple one >>>>>>> based on Rabiner's paper. 
>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>>>> ~murphyk%2FBayes >>>>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>>>> >>>>>>>> I don't >> think I did >>>>> anything fancy that >>>>>>> makes it machine dependent or non-ANSI >> C. >>>>>>>> Yee Man >>>>>>>> >>>>>>>> --- On Thu, 8/13/09, Chris Fields >> >>>>>>> wrote: >>>>>>>>> From: Chris Fields >>>>>>>>> Subject: Re: [Bioperl-l] >> Problems with >>>>> Bioperl-ext >>>>>>> package on WinVista? >>>>>>>>> To: "Robert Buels" >>>>>>>>> Cc: "Jonny Dalzell" , >>>>>>> "BioPerl List" , >>>>>>> "Yee Man Chan" >>>>>>>>> Date: Thursday, August 13, >> 2009, 3:18 PM >>>>>>>>> >>>>>>>>> On Aug 13, 2009, at 4:37 PM, >> Robert Buels >>>>> wrote: >>>>>>>>>> Jonny Dalzell wrote: >>>>>>>>>>> Is it ridiculous of me >> to expect >>>>> ubuntu to >>>>>>> take >>>>>>>>> care of this for me? How >> do >>>>>>>>>>> I go about compiling >> the HMM? >>>>>>>>>> Yes. This is a very >> specialized >>>>> thing >>>>>>> that >>>>>>>>> you're doing, and Ubuntu does >> not have >>>>> the >>>>>>> resources to >>>>>>>>> package every single thing. >>>>>>>>>> Unfortunately, it looks >> like >>>>> bioperl-ext >>>>>>> package is >>>>>>>>> not installable under Ubuntu >> 9.04 anyway, >>>>> which is >>>>>>> what I'm >>>>>>>>> running. For others on >> this list, >>>>> if >>>>>>> somebody is >>>>>>>>> interested in doing >> maintaining it, I'd be >>>>> happy >>>>>>> to help out >>>>>>>>> by testing on Debian-based >> Linux >>>>> platforms. >>>>>>> We need to >>>>>>>>> clarify this package's >> maintenance status: >>>>> if >>>>>>> there is >>>>>>>>> nobody interested in >> maintaining it, I >>>>> would >>>>>>> recommend that >>>>>>>>> bioperl-ext be removed from >> distribution. >>>>>>> It's not in >>>>>>>>> anybody's interest to have >> unmaintained >>>>> software >>>>>>> out there >>>>>>>>> causing confusion. 
>>>>>>>>> >>>>>>>>> I have cc'd Yee Man Chan for >> this. >>>>> If there >>>>>>> isn't a >>>>>>>>> response or the message >> bounces, we do one >>>>> of two >>>>>>> things: >>>>>>>>> 1) consider it deprecated >> (probably >>>>> safest). >>>>>>>>> 2) spin it out into a separate >> module. >>>>>>>>> >>>>>>>>> Just tried to comile it myself >> and am >>>>> getting >>>>>>> errors (using >>>>>>>>> 64bit perl 5.10), so I think, >> unless >>>>> someone wants >>>>>>> to take >>>>>>>>> this on, option #1 is best. >>>>>>>>> >>>>>>>>>> So Jonny, in short, I >> would say "do >>>>> not use >>>>>>>>> bioperl-ext". >>>>>>>>> >>>>>>>>> In general, that's a safe >> bet. We're >>>>> moving >>>>>>> most of >>>>>>>>> our C/C++ bindings to BioLib. >>>>>>>>> >>>>>>>>>> Step back. What are >> you trying >>>>> to >>>>>>>>> accomplish? Chris >> already >>>>> recommended some >>>>>>> alternative >>>>>>>>> methods in his email of 8/11 >> on this >>>>>>> subject. Perhaps >>>>>>>>> we can guide you to some >> software that is >>>>>>> actively >>>>>>>>> maintained and will meet your >> needs. >>>>>>>>>> Rob >>>>>>>>> Exactly. Lots of other >> (better >>>>> supported!) >>>>>>> options >>>>>>>>> out there. HMMER, SeqAn, >> and >>>>> others. >>>>>>>>> chris >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >> __________________________________________________ >>>>>> Do You Yahoo!? >>>>>> Tired of spam? Yahoo! 
Mail has the >> best spam >>>>> protection around >>>>>> http://mail.yahoo.com >>>>> >> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>> >>> >>> --Robert Buels >>> Bioinformatics Analyst, Sol Genomics Network >>> Boyce Thompson Institute for Plant Research >>> Tower Rd >>> Ithaca, NY 14853 >>> Tel: 503-889-8539 >>> rmb32 at cornell.edu >>> http://www.sgn.cornell.edu >> >> > > > From cjfields at illinois.edu Sun Aug 16 02:49:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 21:49:25 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <659CA35CE3AD464AA516D18B313311BE@NewLife> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> Message-ID: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> On Aug 15, 2009, at 4:07 PM, Mark A. Jensen wrote: > I'm all for an attempt to split out phylogenetic stuff, it > seems natural, and think in terms of a phylo package > dependent upon a sequence package, and if necessary > vice versa -- although if the Bio::Species - Bio::Tree::Node > connection is relatively loose, perhaps we can refactor to > make some attributes/methods optional features that carp > when the phylo package is not installed. (Roles, anyone?) I'm pretty sure they're linked very tightly (Species is-a Bio::Taxon is-a Bio::Tree::Node). This may be something Sendu needs to chime in on; he refactored much of that code prior to 1.5.2. As a suggestion, maybe we can use a combined strategy: fall back to a very simple Bio::Species container class if a bioperl-phylo isn't installed, but utilize Bio::Taxon when it is. > However, probably 1.6.x doesn't sound like the place to > do that! 
I myself wouldn't have any problem waiting till
> 1.7 for 'official' Nexml support--but I hope Chase will chime
> in on that. What does Chris think?
> MAJ

Robert's suggestion of a separate distribution makes sense; it may be one
avenue of slowly migrating out phylo-specific code into its own
distribution. Not sure about calling it bioperl-phylo (which might be
confused with Rutger's Bio::Phylo).

chris

From cjfields at illinois.edu Sun Aug 16 02:47:36 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 15 Aug 2009 21:47:36 -0500
Subject: [Bioperl-l] genbank2gff3 for prokaryotes?
In-Reply-To: <4A8727DF.7000204@cornell.edu>
References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> <4A8727DF.7000204@cornell.edu>
Message-ID: <81C3E545-4F0E-4B1F-9F06-398D1EE7A3CF@illinois.edu>

On Aug 15, 2009, at 4:25 PM, Robert Buels wrote:

> Chris Fields wrote:
> > In fact, seeing as we're refactoring GFF and other aspects of Features
> > in bioperl, this may be the best time to add something in.
>
> Reading that thread, it sounds like most of the issues revolve
> around when and how to use the unflattener. Perhaps just adding
> another command line switch or two to the script would be appropriate?
>
> Editorializing a bit, it's really disheartening that Genbank stores
> features in such a lossy way.
>
> Rob

Just remembered: NCBI does supply GFF3 files for bacterial genomes, but I'm
not sure how well they correspond to the GFF3 specification. For example:

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Aquifex_aeolicus/NC_000918.gff

A quick glance looks okay, but they don't include FASTA sequence.

I think much of the problem with NCBI/GenBank has to do with lack of
curation on how submissions are made (lots of inconsistencies). I'm not
sure how easy they will be to deal with, but the only way we can deal with
that is looking at examples of problematic data (IIRC the Sulfolobus
solfataricus genome GB file was a mess, so maybe that's worth a look).
chris

From cjfields at illinois.edu Sun Aug 16 05:38:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 00:38:46 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <846546.73578.qm@web30404.mail.mud.yahoo.com>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com>
Message-ID: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>

Yee,

I took the liberty of making a few simple changes to Bio::Tools::HMM in svn
to point out the problem and possible solutions. Feel free to revert these
as needed.

I'm seeing two errors, which appear randomly when running 'make test'. The
first is easily fixable; about the second, I'm not so sure. I'll let you
make the decisions on both.

1) There is an assumption in the module that, when adding floating-point
numbers, you will always get exactly 1.0. You may run into problems: see
'perldoc -q long decimals'. Lines like this (two places in the module):

  ...
  if ($sum != 1.0) {
      $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
  }
  ...

won't work as expected (note I added a simple diagnostic, just print out
the 'bad' sum). With perl 5.8.8, this appears to work fine, but this is
what I get with perl 5.10 (64-bit):

  pyrimidine1:HMM cjfields$ make test
  PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
  Baum-Welch Training
  ===================
  Initial Probability Array:
  0.499978 0.500022
  Transition Probability Matrix:
  0.499978 0.500022
  0.499978 0.500022
  Emission Probability Matrix:
  0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
  0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
  Log Probability of sequence 1: -521.808
  Log Probability of sequence 2: -426.057
  Statistical Training
  ====================
  Initial Probability Array:
  1 0
  Transition Probability Matrix:

  ------------- EXCEPTION -------------
  MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
  STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
  STACK toplevel test.pl:82
  -------------------------------------

  make: *** [test_dynamic] Error 255

I'm assuming this needs to simply be rounded up to 1.0. That could be
accomplished with something like 'if (sprintf("%.2f", $sum) != 1.0) {...}'

2) The second error is a little stranger. I have been randomly getting this:

  pyrimidine1:HMM cjfields$ make test
  PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
  Baum-Welch Training
  ===================
  S should be monotonic increasing!
  make: *** [test_dynamic] Error 255

When I add strict and warnings pragmas to Bio::Tools::HMM (with a little
additional cleanup to get things running), I get an additional warning
(arrow):

  pyrimidine1:HMM cjfields$ make test
  PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
  Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
  Baum-Welch Training
  ===================
  S should be monotonic increasing!
  make: *** [test_dynamic] Error 255

So something is not being converted as expected.
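
For the record, here is one way the check in item 1) might be written with an explicit tolerance instead of an exact '!=' test. This is only a sketch: sums_to_one() and the 1e-6 default are made up for illustration and are not part of Bio::Tools::HMM, and the right tolerance for the module is a judgment call. Note that the sprintf("%.2f", ...) idea above accepts any sum that rounds to 1.00 at two decimals, which is a fairly loose check.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Bio::Tools::HMM): returns true when the
# probabilities in the array ref add up to 1.0 within an absolute tolerance.
# The 1e-6 default is an assumption chosen for illustration only.
sub sums_to_one {
    my ($probs, $tolerance) = @_;
    $tolerance = 1e-6 unless defined $tolerance;
    my $total = sum(@$probs);
    return abs($total - 1.0) <= $tolerance ? 1 : 0;
}

# Ten values that should add up to 1.0 on paper; the floating-point sum
# may be off by a tiny amount depending on how perl was built.
my @row   = (0.1) x 10;
my $total = sum(@row);

printf "sum = %.18f\n", $total;
print "exact comparison:     ", ($total == 1.0 ? "ok" : "not ok"), "\n";
print "tolerance comparison: ", (sums_to_one(\@row) ? "ok" : "not ok"), "\n";
```

In the module, the same idea would replace 'if ($sum != 1.0)' with something like 'unless (sums_to_one(\@probs))', keeping the existing throw() and its diagnostic message.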
chris On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote: > When are you going to release 1.6? Maybe let me work on it before it > releases. If it doesn't resolve the problem, then we can think about > other alternatives. > > Also, please show me the latest errors you have for 5.10.0. > > Thanks > Yee Man > > --- On Sat, 8/15/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "BioPerl List" > > >> Date: Saturday, August 15, 2009, 7:05 PM >> I'm still seeing the same errors on >> Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl >> (v5.8.8) passes fine now (as well as perl 5.8.8 on >> dev.open-bio.org). >> >> I'm wondering if this is a problem with my local perl >> build. I'm very tempted to push the HMM-related code >> into a separate distribution (bioperl-hmm) and make a CPAN >> release out of it so it gets wider testing via CPAN testers; >> it would just require a minimum bioperl 1.6 installation for >> Bio::Tools::HMM and any related modules. Yee, would >> that be okay with you? >> >> chris >> >> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: >> >>> >>> I just committed HMM.xs and typemap to SVN. Can you >> test it to confirm it works in 64-bit machines? >>> >>> Thanks >>> Yee Man >>> >>> --- On Sat, 8/15/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Robert Buels" >>>> Cc: "Yee Man Chan" , >> "BioPerl List" >>>> Date: Saturday, August 15, 2009, 12:11 PM >>>> I'm not sure, but it makes more sense >>>> to commit these changes directly. Yee, need >> us to set >>>> you up with a commit bit? If so, fill out >> the >>>> information on this page: >>>> >>>> http://www.bioperl.org/wiki/SVN_Account_Request >>>> >>>> and forward it to support at open-bio.org. >>>> I'll sponsor you. 
From abhishek.vit at gmail.com Sun Aug 16 08:06:49 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Sun, 16 Aug 2009 04:06:49 -0400
Subject: [Bioperl-l] About binning data for histograms
Message-ID:

Hi All

After a lot of searching on the forums I could find via Google, I am
finally posting my question here. It may not be appropriate for this
mailing list, and I apologize for that up front. The question is about
dynamic binning of data points for histogram plots.

I have many hashes, each holding numerical coverage data obtained from
next-generation sequencing data analysis. Each hash may have a couple
of hundred to thousands of entries of the form "contig_name =>
coverage". What I want to do is plot a histogram for each
hash/dataset: "Coverage v/s Count of contigs with coverage > #N" (N
has to be binned according to the data size).

I am using Chart::Gnuplot for this, but I am not able to figure out how
to bin the data points to fit nicely on a screen. Is there any
smart/quick method to do this?

Any pointers will help a great deal.
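
A minimal sketch of one way to do the binning described above in plain
Perl, using core modules only. It assumes Sturges' rule (k = ceil(log2(n) + 1))
for the number of bins, which is just one common default and may not suit
skewed coverage data. The %coverage hash and the bin_values() helper are
hypothetical names invented for this illustration, and the sketch builds a
plain (non-cumulative) histogram.

```perl
use strict;
use warnings;
use List::Util qw(min max);
use POSIX qw(ceil floor);

# Hypothetical example data: contig name => coverage.
my %coverage = (
    contig_1 => 3.2,  contig_2 => 8.0, contig_3 => 12.5, contig_4 => 7.1,
    contig_5 => 15.9, contig_6 => 4.4, contig_7 => 9.8,  contig_8 => 11.0,
);

# Return a list of [lower_edge, upper_edge, count] triples.
# The number of bins is chosen with Sturges' rule (an assumption).
sub bin_values {
    my @values = @_;
    my $n = @values;
    return () unless $n;

    my ($lo, $hi) = (min(@values), max(@values));
    my $k     = ceil(log($n) / log(2) + 1);
    my $width = ($hi - $lo) / $k;

    # All values identical: report a single bin.
    return ([$lo, $hi, $n]) if $width == 0;

    my @counts = (0) x $k;
    for my $v (@values) {
        my $i = floor(($v - $lo) / $width);
        $i = $k - 1 if $i >= $k;    # the maximum falls into the last bin
        $counts[$i]++;
    }
    return map { [$lo + $_ * $width, $lo + ($_ + 1) * $width, $counts[$_]] }
           0 .. $k - 1;
}

my @bins = bin_values(values %coverage);
printf "%8.2f - %8.2f : %d\n", @$_ for @bins;
```

The bin edges and counts could then be handed to whatever plotting module
is in use. Swapping in a different bin-count rule only changes the line
that computes $k.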

Best Regards,
-Abhi

From bix at sendu.me.uk Sun Aug 16 09:21:11 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 16 Aug 2009 10:21:11 +0100
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To:
References:
Message-ID: <4A87CF87.7030803@sendu.me.uk>

Abhishek Pratap wrote:
> I am using Chart::Gnuplot for this but I am not able to figure out how
> to bin the data points to fit nicely on a screen. Is there any
> smart/quick method to do this.

http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width

Like it says, it depends on the data, but it's worth trying them out to
see if one of them gives you anything sensible.

From sdavis2 at mail.nih.gov Sun Aug 16 11:48:23 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 16 Aug 2009 07:48:23 -0400
Subject: [Bioperl-l] About binning data for histograms
In-Reply-To:
References:
Message-ID: <264855a00908160448i2691fc08t472fc0d83afbb356@mail.gmail.com>

On Sun, Aug 16, 2009 at 4:06 AM, Abhishek Pratap wrote:

> Hi All
>
> After a lot of look up on forums I could google, I am finally posting
> my question here. I think it may not be appropriate for this mailing
> list. I apologize for this first up. The question is regarding dynamic
> binning of data points for histogram plots.
>
> So I have many hashes, each having a "numerical" coverage data
> obtained from Next generation sequencing data analysis. Now each hash
> may have couple of hundred to thousands entry "contig_name =>
> coverage". What I want to do is to plot a histogram for each
> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N
> has to be binned according to the data size).
>
> I am using Chart::Gnuplot for this but I am not able to figure out how
> to bin the data points to fit nicely on a screen. Is there any
> smart/quick method to do this.
>
> Any pointers will help a great deal.
>

Hi, Abhi.

You could use R, but you got that already. ; ) However, you might look
here for a perl solution.

http://search.cpan.org/~whizdog/GDGraph-histogram-1.1/lib/GD/Graph/histogram.pm

Sean

From cjfields at illinois.edu Sun Aug 16 12:53:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 07:53:29 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <217259.7083.qm@web30408.mail.mud.yahoo.com>
References: <217259.7083.qm@web30408.mail.mud.yahoo.com>
Message-ID: <05D89C95-261C-47B5-A4C6-794D36DD5FB8@illinois.edu>

That worked! Thanks Yee Man!

chris

ps - let me know how you want to deal with a release.

On Aug 16, 2009, at 4:36 AM, Yee Man Chan wrote:

> Hi Chris
>
> Thanks for your suggestions. I think it is indeed better to check
> sum to 1.0 using sprintf. I fixed this in the newly committed HMM.pm
>
> I also fixed codes that will lead to warnings with use warnings.
>
> So now the only problem left is that "monotonic increasing" error.
> For that part of the code, I was trying to perform an expectation
> maximization step. Theoretically, the expectation should
> monotonically increase in every step. But I suppose this is not
> necessarily true when double precision floating point numbers are
> involved. I don't know why I used a 1e-100 tolerance for this.
> Therefore I "fixed" it by using the same tolerance to terminate the
> maximization step (ie .000001). I suppose this "fix" will make it
> much more unlikely to throw exception with your 5.10.0 perl.
>
> Can you give that a try again and see if it works now.
>
> Thank you
> Yee Man
>
>
>
> --- On Sat, 8/15/09, Chris Fields wrote:
>
>> From: Chris Fields
>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on
>> WinVista?
>> To: "Yee Man Chan"
>> Cc: "Robert Buels" , "BioPerl List"
>> Date: Saturday, August 15, 2009, 10:38 PM
>> Yee,
>>
>> I took the liberty of making a few simple changes to
>> Bio::Tools::HMM in svn to point out the problem and possible
>> solutions. Feel free to revert these as needed.
>>
>> I'm seeing two errors, which appear randomly when running
>> 'make test'. The first is easily fixable, the second,
>> I'm not so sure. I'll let you make the decisions on
>> both.
>>
>> 1) There is an assumption in the module that, when
>> adding floating points, you will always get 1.0. You
>> may run into problems: see 'perldoc -q long decimals'.
>> Lines like this (two places in the module):
>> ...
>> if ($sum != 1.0) {
>>     $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
>> }
>> ...
>>
>> won't work as expected (note I added a simple diagnostic,
>> just print out the 'bad' sum). With perl 5.8.8, this
>> appears to work fine, but this is what I get with perl 5.10
>> (64-bit):
>>
>> pyrimidine1:HMM cjfields$ make test
>> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
>> Baum-Welch Training
>> ===================
>> Initial Probability Array:
>> 0.499978 0.500022
>> Transition Probability Matrix:
>> 0.499978 0.500022
>> 0.499978 0.500022
>> Emission Probability Matrix:
>> 0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
>> 0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
>>
>> Log Probability of sequence 1: -521.808
>> Log Probability of sequence 2: -426.057
>>
>> Statistical Training
>> ====================
>> Initial Probability Array:
>> 1 0
>> Transition Probability Matrix:
>>
>> ------------- EXCEPTION -------------
>> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
>>
>> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
>> STACK toplevel test.pl:82
>> -------------------------------------
>>
>> make: *** [test_dynamic] Error 255
>>
>> I'm assuming this needs to simply be rounded up to
>> 1.0. That could be accomplished with something like
>> 'if (sprintf("%.2f", $sum) != 1.0) {...}'
>>
>> 2) The second error is a little stranger. I have been
>> randomly getting this:
>>
>> pyrimidine1:HMM cjfields$ make test
>> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
>> Baum-Welch Training
>> ===================
>> S should be monotonic increasing!
>> make: *** [test_dynamic] Error 255
>>
>> When I add strict and warnings pragmas to Bio::Tools::HMM
>> (with a little additional cleanup to get things running), I
>> get an additional warning (arrow):
>>
>> pyrimidine1:HMM cjfields$ make test
>> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
>> Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188.    <----
>> Baum-Welch Training
>> ===================
>> S should be monotonic increasing!
>> make: *** [test_dynamic] Error 255
>>
>> So something is not being converted as expected.
>>
>> chris
>>
>> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote:
>>
>>> When are you going to release 1.6? Maybe let me work
>>> on it before it releases. If it doesn't resolve the problem,
>>> then we can think about other alternatives.
>>>
>>> Also, please show me the latest errors you have for
>>> 5.10.0.
>>>
>>> Thanks
>>> Yee Man
>>>
>>> --- On Sat, 8/15/09, Chris Fields wrote:
>>>
>>>> From: Chris Fields
>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>> To: "Yee Man Chan"
>>>> Cc: "Robert Buels" , "BioPerl List"
>>>> Date: Saturday, August 15, 2009, 7:05 PM
>>>> I'm still seeing the same errors on
>>>> Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl
>>>> (v5.8.8) passes fine now (as well as perl 5.8.8 on
>>>> dev.open-bio.org).
>>>>
>>>> I'm wondering if this is a problem with my local perl build.
>>>> I'm very tempted to push the HMM-related code
>>>> into a separate distribution (bioperl-hmm) and make a CPAN
>>>> release out of it so it gets wider testing via CPAN testers;
>>>> it would just require a minimum bioperl 1.6 installation for
>>>> Bio::Tools::HMM and any related modules. Yee, would
>>>> that be okay with you?
>>>>
>>>> chris
>>>>
>>>> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:
>>>>
>>>>>
>>>>> I just committed HMM.xs and typemap to SVN. Can you
>>>>> test it to confirm it works in 64-bit machines?
>>>>>
>>>>> Thanks
>>>>> Yee Man
>>>>>
>>>>> --- On Sat, 8/15/09, Chris Fields wrote:
>>>>>
>>>>>> From: Chris Fields
>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
>>>>>> To: "Robert Buels"
>>>>>> Cc: "Yee Man Chan" , "BioPerl List"
>>>>>> Date: Saturday, August 15, 2009, 12:11 PM
>>>>>> I'm not sure, but it makes more sense
>>>>>> to commit these changes directly. Yee, need us to set
>>>>>> you up with a commit bit? If so, fill out the
>>>>>> information on this page:
>>>>>>
>>>>>> http://www.bioperl.org/wiki/SVN_Account_Request
>>>>>>
>>>>>> and forward it to support at open-bio.org.
>>>>>> I'll sponsor you.
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
>>>>>>
>>>>>>> The usual procedure for developing code is to exchange
>>>>>>> code via commits to a version control system. Yee, do
>>>>>>> you know how to use Subversion? Does Yee need a commit bit?
>>>>>>>
>>>>>>> Rob
>>>>>>>
>>>>>>> Yee Man Chan wrote:
>>>>>>>> Hi Chris
>>>>>>>> I find that there is a memory access bug in my code.
>>>>>>>> Attached is the fixed HMM.xs. This file together with the
>>>>>>>> simpler typemap should fix all problems. (I hope..)
>>>>>>>> Please let me know if it works for you.
>>>>>>>> Sorry for the bug...

From hlapp at gmx.net Sun Aug 16 15:07:39 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 16 Aug 2009 11:07:39 -0400
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com>
 <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
Message-ID: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net>

On Aug 16, 2009, at 1:38 AM, Chris Fields wrote:

> I'm assuming this needs to simply be rounded up to 1.0. That could
> be accomplished with something like 'if (sprintf("%.2f", $sum) !=
> 1.0) {...}'

Couldn't you just test for the absolute difference being smaller than
some reasonable epsilon? That might be more efficient (and more
explicit) than printing to a string.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sun Aug 16 15:13:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 16 Aug 2009 11:13:54 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
 <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
 <307089ED92AD46539EEF45EE2D8F5A81@NewLife>
 <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
 <659CA35CE3AD464AA516D18B313311BE@NewLife>
 <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
Message-ID:

On Aug 15, 2009, at 10:49 PM, Chris Fields wrote:

> Not sure about calling it bioperl-phylo (which might be confused
> with Rutger's Bio::Phylo).

Frankly, it seems to me that either is more powerful in combination
with the other, so I don't quite see how the name suggesting some
linkage isn't a Good Thing rather than bad.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at illinois.edu Sun Aug 16 15:42:50 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 10:42:50 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net>
References: <846546.73578.qm@web30404.mail.mud.yahoo.com>
 <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
 <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net>
Message-ID:

On Aug 16, 2009, at 10:07 AM, Hilmar Lapp wrote:

>
> On Aug 16, 2009, at 1:38 AM, Chris Fields wrote:
>
>> I'm assuming this needs to simply be rounded up to 1.0. That could
>> be accomplished with something like 'if (sprintf("%.2f", $sum) !=
>> 1.0) {...}'
>
>
> Couldn't you just test for the absolute difference being smaller
> than some reasonable epsilon? That might be more efficient (and more
> explicit) than printing to a string.
>
> -hilmar

Yes, either way is fine. Re: floating point and sprintf, acc. to the
perlfaq4, as perl doesn't have a round() function the sprintf() idiom
is suggested (and commonly used).

chris

From cjfields at illinois.edu Sun Aug 16 15:48:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 16 Aug 2009 10:48:52 -0500
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
 <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
 <307089ED92AD46539EEF45EE2D8F5A81@NewLife>
 <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
 <659CA35CE3AD464AA516D18B313311BE@NewLife>
 <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
Message-ID:

On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote:

> On Aug 15, 2009, at 10:49 PM, Chris Fields wrote:
>
>> Not sure about calling it bioperl-phylo (which might be confused
>> with Rutger's Bio::Phylo).
>
>
> Frankly, it seems to me that either is more powerful in combination
> with the other, so I don't quite see how the name suggesting some
> linkage isn't a Good Thing rather than bad.
>
> -hilmar

I don't have a problem as long as there is some emphasis they are two
separate, but related, projects. There is quite a bit of crossover
between the two (particularly with the last few bioperl-related GSoC
projects), but I would rather not have to worry about users emailing
the list wondering why something in bioperl-phylo doesn't work when
they installed Bio::Phylo instead (or vice-versa). Maybe Bio::Phylo
could be added as a recommended module with bioperl-phylo to alleviate
that?

chris

From maj at fortinbras.us Sun Aug 16 16:59:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 16 Aug 2009 12:59:40 -0400
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To:
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
 <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
 <307089ED92AD46539EEF45EE2D8F5A81@NewLife>
 <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
 <659CA35CE3AD464AA516D18B313311BE@NewLife>
 <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
Message-ID: <44D32BE895F446A9917A5550485AB102@NewLife>

I see both points- I think Chris's suggestion is good. The nexml support
won't work without Bio::Phylo, but not everyone will need that support,
so if the install can be chatty about this that would be great-

----- Original Message -----
From: "Chris Fields"
To: "Hilmar Lapp"
Cc: "BioPerl List" ; "Mark A. Jensen" ; "chase Miller"
Sent: Sunday, August 16, 2009 11:48 AM
Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring

>
> On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote:
>
>> On Aug 15, 2009, at 10:49 PM, Chris Fields wrote:
>>
>>> Not sure about calling it bioperl-phylo (which might be confused with
>>> Rutger's Bio::Phylo).
>>
>>
>> Frankly, it seems to me that either is more powerful in combination with the
>> other, so I don't quite see how the name suggesting some linkage isn't a
>> Good Thing rather than bad.
>>
>> -hilmar
>
> I don't have a problem as long as there is some emphasis they are two
> separate, but related, projects. There is quite a bit of crossover between
> the two (particularly with the last few bioperl-related GSoC projects), but I
> would rather not have to worry about users emailing the list wondering why
> something in bioperl-phylo doesn't work when they installed Bio::Phylo
> instead (or vice-versa). Maybe Bio::Phylo could be added as a recommended
> module with bioperl-phylo to alleviate that?
>
> chris
>
>

From rmb32 at cornell.edu Sun Aug 16 17:16:18 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 16 Aug 2009 10:16:18 -0700
Subject: [Bioperl-l] GFF and LocatableSeq refactoring
In-Reply-To: <44D32BE895F446A9917A5550485AB102@NewLife>
References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu>
 <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net>
 <307089ED92AD46539EEF45EE2D8F5A81@NewLife>
 <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net>
 <659CA35CE3AD464AA516D18B313311BE@NewLife>
 <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu>
 <44D32BE895F446A9917A5550485AB102@NewLife>
Message-ID: <4A883EE2.3060101@cornell.edu>

Mark A. Jensen wrote:
> I see both points- I think Chris's suggestion is good. The nexml support
> won't work without Bio::Phylo, but not everyone will need that support,
> so if the install can be chatty about this that would be great-

Maybe the parts that have differing dependencies should be in different
distros then?
Rob From jason at bioperl.org Sun Aug 16 17:25:08 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 16 Aug 2009 13:25:08 -0400 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: References: Message-ID: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> For binning of a distribution see the perl module Statistics::Descriptive - http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm function: frequency_distribution I would also look at the R histogram function for the plotting. This would be one of the easiest ways - I would just make a perl script that generates the correct R code that can be used to make the plots. On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote: > Hi All > > After a lot of look up on forums I could google, I am finally posting > my question here. I think it may not be appropriate for this mailing > list. I apologize for this first up. The question is regarding dynamic > binning of data points for histogram plots. > > So I have many hashes, each having a "numerical" coverage data > obtained from Next generation sequencing data analysis. Now each hash > may have couple of hundred to thousands entry "contig_name => > coverage". What I want to do is to plot a histogram for each > hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N > has to be binned according to the data size). > > I am using Chart::Gnuplot for this but I am not able to figure out how > to bin the data points to fit nicely on a screen. Is there any > smart/quick method to do this. > > Any pointers will help a great deal.
> > Best Regards, > -Abhi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From abhishek.vit at gmail.com Sun Aug 16 17:34:54 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Sun, 16 Aug 2009 13:34:54 -0400 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> References: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> Message-ID: Thanks All. I completely forgot and didn't realize that the histogram function in R could auto bin based on the data. Cheers, -Abhi On Sun, Aug 16, 2009 at 1:25 PM, Jason Stajich wrote: > For binning of a distribution see the perl module Statistics::Descriptive - > http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm function: > frequency_distribution > > I would also look at R histogram function for the plotting. This would be > one of the easiest ways - I would just make a perl script that generates the > correct R code that can be used to make the plots. > > > On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote: > >> Hi All >> >> After a lot of look up on forums I could google, I am finally posting >> my question here. I think it may not be appropriate for this mailing >> list. I apologize for this first up. The question is regarding dynamic >> binning of data points for histogram plots. >> >> So I have many hashes, each having a "numerical" coverage data >> obtained from Next generation sequencing data analysis. Now each hash >> may have couple of hundred to thousands entry "contig_name => >> coverage". What I want to do is to plot a histogram for each >> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N >> has to be binned according to the data size).
>> >> I am using Chart::Gnuplot for this but I am not able to figure out how >> to bin the data points to fit nicely on a screen. Is there any >> smart/quick method to do this. >> >> Any pointers will help a great deal. >> >> Best Regards, >> -Abhi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > From robert.bradbury at gmail.com Sun Aug 16 19:16:09 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sun, 16 Aug 2009 15:16:09 -0400 Subject: [Bioperl-l] Limit on sequence file size fetches? Message-ID: Hello, I am trying to use get_sequence() to fetch the sequence NS_000198 for the fungus *Podospora anserina* with the databases "GenBank" and when that didn't work "Gene". This is a simple script which fetches the sequence then writes out the fasta and genbank files from the data structure. The errors I got suggested that the system was running out of memory which I thought was unlikely since I've got something like 3GB of main memory and 9GB of swap space. After running strace on the script (which takes a while) I determined that the brk() calls were generating ENOMEM at ~3GB. This turns out to be due to the limit of the Linux memory model I am using (3GB/1GB) on a Pentium IV (Prescott). Now, I think the total genome size for the fungus is ~70MB but haven't verified this so I "should" be able to fetch it unless Bioperl (or perl itself) is doing extremely poor memory management (perhaps not coalescing memory segments into one large sequence) as the reads take place? [1]. Has anyone encountered this problem (fetching say large mammalian chromosomes)? Does anyone know what the limits are for "fetching" sequence files (on 32/64 bit machines?. 
The reason I am using get_sequence and BioPerl is that I can't seem to find the *Podospora anserina* sequence in an FTP database anywhere (so I can't use "wget or ftp"). I haven't tested accessing the GenBank file in a browser (I don't know what browsers would do with an HTML file that large but suspect it would not be pretty). Thanks in advance, Robert Bradbury 1. The strace seems to indicate periodic brk() calls to expand the process data segment size between which there are lots of read() calls of size 4096, presumably reading the socket from NCBI. I don't know if there is an easy way to trace perl's memory allocation/manipulation at a higher level. From jason at bioperl.org Sun Aug 16 19:22:35 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 16 Aug 2009 15:22:35 -0400 Subject: [Bioperl-l] Limit on sequence file size fetches? In-Reply-To: References: Message-ID: <93672502-26EB-4C30-A37E-F3B593E57279@bioperl.org> Robert - Posting your script will help us replicate and diagnose - I am not sure which GenBank fetch option you are using. I have a feeling it is trying to do recursive calls to stitch together the pseudoscaffold. I presume it works fine though if you request each chromosome scaffold individually, like CU607053,CU633438, ... I guess posting it via a bugzilla bug is the best way unless you have a git account and wanted to post it as a 'gist'. -jason -- Jason Stajich jason at bioperl.org http://fungalgenomes.org/ On Aug 16, 2009, at 3:16 PM, Robert Bradbury wrote: > Hello, > > I am trying to use get_sequence() to fetch the sequence NS_000198 > for the > fungus *Podospora anserina* with the databases "GenBank" and when that > didn't work "Gene". This is a simple script which fetches the > sequence then > writes out the fasta and genbank files from the data structure. > > The errors I got suggested that the system was running out of memory > which I > thought was unlikely since I've got something like 3GB of main > memory and > 9GB of swap space.
After running strace on the script (which takes > a while) > I determined that the brk() calls were generating ENOMEM at ~3GB. > This > turns out to be due to the limit of the Linux memory model I am using > (3GB/1GB) on a Pentium IV (Prescott). > > Now, I think the total genome size for the fungus is ~70MB but haven't > verified this so I "should" be able to fetch it unless Bioperl (or > perl > itself) is doing extremely poor memory management (perhaps not > coalescing > memory segments into one large sequence) as the reads take place? [1]. > > Has anyone encountered this problem (fetching say large mammalian > chromosomes)? Does anyone know what the limits are for "fetching" > sequence > files (on 32/64 bit machines?. The reason I am using get_sequence and > BioPerl is that I can't seem to find the *Podospora anserina* > sequence in a > FTP database anywhere (so I can't use "wget or ftp"). I haven't > tested > accessing the GenBank file in a browser (I don't know what browsers > would do > with a HTML file that large but suspect it would not be pretty). > > Thanks in advance, > Robert Bradbury > > 1. The strace seems to indicate periodic brk() calls to expand the > process > data segment size between which there are lots of read() calls of > size 4096, > presumably reading the socket from NCBI. I don't know if there is > an easy > way to trace perl's memory allocation/manipulation at a higher level. 
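Jason's suggestion above (request each chromosome scaffold on its own rather than the NS_000198 pseudoscaffold) could be sketched roughly as below. This is only a minimal sketch under assumptions: the accession list holds just the two scaffolds named in the thread, the output file names are made up for illustration, and it has not been checked against Robert's actual script, which was not posted. Each record is written out as soon as it is fetched, so only one scaffold needs to be held in memory at a time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::GenBank;
use Bio::SeqIO;

# Scaffold accessions mentioned in the thread; the full list is not
# given there, so extend this as needed.
my @accessions = qw(CU607053 CU633438);

my $gb = Bio::DB::GenBank->new();

# Output file names are hypothetical.
my $fasta_out = Bio::SeqIO->new(-file => '>scaffolds.fasta', -format => 'fasta');
my $gbk_out   = Bio::SeqIO->new(-file => '>scaffolds.gb',    -format => 'genbank');

for my $acc (@accessions) {
    # Fetch one scaffold at a time; trap failures so that one bad
    # accession does not abort the whole run.
    my $seq = eval { $gb->get_Seq_by_acc($acc) };
    unless ($seq) {
        warn "Could not fetch $acc: " . ($@ || "no sequence returned\n");
        next;
    }

    # Write the record immediately in both formats.
    $fasta_out->write_seq($seq);
    $gbk_out->write_seq($seq);
}
```

Whether this avoids the out-of-memory condition depends on how large the individual scaffold records are; that is not established in the thread.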
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Aug 16 19:42:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 14:42:56 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A883EE2.3060101@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> <44D32BE895F446A9917A5550485AB102@NewLife> <4A883EE2.3060101@cornell.edu> Message-ID: <69B8C887-1C5E-47B4-9168-8509BB0A5528@illinois.edu> On Aug 16, 2009, at 12:16 PM, Robert Buels wrote: > Mark A. Jensen wrote: >> I see both points- I think Chris's suggestion is good. The nexml >> support >> won't work without Bio::Phylo, but not everyone will need that >> support, >> so if the install can be chatty about this that would be great- > > Maybe the parts that have differing dependencies should be in > different distros then? > > Rob I'm guessing large chunks of that code would have Bio::Root::Root as a base, so I think maintaining related code split into two distributions too problematic. Simple to indicate that Bio::Phylo is required only for NeXML (so listing it as a 'recommends') and keep everything NeXML- related and requiring Bio::Root::Root in one spot. It's possible something inheriting from Bio::Phylo could go there, but that's up to Rutger. chris From maj at fortinbras.us Mon Aug 17 12:43:33 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 17 Aug 2009 08:43:33 -0400 Subject: [Bioperl-l] new NeXML I/O modules Message-ID: Hi All- I'm pleased to announce that my Google Summer of Code student Chase Miller and I have successfully migrated his modules for NeXML I/O into bioperl-live. NeXML (http://www.nexml.org) is Rutger Vos' highly flexible, highly annotatable standard for evolutionary data exchange, that is catching on in the evolutionary DB world. We hope these modules will help move that process along. I also want to say that Chase has been a terrific student and collaborator. He not only learned the complexities of BioPerl IO from scratch, but also grokked Rutger's Bio::Phylo internals, and became familiar with and applied modern OO concepts. He also wrote tests (which pass!), complete POD, and a HOWTO (at http://www.bioperl.org/wiki/HOWTO:Nexml) to accompany this work. Best of all, he finished! (Well, as much as anything is ever finished around here.) I for one hope he will continue to use his commit bit for good and not evil. cheers, Mark From deequan at gmail.com Mon Aug 17 13:06:44 2009 From: deequan at gmail.com (David Quan) Date: Mon, 17 Aug 2009 09:06:44 -0400 Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl? Message-ID: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> Hello there, I've been browsing around bioperl documentation and have used a blast parser, but am wondering if it is possible to use the start and end information for a hit to trace back to a gene in genbank and extract the sequence for that gene? I have not been able to find elements that would work in such a way. Hints and recommendations for elements that would be capable of behaving in such a way would be greatly appreciated. Thanks very much. David N.
Quan -- Love of country is, at heart, trust in a nation's people, faith in their better nature, esteem for their best hopes, understanding for the magnificence and the distinctiveness and the huge, infinitely shaded cultural palette of their simple humanity. --Bradley Burston From akarger at CGR.Harvard.edu Mon Aug 17 13:04:29 2009 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 17 Aug 2009 09:04:29 -0400 Subject: [Bioperl-l] on BP documentation References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > > From: "Hilmar Lapp" > ... > > As for the FASTA example, I can understand - I've heard > repeatedly > > from people that one of the things that they are missing is > > documentation for every SeqIO format we support (such as > GenBank, > > UniProt, FASTA, etc) about where to find a particular piece of > the > > format in the object model. > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help > create > our list of action items. > MAJ I wish you the best of luck on this ambitious and crucial project. I teach intro Perl classes to biologists and always tell them that Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble. I was trying to remember specific examples of where I've gotten lost, and unfortunately can't give any. But I can tell you that often I've run into trouble because the particular method I'm looking for is three parent classes away from the module I'm actually looking at. The deobfuscator helps some, but only for people who know about that. Do you think you could automate a tool that would add the following to the bottom of each module? 
=head2 Inherited methods =over 4 =item desc See Bio::Seq::Basic =back This would make browsing through the docs on bioperl.org more fun too. -Amir Karger From cjfields at illinois.edu Mon Aug 17 14:06:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:06:15 -0500 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: Congrats Chase! chris On Aug 17, 2009, at 7:43 AM, Mark A. Jensen wrote: > Hi All- > > I'm pleased to announce that my Google Summer of Code student > Chase Miller and I have successfully migrated his modules for > NeXML I/O into bioperl-live. NeXML (http://www.nexml.org) is > Rutger Vos' highly flexible, highly annotable standard for > evolutionary data exchange, that is catching on in the > evolutionary DB world. We hope these modules will help move that > process along. > > I also want to say that Chase has been a terrific student and > collaborator. He learned the not only the complexities of BioPerl > IO from scratch, but also grokked Rutger's Bio::Phylo internals, > and became familiar with and applied modern OO concepts. He also > wrote tests (which pass!), complete POD, and a HOWTO (at > http://www.bioperl.org/wiki/HOWTO:Nexml) to accompany this > work. Best of all, he finished! (Well, as much as anything is > ever finished around here.) I for one hope he will continue to > use his commit bit for good and not evil. > > cheers, > Mark > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 14:22:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:22:26 -0500 Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl? 
In-Reply-To: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com>
References: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com>
Message-ID: <74D10663-5770-43DA-ABDB-27FA5D532497@illinois.edu>

That's possible, yes. Use the hit information with Bio::DB::GenBank to pull
the sequence out, as in the example below. Note that the strand convention
differs from BioPerl's -1/0/1; for efetch, strand 1 = normal (default),
2 = complement.

================================
use Bio::DB::GenBank;

my $factory = Bio::DB::GenBank->new(-format    => 'genbank',
                                    -seq_start => $seqstart,
                                    -seq_stop  => $seqend,
                                    -strand    => $strand, # 1=plus, 2=minus
                                    );

# should be UID, use get_Seq_by_acc() for accessions
my $seq = $factory->get_Seq_by_id($id);
================================

This pulls everything into a Bio::Seq, though, so you'll need to push it out
to a SeqIO output stream. You can also use Bio::DB::EUtilities to get the raw
sequence via efetch, something like (untested):

================================
use Bio::DB::EUtilities;

my $fetcher = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -rettype => 'gb');

# loop: for each hit/HSP, grab sequence...
$fetcher->set_parameters(-id        => $id,       # UID or accession
                         -seq_start => $seqstart, # hit start
                         -seq_stop  => $seqend,   # hit end
                         -strand    => $strand    # 1=plus, 2=minus
                         );

# then get raw content
$fetcher->get_Response(-file => ">$id.gb");
================================

You could probably plug into Ensembl similarly if the db versions match; see:

http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences

chris

On Aug 17, 2009, at 8:06 AM, David Quan wrote:

> Hello there,
>
> I've been browsing around bioperl documentation and have used
> a blast parser, but am wondering if it is possible to use the start
> and end information for a hit to trace back to a gene in genbank and
> extract the sequence for that gene? I have not been able to find
> elements that would work in such a way.
Hints and recommendations for > elements that would be capable of behaving in such a way would be > greatly appreciated. Thanks very much. > > David N. Quan > > -- > Love of country is, at heart, trust in a nation's people, faith in > their better nature, esteem for their best hopes, understanding for > the magnificence and the distinctiveness and the huge, infinitely > shaded cultural palette of their simple humanity. --Bradley Burston > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 14:47:31 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 09:47:31 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> Message-ID: On Aug 17, 2009, at 8:04 AM, Amir Karger wrote: >> -----Original Message----- >> From: Mark A. Jensen [mailto:maj at fortinbras.us] >> >> From: "Hilmar Lapp" >> ... >>> As for the FASTA example, I can understand - I've heard >> repeatedly >>> from people that one of the things that they are missing is >>> documentation for every SeqIO format we support (such as >> GenBank, >>> UniProt, FASTA, etc) about where to find a particular piece of >> the >>> format in the object model. >> >> This is the right thread for list lurkers to contribute their betes >> noires >> such as this one. I encourage ALL to post these issues and help >> create >> our list of action items. >> MAJ > > I wish you the best of luck on this ambitious and crucial project. I > teach intro Perl classes to biologists and always tell them that > Bioperl > is amazingly useful, but only if you can figure out how to use it. 
If > what you want to do isn't in the howtos, you can be in big trouble. > > I was trying to remember specific examples of where I've gotten lost, > and unfortunately can't give any. But I can tell you that often I've > run > into trouble because the particular method I'm looking for is three > parent classes away from the module I'm actually looking at. The > deobfuscator helps some, but only for people who know about that. Do > you > think you could automate a tool that would add the following to the > bottom of each module? > > =head2 Inherited methods > > =over 4 > > =item desc > > See Bio::Seq::Basic > > =back > > This would make browsing through the docs on bioperl.org more fun too. > > -Amir Karger For many modules this is already in place, but yes this could be improved. One of the problems I suggest we avoid when doing this is placing these interspersed within code. It has been demonstrated that doing so actually slows down the perl interpreter slightly; it has to slog through lots of POD to find the code at the compilation step. This occurs only upon on initial compilation, but it is significant enough that the overall recommendation by most perl brethren (and in Perl Best Practices) has been to place any POD after an __END__ marker. This way the compiler doesn't have to look at it at all, but perldoc can still find it. Also, acc to PBP, although the inline POD would seemingly be easier to take care of, apparently the opposite is true in most cases (though it can come down to styling differences). Interspersed code is much harder to maintain in a consistent state, tends to be choppier, and can be laid out in odd ways due to being scattered throughout the file. I know this can come down to a difference in style, but the arguments do make sense enough to me that in Biome I am pushing to have all docs after the __END__ marker. 
Lincoln already practices this within bioperl and Bio::Graphics, and I plan on moving much on my documentation similarly within my code in BioPerl. The additional comments in the PBP chapter "Documentation" are well- worth reading if you can get your hands on it. chris From rmb32 at cornell.edu Mon Aug 17 15:21:08 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:21:08 -0700 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: <4A897564.2090203@cornell.edu> Hurrah! GSoC strikes again! Rob From rmb32 at cornell.edu Mon Aug 17 15:45:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:45:18 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <474354.59886.qm@web30408.mail.mud.yahoo.com> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> Message-ID: <4A897B0E.7060208@cornell.edu> Yee Man Chan wrote: > As to the release, my thinking is that I do understand that your desire to maintain a high level of quality in BioPerl code base. So if the HMM doesn't meet that standard, I am ok with it being spinned off. We're not pushing to spin it off because of code quality, we're pushing to spin it off because we're spinning everything off. The plan is to break BioPerl up into many discrete distributions on CPAN with the dependencies between them well-known and codified. This will make maintenance of BioPerl *much* easier in the long run. So this means that the plan of action should be 1.) get the code so that it's working on all platforms, 2.) create a CPAN distribution for it and put it on CPAN, 3.) remove it from bioperl-ext Also, doing a search for bioperl-ext on CPAN brings to light a couple of issues that probably need to be dealt with. To wit: 1.) there is an ancient version of bioperl-ext that probably needs to be removed, it's under ~birney's account. Thoughts on this? 2.) 
Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend on bioperl-ext, which suggests that these really need to be split off, each with the Bio::Ext::Modules they depend on. Bio::Tools::HMM could be the first case of this: * make a dir in the repos called Bio-Tools-HMM alongside bioperl-live, having trunk/, and branches/ subdirs * move Bio::Tools::HMM out of bioperl-live into that * move Bio::Ext::HMM stuff out of bioperl-ext into that * repeat with Bio::Tools::dpAlign and pSW, which would probably go together into a Bio-Tools-Align distro, I think Sounds like this is moving along nicely. Rob From rmb32 at cornell.edu Mon Aug 17 15:48:10 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 08:48:10 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897B0E.7060208@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> Message-ID: <4A897BBA.2070204@cornell.edu> Also, I volunteer to make this branch and module machinery and such if you want. I just don't want to step on any ongoing development you guys are going in the bioperl-ext trunk. If you want me to do it, just say the word, either here or in #bioperl. Rob Robert Buels wrote: > Yee Man Chan wrote: >> As to the release, my thinking is that I do understand that your >> desire to maintain a high level of quality in BioPerl code base. So if >> the HMM doesn't meet that standard, I am ok with it being spinned off. > > We're not pushing to spin it off because of code quality, we're pushing > to spin it off because we're spinning everything off. The plan is to > break BioPerl up into many discrete distributions on CPAN with the > dependencies between them well-known and codified. This will make > maintenance of BioPerl *much* easier in the long run. > > So this means that the plan of action should be > 1.) get the code so that it's working on all platforms, > 2.) 
create a CPAN distribution for it and put it on CPAN, > 3.) remove it from bioperl-ext > > Also, doing a search for bioperl-ext on CPAN brings to light a couple of > issues that probably need to be dealt with. To wit: > > > 1.) there is an ancient version of bioperl-ext that probably needs to be > removed, it's under ~birney's account. Thoughts on this? > > 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend on > bioperl-ext, which suggests that these really need to be split off, each > with the Bio::Ext::Modules they depend on. Bio::Tools::HMM could be the > first case of this: > * make a dir in the repos called Bio-Tools-HMM alongside > bioperl-live, having trunk/, and branches/ subdirs > * move Bio::Tools::HMM out of bioperl-live into that > * move Bio::Ext::HMM stuff out of bioperl-ext into that > * repeat with Bio::Tools::dpAlign and pSW, which would probably > go together into a Bio-Tools-Align distro, I think > > Sounds like this is moving along nicely. > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Mon Aug 17 16:58:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 11:58:24 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897B0E.7060208@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> Message-ID: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> On Aug 17, 2009, at 10:45 AM, Robert Buels wrote: > Yee Man Chan wrote: >> As to the release, my thinking is that I do understand that your >> desire to maintain a high level of quality in BioPerl code base. 
So >> if the HMM doesn't meet that standard, I am ok with it being >> spinned off. > > We're not pushing to spin it off because of code quality, we're > pushing to spin it off because we're spinning everything off. The > plan is to break BioPerl up into many discrete distributions on CPAN > with the dependencies between them well-known and codified. This > will make maintenance of BioPerl *much* easier in the long run. > > So this means that the plan of action should be > 1.) get the code so that it's working on all platforms, > 2.) create a CPAN distribution for it and put it on CPAN, > 3.) remove it from bioperl-ext > > Also, doing a search for bioperl-ext on CPAN brings to light a > couple of issues that probably need to be dealt with. To wit: > > > 1.) there is an ancient version of bioperl-ext that probably needs > to be removed, it's under ~birney's account. Thoughts on this? This subject just recently popped up on perl.module.authors, more in relation to abandonware, but a similar thing. Andreas has indicated there is an abandoned flag that can be set so it's worth looking into, but using it requires another release. I have been in contact with that group on ideas for the split; libwin32 did the same thing, so I'll contact Jan Dubois on the matter for some pointers. > 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend > on bioperl-ext, which suggests that these really need to be split > off, each with the Bio::Ext::Modules they depend on. > Bio::Tools::HMM could be the first case of this: > * make a dir in the repos called Bio-Tools-HMM alongside bioperl- > live, having trunk/, and branches/ subdirs > * move Bio::Tools::HMM out of bioperl-live into that > * move Bio::Ext::HMM stuff out of bioperl-ext into that > * repeat with Bio::Tools::dpAlign and pSW, which would probably > go together into a Bio-Tools-Align distro, I think > > Sounds like this is moving along nicely. > > Rob Yes, that's essentially the idea.
The more significant impact of this (both here and in core) is allowing updates to be made as needed, and not be blocked due to issues in unrelated modules. We have been waiting years for fixes to pSW, Staden::read, Align w/o progress, which has hindered overall releases of bioperl-ext. Similar problems exist in bp-core. Re: bioperl-ext, BioLib has rendered some of those implementations obsolete. I would rather do that incrementally (individual implementations) vs. wait for a full-blown bioperl-ext release, so splitting these up makes that possible. chris From robert.bradbury at gmail.com Mon Aug 17 17:14:57 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 17 Aug 2009 13:14:57 -0400 Subject: [Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers Message-ID: One of the questions facing people working in bioinformatics is "How do we present information so that it can be effectively interpreted by non-informatics specialists?" Now, my expertise lies in computer science (esp. O.S. & databases) and as a second vocation the biology of aging (DNA damage & repair, to a lesser extent cancer and pathologies of aging, etc.). Now by my estimate there are perhaps 5 people in the world who are able to effectively discuss computer science X aging (gerontology) [3]. There are perhaps several dozen people where those areas, esp aging, may overlap with DNA damage & repair. But then there is a wider audience of perhaps a few hundred members of AGE, and maybe a thousand or so who are members of the scientific subgroup of GSA. But most of those individuals are "old school" scientists who know relatively little about bioinformatics. So one has barriers to presenting bioinformatics information in ways that they can use usefully. 
I have found in my limited experience that homology graphs of conserved protein domains, such as those displayed in HomloGene or those in Ensembl (including phylogeny graphs) can be quite useful in reaching interesting conclusions. For example, double strand break repair processes which may involve 8-10 relatively conserved proteins, may have a critical role in the mechanisms of aging. In particular two of those proteins, WRN & DCLRE1C (Artemis) contain complementary exonuclease activities which chew up the DNA in order to prepare the strands for ligation. Of course, programmers may appreciate better than gerontologists the significance of deleting random bytes from instruction sequences in ones code. At the recent AGE meeting in June several discussions arose as to possible differences in "aging" in yeast, *C. elegans* and mammals. [1]. A quick database search showed that *C. elegans* seems to be lacking the exonuclease domain on the WRN homologue and may be missing a DCLRE1C homologue entirely (which if true would lead to conclusions that aging in *C. elegans* may be fundamentally different from aging in vertebrates). Explaining this to researchers can best be done using pictures. I've been through PubMed and have several papers (NAR / BMC Bioinformatics) regarding programs to do homology comparisons and phylogeny trees. However these seem to lean towards producing less condensed bioinformatics-ish information. I do not know however whether the outputs from databases like PubMed HomoloGene or Ensembl have been packaged in tools that might be part of BioPerl. I am interested in programs that can be run on a regular basis to draw "pretty pictures" that can be used for publication and/or internet browsing. In particular I'm interested in running such programs on species of interest to various gerontological communities [2] which involves subsets of databases which seem to be scattered around the world. Thanks. 1. 
Of course there has been lots of discussion and rationalization over the last 15+ years about how "aging" is largely the same in more complex and simpler organisms -- in part to justify sequencing some organisms and in part to justify funding research at certain laboratories. A closer examination based on some of the complete and emerging genome sequences may suggest this is a very swampy discussion. 2. For example, nematode DNA repair gene comparisons would be interesting to nematode researchers, insect DNA repair gene comparisons to insect researchers, both to invertebrate researchers, etc. 3. The recently published textbooks *Aging of the Genome* by Jan Vijg and the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg *et al*, go a long way towards moving these areas from the stacks of research libraries into areas for more general discussion. Both volumes deal extensively with the ~150 DNA repair genes. From cjfields at illinois.edu Mon Aug 17 17:15:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 12:15:46 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A897BBA.2070204@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <4A897BBA.2070204@cornell.edu> Message-ID: I say go for it if Yee Man is okay with the idea. It gets the code out there that much faster. This also doesn't depend on core being split up (only need a 'requires' bioperl 1.6.0). chris On Aug 17, 2009, at 10:48 AM, Robert Buels wrote: > Also, I volunteer to make this branch and module machinery and such > if you want. I just don't want to step on any ongoing development > you guys are going in the bioperl-ext trunk. > > If you want me to do it, just say the word, either here or in > #bioperl. 
> > Rob > > Robert Buels wrote: >> Yee Man Chan wrote: >>> As to the release, my thinking is that I do understand that >>> your desire to maintain a high level of quality in BioPerl code >>> base. So if the HMM doesn't meet that standard, I am ok with it >>> being spinned off. >> We're not pushing to spin it off because of code quality, we're >> pushing to spin it off because we're spinning everything off. The >> plan is to break BioPerl up into many discrete distributions on >> CPAN with the dependencies between them well-known and codified. >> This will make maintenance of BioPerl *much* easier in the long run. >> So this means that the plan of action should be >> 1.) get the code so that it's working on all platforms, >> 2.) create a CPAN distribution for it and put it on CPAN, >> 3.) remove it from bioperl-ext >> Also, doing a search for bioperl-ext on CPAN brings to light a >> couple of issues that probably need to be dealt with. To wit: >> 1.) there is an ancient version of bioperl-ext that probably needs >> to be removed, it's under ~birney's account. Thoughts on this? >> 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend >> on bioperl-ext, which suggests that these really need to be split >> off, each with the Bio::Ext::Modules they depend on. >> Bio::Tools::HMM could be the first case of this: >> * make a dir in the repos called Bio-Tools-HMM alongside bioperl- >> live, having trunk/, and branches/ subdirs >> * move Bio::Tools::HMM out of bioperl-live into that >> * move Bio::Ext::HMM stuff out of bioperl-ext into that >> * repeat with Bio::Tools::dpAlign and pSW, which would probably >> go together into a Bio-Tools-Align distro, I think >> Sounds like this is moving along nicely. 
>> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From chmille4 at gmail.com Mon Aug 17 18:44:09 2009 From: chmille4 at gmail.com (Chase Miller) Date: Mon, 17 Aug 2009 14:44:09 -0400 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A897564.2090203@cornell.edu> References: <4A897564.2090203@cornell.edu> Message-ID: <991fb8210908171144t3f7107f0ldaf02dfdc762ae27@mail.gmail.com> Thanks! It was a great experience. I couldn't have done it without Mark who was a fantastic mentor. cheers, Chase On Mon, Aug 17, 2009 at 11:21 AM, Robert Buels wrote: > Hurrah! GSoC strikes again! > > Rob > From rmb32 at cornell.edu Mon Aug 17 20:32:14 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 13:32:14 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> Message-ID: <4A89BE4E.7090901@cornell.edu> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro at Bio-Tools-HMM in the repo. The tests are not passing, I think that some bugs need to be fixed in the logic of things. Yee Man, could you have a look? 
To download the newly repackaged code: svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM perl Build.PL; ./Build test Please check that things are compiling OK, check the test logic, upgrade the tests to use Test::More, and get the tests to the point where they are passing. At that point, it should be ready for CPAN, but we need to decide how we want to coordinate that with releases of bioperl-live and bioperl-ext. Rob From rmb32 at cornell.edu Mon Aug 17 20:45:42 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 13:45:42 -0700 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: References: Message-ID: <4A89C176.3050109@cornell.edu> Mark A. Jensen wrote: > wrote tests (which pass!), complete POD, and a HOWTO (at The tests for this are depending on Bio::Phylo and fail if it's not installed. Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a "recommended" module, or what? Gotta clarify our dependencies. Rob From cjfields at illinois.edu Mon Aug 17 20:54:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 15:54:05 -0500 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A89C176.3050109@cornell.edu> References: <4A89C176.3050109@cornell.edu> Message-ID: On Aug 17, 2009, at 3:45 PM, Robert Buels wrote: > Mark A. Jensen wrote: >> wrote tests (which pass!), complete POD, and a HOWTO (at > > The tests for this are depending on Bio::Phylo and fail if it's not > installed. Are we going to add Bio::Phylo as a bioperl dependency, > or band-aid it as a "recommended" module, or what? > > Gotta clarify our dependencies. > > Rob 'recommends', should skip all tests as a 'pass' with message that 'Bio::Phylo is required' or somesuch. chris From maj at fortinbras.us Mon Aug 17 20:55:19 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 17 Aug 2009 16:55:19 -0400 Subject: [Bioperl-l] new NeXML I/O modules In-Reply-To: <4A89C176.3050109@cornell.edu> References: <4A89C176.3050109@cornell.edu> Message-ID: <3D65CA5234EB4BDF892F280D575FB01D@NewLife> I meant to add a skip tests on a runtime check for bio::phylo. Gotta do that. It's necessary only for these modules. ----- Original Message ----- From: "Robert Buels" To: "Mark A. Jensen" Cc: "BioPerl List" ; "Rutger Vos" ; "Chase Miller" Sent: Monday, August 17, 2009 4:45 PM Subject: Re: [Bioperl-l] new NeXML I/O modules > Mark A. Jensen wrote: >> wrote tests (which pass!), complete POD, and a HOWTO (at > > The tests for this are depending on Bio::Phylo and fail if it's not installed. > Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a > "recommended" module, or what? > > Gotta clarify our dependencies. > > Rob > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Aug 17 21:22:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 16:22:00 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A89BE4E.7090901@cornell.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> <4A89BE4E.7090901@cornell.edu> Message-ID: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> Still seeing that odd warning popping up: cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at / Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185. Have you tried using Yee Man's original Makefile.PL to see if it works better? There appear to be some differences in the compilation, including a linking warning popping up. 
chris On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro > at Bio-Tools-HMM in the repo. The tests are not passing, I think > that some bugs need to be fixed in the logic of things. > > Yee Man, could you have a look? To download the newly repackaged > code: > > svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ > bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM > > perl Build.PL; ./Build test > > Please check that things are compiling OK, check the test logic, > upgrade the tests to use Test::More, and get the tests to the point > where they are passing. > > At that point, it should be ready for CPAN, but we need to decide > how we want to coordinate that with releases of bioperl-live and > bioperl-ext. > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 21:28:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 16:28:05 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> <4A89BE4E.7090901@cornell.edu> <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu> Message-ID: <45F9C6D1-7DD7-4227-B7B9-3FBAF7513B35@illinois.edu> Take that back. Yes the 'FL' warning is still there, but no tests are run b/c (simply put) there are no regression tests (no use of Test or Test::More). If you run './Build test --verbose' you can see the run, but no test output. That should be easy to fix, though. chris On Aug 17, 2009, at 4:22 PM, Chris Fields wrote: > Still seeing that odd warning popping up: > > cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose > t/001_basics.t .. 
Argument "FL" isn't numeric in numeric lt (<) at / > Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line > 185. > > Have you tried using Yee Man's original Makefile.PL to see if it > works better? There appear to be some differences in the > compilation, including a linking warning popping up. > > chris > > On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: > >> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro >> at Bio-Tools-HMM in the repo. The tests are not passing, I think >> that some bugs need to be fixed in the logic of things. >> >> Yee Man, could you have a look? To download the newly repackaged >> code: >> >> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/ >> bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM >> >> perl Build.PL; ./Build test >> >> Please check that things are compiling OK, check the test logic, >> upgrade the tests to use Test::More, and get the tests to the point >> where they are passing. >> >> At that point, it should be ready for CPAN, but we need to decide >> how we want to coordinate that with releases of bioperl-live and >> bioperl-ext. >> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 17 22:26:19 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 17 Aug 2009 17:26:19 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <419432.62970.qm@web30403.mail.mud.yahoo.com> References: <419432.62970.qm@web30403.mail.mud.yahoo.com> Message-ID: <227EADF3-D769-413D-B1BF-22C919C8D097@illinois.edu> Yee Man, Will look into that. I do recall that disappearing last night, so I'll go look at the commit log. 
I have committed some regression tests using Bio::Root::Test. This'll need to be extensively tested b/c we're comparing floating point numbers, though I do use our custom float_is() test to run these (so we only compare first six signif). These are passing for me on 64bit perl 5.10.0; I may try these on a local 64bit linux (I need to set up bioperl on it first). chris On Aug 17, 2009, at 5:19 PM, Yee Man Chan wrote: > I believe this warnings should have been fixed with the latest Bio/ > Tools/HMM.pm. Are you sure you are using the lastest Bio/Tools/ > HMM.pm? I noticed that there are two pairs of "use strict" and "use > warnings" in this version. :P > > Yee Man > > --- On Mon, 8/17/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "BioPerl List" , "Yee Man Chan" > > >> Date: Monday, August 17, 2009, 2:22 PM >> Still seeing that odd warning popping >> up: >> >> cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose >> t/001_basics.t .. Argument "FL" isn't numeric in numeric lt >> (<) at >> /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm >> line 185. >> >> Have you tried using Yee Man's original Makefile.PL to see >> if it works better? There appear to be some >> differences in the compilation, including a linking warning >> popping up. >> >> chris >> >> On Aug 17, 2009, at 3:32 PM, Robert Buels wrote: >> >>> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into >> a new distro at Bio-Tools-HMM in the repo. The tests >> are not passing, I think that some bugs need to be fixed in >> the logic of things. >>> >>> Yee Man, could you have a look? 
>>> To download the newly repackaged code:
>>>
>>> svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM
>>>
>>> perl Build.PL; ./Build test
>>>
>>> Please check that things are compiling OK, check the test logic, upgrade the tests to use Test::More, and get the tests to the point where they are passing.
>>>
>>> At that point, it should be ready for CPAN, but we need to decide how we want to coordinate that with releases of bioperl-live and bioperl-ext.
>>>
>>> Rob

From abhishek.vit at gmail.com Mon Aug 17 22:53:19 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Mon, 17 Aug 2009 18:53:19 -0400
Subject: [Bioperl-l] Error Copying Hashes
Message-ID:

Hi Guys,

I think this one should be appropriate for here.

I am trying to copy a hash (spaced out below for the sake of readability):

% { $OUTPUT->{$dir}->{'file'}->{$file}->{'additive'} } = %ADDITIVE_COUNT;

## Where %ADDITIVE_COUNT is a simple hash (key/value). No references.

I am getting this error:

Odd number of elements in hash assignment at ./assessCoverage.pl line 258

Looking at the dump of the hash, I see this:

$VAR1 = {
  '/local/seq/' => {
    'read_len' => 36,
    'file' => {
      's_3_sorted.txt' => {
        'additive' => {
          '8979/16384' => undef   #### I don't understand this behavior. Something unusual is going on?
        }}}}}

From rmb32 at cornell.edu Mon Aug 17 23:00:00 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 17 Aug 2009 16:00:00 -0700
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <360578.66990.qm@web30403.mail.mud.yahoo.com>
References: <360578.66990.qm@web30403.mail.mud.yahoo.com>
Message-ID: <4A89E0F0.8010307@cornell.edu>

Yee Man Chan wrote:
> I noticed that Bio/Tools/HMM.pm was removed from the trunk.
So I added it back in. I think you shouldn't get the warnings with this version. Please read my email above with instructions for checkout out the new Bio-Tools-HMM component, where Bio::Tools::HMM has been moved. Please do not add the Bio::Tools::HMM module back into bioperl-live. I think you might be confused about the functions of 'svn add', 'svn commit', etc, because I don't see any actual addition of the module in the commit logs. Please read through the SVN manual at http://svnbook.red-bean.com/ if you need clarification. Rob From rmb32 at cornell.edu Mon Aug 17 23:30:07 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:30:07 -0700 Subject: [Bioperl-l] Error Copying Hashes In-Reply-To: References: Message-ID: <4A89E7FF.1020603@cornell.edu> Well for one thing, it looks like somewhere a hash is getting accidentally evaluated in scalar context. '8979/16384' is a typical result of doing, for example, my $x = %some_hash; This might not be the proximate cause of your problem, it would be better to post your whole script somewhere so people can look over it. That said, this isn't the right list for this, this list is specifically for discussing the BioPerl toolkit, not just perl that is used in biology. IRC probably the quickest place to get perl help, try the #perl-help channel on the server irc.perl.org. Otherwise, you might try asking on a general perl mailing list, there seem to be some listed at http://perl-begin.org/mailing-lists/ Best of luck! Rob From abhishek.vit at gmail.com Mon Aug 17 23:33:41 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 17 Aug 2009 19:33:41 -0400 Subject: [Bioperl-l] Error Copying Hashes In-Reply-To: <4A89E7FF.1020603@cornell.edu> References: <4A89E7FF.1020603@cornell.edu> Message-ID: Ok great. Thanks for pointing me to the right places to post later. 
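Rob's point about a hash being evaluated in scalar context can be illustrated with a minimal sketch. The variable names below (%additive_count, %output, %broken) are hypothetical stand-ins and are not taken from assessCoverage.pl; the actual cause in that script cannot be determined from the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %ADDITIVE_COUNT: a flat key/value hash.
my %additive_count = ( 10 => 3, 20 => 7, 30 => 1 );
my %output;

# List context: the hash flattens to (key, value, ...) pairs, so this
# copies every pair into the autovivified inner hash.
%{ $output{dir}{file}{additive} } = %additive_count;

# Often simpler: store a reference to a shallow copy.
$output{dir}{file}{additive_copy} = { %additive_count };

# Scalar context: no pairs, just a single value. Older perls return a
# "used/allocated buckets" string such as '8979/16384'; perl 5.26 and
# later return the number of keys instead.
my $in_scalar_context = %additive_count;

# Assigning that single scalar to a hash is the kind of statement that
# produces "Odd number of elements in hash assignment" and leaves one
# key whose value is undef.
my %broken;
{
    no warnings 'misc';    # silence the warning for this demonstration
    %broken = ($in_scalar_context);
}

printf "copied keys: %s\n",
    join ',', sort { $a <=> $b } keys %{ $output{dir}{file}{additive} };
printf "broken keys: %d, value defined: %s\n",
    scalar( keys %broken ),
    defined( ( values %broken )[0] ) ? 'yes' : 'no';
```

If the dump shows a single key that looks like a fraction with an undef value, searching the script for places where the source hash is concatenated, interpolated, or passed through a scalar before the assignment is a reasonable first step.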
best, -Abhi On Mon, Aug 17, 2009 at 7:30 PM, Robert Buels wrote: > Well for one thing, it looks like somewhere a hash is getting accidentally > evaluated in scalar context. '8979/16384' is a typical result of doing, for > example, my $x = %some_hash; This might not be the proximate cause of your > problem, it would be better to post your whole script somewhere so people > can look over it. > > That said, this isn't the right list for this, this list is specifically > for discussing the BioPerl toolkit, not just perl that is used in biology. > > IRC probably the quickest place to get perl help, try the #perl-help > channel on the server irc.perl.org. > > Otherwise, you might try asking on a general perl mailing list, there seem > to be some listed at > http://perl-begin.org/mailing-lists/ > > Best of luck! > > Rob > From rmb32 at cornell.edu Mon Aug 17 23:42:21 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 17 Aug 2009 16:42:21 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A87275C.5040300@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> Message-ID: <4A89EADD.9050509@cornell.edu> I'm digging into the second item on implementation plan, having mostly finished splitting off Bio::FeatureIO (in a branch): * Rename some TypedSeqFeatureI methods as suggested in Hilmar's post Where Hilmar's post is at http://article.gmane.org/gmane.comp.lang.perl.bio.general/15846 Now, he refers to an interesting thing in there that I haven't heard discussed before, which is the concept of having the feature's source_tag by typed with an ontology term also, as source_term(). I can see how this might be a good idea, or it might be overkill. Anybody have thoughts on having feature _sources_ strongly typed with ontology terms? 
Rob From Kevin.M.Brown at asu.edu Tue Aug 18 00:36:34 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 17 Aug 2009 17:36:34 -0700 Subject: [Bioperl-l] on BP documentation In-Reply-To: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> Message-ID: <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu> The obfuscator does help, but even it is a little sparse on data for modules. Especially information on the realities of the returned data from a method call. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger Sent: Monday, August 17, 2009 6:04 AM To: Mark A. Jensen; BioPerl List Subject: Re: [Bioperl-l] on BP documentation > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > > From: "Hilmar Lapp" > ... > > As for the FASTA example, I can understand - I've heard > repeatedly > > from people that one of the things that they are missing is > > documentation for every SeqIO format we support (such as > GenBank, > > UniProt, FASTA, etc) about where to find a particular piece of > the > > format in the object model. > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help > create > our list of action items. > MAJ I wish you the best of luck on this ambitious and crucial project. I teach intro Perl classes to biologists and always tell them that Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble. I was trying to remember specific examples of where I've gotten lost, and unfortunately can't give any. 
But I can tell you that often I've run into trouble because the particular method I'm looking for is three parent classes away from the module I'm actually looking at. The deobfuscator helps some, but only for people who know about that. Do you think you could automate a tool that would add the following to the bottom of each module? =head2 Inherited methods =over 4 =item desc See Bio::Seq::Basic =back This would make browsing through the docs on bioperl.org more fun too. -Amir Karger _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From sidd.basu at gmail.com Tue Aug 18 11:01:03 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Tue, 18 Aug 2009 06:01:03 -0500 Subject: [Bioperl-l] code reuse with moose In-Reply-To: References: <20090812022753.GA815@Macintosh-74.local> Message-ID: <20090818110102.GA27010@seinfeld> Putting it in the bioperl list, makes more sense here, On Wed, 12 Aug 2009, Chris Fields wrote: > (BTW, this is re: the reimplementation of major chunks of BioPerl using > Moose, Biome: http://github.com/cjfields/biome/tree/) > > Locations should use a Role (specifically, Biome::Role::Range), so > start/end/strand should be attributes, not methods. With attributes the > best way to do this is probably with a builder, and lazily (start > requires end, and vice versa). Factor out the common code as Tomas > indicates. BTW, the $self->throw() is akin to BioPerl's $self->throw() > exception handling; it simply catches any exceptions and passes them to > the metaclass exception handling. 
>
> I've been thinking about making the Range role abstract for this very
> reason (or defining very basic attributes); something like:
>
> ----------------------------
>
> package Bio::Role::Range;
>
> requires qw(_build_start _build_end _build_strand);
>
> # also require other methods which need to be defined in implementation
>
> has 'start' => (
>     isa     => 'Int',
>     is      => 'rw',
>     builder => '_build_start',
>     lazy    => 1
> );
>
> # same for end, strand (except strand has a different isa via MooseX::Types)
> ....
>
> package Bio::Location::Foo;
>
> with 'Bio::Role::Range';
>
> sub _build_start {
>     # for location-specific start
> }
>
> sub _build_end {
>     # for location-specific end
> }
>
> sub _build_strand {
>     # for location-specific strand
> }
>
> sub _common_build_method {
>     # factor out common code here, call from other builders
> }
>
> ----------------------------

This plan makes things much clearer. Currently BioMe::Role::Location has a 'requires' keyword, and the rest of the location modules consume that role to provide their own implementations. At this point only BioMe::Location::Atomic has an attribute-based 'start' and 'end' implementation. I got a bit confused because in current bioperl 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic', and when I try to follow that path in BioMe it has to override that method. So, my question is: do all the location modules really need to inherit from each other? I am aware of the original design ideas, but it would be better to have a flatter hierarchy if possible.

One more thing: what about putting 'start', 'end' and the other common base attributes in BioMe::Role::Location instead of BioMe::Role::Range? I am not sure which would be correct from the bioperl point of view, just throwing out an idea.

>
> Also, I think the Coordinate-related stuff should be simplified down to a
> trait or an attribute; they bring in way too much overhead in bioperl w/o
> much added value.
You mean, instead of having a 'builder' method, having specialized traits handle those? That sounds even better.

-siddhartha

>
> And now back to your regular Moose-related broadcast...
>
> chris
>
> On Aug 11, 2009, at 9:27 PM, Siddhartha Basu wrote:
>
> > Hi,
> > In one of my classes I have this boilerplate code block that is repeated
> > all over ....
> >
> > sub start {
> >     my ( $self, $value ) = @_;
> >     $self->{'_start'} = $value if defined $value;
> >
> >     ## -- from here
> >     $self->throw( "Only adjacent residues when location type "
> >           . "is IN-BETWEEN. Not ["
> >           . $self->{'_start'}
> >           . "] and ["
> >           . $self->{'_end'}
> >           . "]" )
> >         if defined $self->{'_start'}
> >             && defined $self->{'_end'}
> >             && $self->location_type eq 'IN-BETWEEN'
> >             && ( $self->{'_end'} - 1 != $self->{'_start'} );
> >     return $self->{'_start'};
> >     ## -- here
> >
> > }
> >
> > then again ....
> >
> > sub end {
> >     my ( $self, $value ) = @_;
> >
> >     $self->{'_end'} = $value if defined $value;
> >
> >     #assume end is the same as start if not defined
> >     if ( !defined $self->{'_end'} ) {
> >         if ( !defined $self->{'_start'} ) {
> >             $self->warn('Calling end without a defined start position');
> >             return;
> >         }
> >         $self->warn('Setting start equal to end');
> >         $self->{'_end'} = $self->{'_start'};
> >     }
> >
> >     ## ----
> >
> >     $self->throw( "Only adjacent residues when location type "
> >           . "is IN-BETWEEN. Not ["
> >           . $self->{'_start'}
> >           . "] and ["
> >           . $self->{'_end'}
> >           . "]" )
> >         if defined $self->{'_start'}
> >             && defined $self->{'_end'}
> >             && $self->location_type eq 'IN-BETWEEN'
> >             && ( $self->{'_end'} - 1 != $self->{'_start'} );
> >
> >     return $self->{'_end'};
> >     #---------
> > }
> >
> >
> > Is there any way Moose can be used here for more code reuse? I thought
> > about converting it to a type but still couldn't figure out how that can
> > be done.
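One way to remove the duplicated check from the quoted start/end accessors is to move it into a single helper that both call. The sketch below uses plain Perl rather than Moose so that it runs without extra modules. The package name My::Location, the helper _check_in_between, and the die-based throw are hypothetical and only stand in for the BioPerl/Biome classes discussed in the thread; with Moose, the same helper could presumably be called from the builders Chris describes.

```perl
use strict;
use warnings;

package My::Location;    # hypothetical class, for illustration only

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { _location_type => $args{location_type} || 'EXACT' }, $class;
    $self->start( $args{start} ) if defined $args{start};
    $self->end( $args{end} )     if defined $args{end};
    return $self;
}

sub location_type { $_[0]{_location_type} }

# Stand-in for BioPerl's $self->throw(): simply die with the message.
sub throw { my ( $self, $msg ) = @_; die "$msg\n" }

# The check that was repeated in both accessors, written once.
sub _check_in_between {
    my ($self) = @_;
    my ( $start, $end ) = @{$self}{qw(_start _end)};
    return unless defined $start && defined $end;
    return unless $self->location_type eq 'IN-BETWEEN';
    $self->throw(
        "Only adjacent residues when location type is IN-BETWEEN. Not [$start] and [$end]")
        if $end - 1 != $start;
    return;
}

sub start {
    my ( $self, $value ) = @_;
    $self->{_start} = $value if defined $value;
    $self->_check_in_between;
    return $self->{_start};
}

sub end {
    my ( $self, $value ) = @_;
    $self->{_end} = $value if defined $value;

    # As in the quoted code, end falls back to start when it is unset
    # (the warnings from the original are omitted here for brevity).
    $self->{_end} = $self->{_start} if !defined $self->{_end};
    $self->_check_in_between;
    return $self->{_end};
}

package main;

my $ok  = My::Location->new( location_type => 'IN-BETWEEN', start => 5, end => 6 );
my $bad = eval { My::Location->new( location_type => 'IN-BETWEEN', start => 5, end => 8 ) };

print "adjacent pair accepted: ", $ok->start, "..", $ok->end, "\n";
print "non-adjacent pair rejected\n" unless defined $bad;
```

Keeping the validation in one method means a later change to the IN-BETWEEN rule touches a single place, whichever object system ends up wrapping it.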
> >
> >
> > thanks,
> > -siddhartha
>

From deequan at gmail.com Fri Aug 14 19:02:06 2009
From: deequan at gmail.com (David Quan)
Date: Fri, 14 Aug 2009 15:02:06 -0400
Subject: [Bioperl-l] bioperl capability
Message-ID: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com>

Hello,

I've been browsing around bioperl documentation and have used a blast parser, but am wondering if it is possible to use the start and end information for a hit to trace back to a gene in genbank and extract the sequence for that gene? I have not been able to find elements that would work in such a way. Recommendations for elements that would be capable of behaving in such a way would be greatly appreciated.

Thanks very much.

David N. Quan

--
Love of country is, at heart, trust in a nation's people, faith in their better nature, esteem for their best hopes, understanding for the magnificence and the distinctiveness and the huge, infinitely shaded cultural palette of their simple humanity. --Bradley Burston

From ymc at yahoo.com Sat Aug 15 02:57:15 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Fri, 14 Aug 2009 19:57:15 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
Message-ID: <85143.35343.qm@web30404.mail.mud.yahoo.com>

Hi Chris

I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..)

Please let me know if it works for you.

Sorry for the bug...

Yee Man

--- On Fri, 8/14/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List"
> Date: Friday, August 14, 2009, 8:31 AM
> Yee Man,
>
> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 64-bit) and on dev.open-bio.org (which is perl 5.8.8, appears to be 32-bit).
> The patch results in cleaning up warnings for 5.10.0 but results in similar warnings for 5.8.8 (linux or OS X).
>
> On OS X perl 5.8.8, this sometimes passes (note the first attempt fails, the second succeeds), so it's not entirely a 32-bit issue:
>
> http://gist.github.com/167860
>
> OS X and perl 5.10.0, this always fails as the previous gist shows, but demonstrates similar behavior (multiple attempts to test get different responses):
>
> http://gist.github.com/167542
>
> On linux, everything passes with or w/o the patched files (patched files have warnings as indicated above):
>
> Specs for all three perl executables (they vary a bit):
>
> http://gist.github.com/167883
>
> chris
>
> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote:
>
> > Ah.. I find that the typemap can become as simple as this
> > =====================
> > TYPEMAP
> > HMM *    T_PTROBJ
> > =====================
> >
> > Then the generated HMM.c will have a function called INT2PTR to do the pointer conversion. I believe this should solve the warnings.
> >
> > Attached are the updated HMM.xs and typemap. Can someone with a 64-bit machine give it a try?
> >
> > Thank you
> > Yee Man
> > --- On Thu, 8/13/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Yee Man Chan"
> >> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List"
> >> Date: Thursday, August 13, 2009, 5:31 PM
> >> (just to point out to everyone, Yee Man's contact information was in the POD)
> >>
> >> Yee Man,
> >>
> >> I have the output in the below link:
> >>
> >> http://gist.github.com/167542
> >>
> >> There are similar problems popping up on 32- and 64-bit perl 5.10.0, Mac OS X 10.5. Haven't had time to debug it unfortunately.
> >>
> >> I think we should seriously consider spinning this code off into its own distribution for CPAN. It's unfortunately bit-rotting away in bioperl-ext.
> >> If you want to continue supporting it I can help set that up.
> >>
> >> chris
> >>
> >> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote:
> >>
> >>> Hi
> >>>
> >>>     So is this an HMM only problem? Or does it apply to other bioperl-ext modules?
> >>>
> >>>     What exactly are the compilation errors for HMM? I believe my implementation is just a simple one based on Rabiner's paper.
> >>>
> >>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg
> >>>
> >>>     I don't think I did anything fancy that makes it machine dependent or non-ANSI C.
> >>>
> >>> Yee Man
> >>>
> >>> --- On Thu, 8/13/09, Chris Fields wrote:
> >>>
> >>>> From: Chris Fields
> >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>> To: "Robert Buels"
> >>>> Cc: "Jonny Dalzell" , "BioPerl List" , "Yee Man Chan"
> >>>> Date: Thursday, August 13, 2009, 3:18 PM
> >>>>
> >>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote:
> >>>>
> >>>>> Jonny Dalzell wrote:
> >>>>>> Is it ridiculous of me to expect ubuntu to take care of this for me? How do
> >>>>>> I go about compiling the HMM?
> >>>>> Yes. This is a very specialized thing that you're doing, and Ubuntu does not have the resources to package every single thing.
> >>>>>
> >>>>> Unfortunately, it looks like bioperl-ext package is not installable under Ubuntu 9.04 anyway, which is what I'm running. For others on this list, if somebody is interested in doing maintaining it, I'd be happy to help out by testing on Debian-based Linux platforms.
> >> We need to > >>>> clarify this package's maintenance status: > if > >> there is > >>>> nobody interested in maintaining it, I > would > >> recommend that > >>>> bioperl-ext be removed from distribution. > >> It's not in > >>>> anybody's interest to have unmaintained > software > >> out there > >>>> causing confusion. > >>>> > >>>> I have cc'd Yee Man Chan for this.? > If there > >> isn't a > >>>> response or the message bounces, we do one > of two > >> things: > >>>> > >>>> 1) consider it deprecated (probably > safest). > >>>> 2) spin it out into a separate module. > >>>> > >>>> Just tried to comile it myself and am > getting > >> errors (using > >>>> 64bit perl 5.10), so I think, unless > someone wants > >> to take > >>>> this on, option #1 is best. > >>>> > >>>>> So Jonny, in short, I would say "do > not use > >>>> bioperl-ext". > >>>> > >>>> In general, that's a safe bet.? We're > moving > >> most of > >>>> our C/C++ bindings to BioLib. > >>>> > >>>>> Step back.? What are you trying > to > >>>> accomplish?? Chris already > recommended some > >> alternative > >>>> methods in his email of 8/11 on this > >> subject.? Perhaps > >>>> we can guide you to some software that is > >> actively > >>>> maintained and will meet your needs. > >>>>> > >>>>> Rob > >>>> > >>>> Exactly.? Lots of other (better > supported!) > >> options > >>>> out there.? HMMER, SeqAn, and > others. > >>>> > >>>> chris > >>>> > >>> > >>> > >>> > >> > >> > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam?? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: HMM.xs
Type: application/octet-stream
Size: 5614 bytes
Desc: not available
URL:

From ymc at yahoo.com Sun Aug 16 01:23:28 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sat, 15 Aug 2009 18:23:28 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <8B7B3664-A0E2-4E66-82D6-982096F4C75E@illinois.edu>
Message-ID: <241652.96493.qm@web30404.mail.mud.yahoo.com>

I just committed HMM.xs and typemap to SVN. Can you test it to confirm it works in 64-bit machines?

Thanks
Yee Man

--- On Sat, 8/15/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Robert Buels"
> Cc: "Yee Man Chan", "BioPerl List"
> Date: Saturday, August 15, 2009, 12:11 PM
> I'm not sure, but it makes more sense to commit these changes directly. Yee, need us to set you up with a commit bit? If so, fill out the information on this page:
>
> http://www.bioperl.org/wiki/SVN_Account_Request
>
> and forward it to support at open-bio.org. I'll sponsor you.
>
> chris
>
> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
>
> > The usual procedure for developing code is to exchange code via commits to a version control system. Yee, do you know how to use Subversion? Does Yee need a commit bit?
> >
> > Rob
> >
> > Yee Man Chan wrote:
> >> Hi Chris
> >>    I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..)
> >>    Please let me know if it works for you.
> >> Sorry for the bug...
> >> Yee Man
> >
> > --Robert Buels
> > Bioinformatics Analyst, Sol Genomics Network
> > Boyce Thompson Institute for Plant Research
> > Tower Rd
> > Ithaca, NY 14853
> > Tel: 503-889-8539
> > rmb32 at cornell.edu
> > http://www.sgn.cornell.edu

From ymc at yahoo.com Sun Aug 16 04:32:19 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sat, 15 Aug 2009 21:32:19 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To:
Message-ID: <846546.73578.qm@web30404.mail.mud.yahoo.com>

When are you going to release 1.6? Maybe let me work on it before it releases. If it doesn't resolve the problem, then we can think about other alternatives.

Also, please show me the latest errors you have for 5.10.0.

Thanks
Yee Man

--- On Sat, 8/15/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels", "BioPerl List"
> Date: Saturday, August 15, 2009, 7:05 PM
> I'm still seeing the same errors on Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as well as perl 5.8.8 on dev.open-bio.org).
>
> I'm wondering if this is a problem with my local perl build. I'm very tempted to push the HMM-related code into a separate distribution (bioperl-hmm) and make a CPAN release out of it so it gets wider testing via CPAN testers; it would just require a minimum bioperl 1.6 installation for Bio::Tools::HMM and any related modules. Yee, would that be okay with you?
>
> chris
>
> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:
>
> > I just committed HMM.xs and typemap to SVN. Can you test it to confirm it works in 64-bit machines?
> >
> > Thanks
> > Yee Man
> >
> > --- On Sat, 8/15/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Robert Buels"
> >> Cc: "Yee Man Chan", "BioPerl List"
> >> Date: Saturday, August 15, 2009, 12:11 PM
> >> I'm not sure, but it makes more sense to commit these changes directly. Yee, need us to set you up with a commit bit? If so, fill out the information on this page:
> >>
> >> http://www.bioperl.org/wiki/SVN_Account_Request
> >>
> >> and forward it to support at open-bio.org. I'll sponsor you.
> >>
> >> chris
> >>
> >> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
> >>
> >>> The usual procedure for developing code is to exchange code via commits to a version control system. Yee, do you know how to use Subversion? Does Yee need a commit bit?
> >>>
> >>> Rob
> >>>
> >>> --Robert Buels
> >>> Bioinformatics Analyst, Sol Genomics Network
> >>> Boyce Thompson Institute for Plant Research
> >>> Tower Rd
> >>> Ithaca, NY
> >>> 14853
> >>> Tel: 503-889-8539
> >>> rmb32 at cornell.edu
> >>> http://www.sgn.cornell.edu

From ymc at yahoo.com Sun Aug 16 09:36:59 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 16 Aug 2009 02:36:59 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu>
Message-ID: <217259.7083.qm@web30408.mail.mud.yahoo.com>

Hi Chris

Thanks for your suggestions. I think it is indeed better to check that the sum is 1.0 using sprintf. I fixed this in the newly committed HMM.pm.

I also fixed code that would lead to warnings with use warnings.

So now the only problem left is that "monotonic increasing" error. For that part of the code, I was trying to perform an expectation maximization step. Theoretically, the expectation should monotonically increase in every step. But I suppose this is not necessarily true when double precision floating point numbers are involved. I don't know why I used a 1e-100 tolerance for this. Therefore I "fixed" it by using the same tolerance that terminates the maximization step (i.e. .000001). I suppose this "fix" will make it much less likely to throw an exception with your 5.10.0 perl.

Can you give that a try again and see if it works now?

Thank you
Yee Man

--- On Sat, 8/15/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels", "BioPerl List"
> Date: Saturday, August 15, 2009, 10:38 PM
> Yee,
>
> I took the liberty of making a few simple changes to Bio::Tools::HMM in svn to point out the problem and possible solutions. Feel free to revert these as needed.
>
> I'm seeing two errors, which appear randomly when running 'make test'. The first is easily fixable, the second, I'm not so sure. I'll let you make the decisions on both.
>
> 1) There is an assumption in the module that, when adding floating points, you will always get 1.0. You may run into problems: see 'perldoc -q long decimals'. Lines like this (two places in the module):
>
>   ...
>   if ($sum != 1.0) {
>      $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
>   }
>   ...
>
> won't work as expected (note I added a simple diagnostic, just print out the 'bad' sum). With perl 5.8.8, this appears to work fine, but this is what I get with perl 5.10 (64-bit):
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Baum-Welch Training
> ===================
> Initial Probability Array:
> 0.499978    0.500022
> Transition Probability Matrix:
> 0.499978    0.500022
> 0.499978    0.500022
> Emission Probability Matrix:
> 0.133333    0.143333    0.163333    0.123333    0.143333    0.293333
> 0.133333    0.143333    0.163333    0.123333    0.143333    0.293333
>
> Log Probability of sequence 1: -521.808
> Log Probability of sequence 2: -426.057
>
> Statistical Training
> ====================
> Initial Probability Array:
> 1    0
> Transition Probability Matrix:
>
> ------------- EXCEPTION -------------
> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
>
> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
> STACK toplevel test.pl:82
> -------------------------------------
>
> make: *** [test_dynamic] Error 255
>
> I'm assuming this needs to simply be rounded up to 1.0. That could be accomplished with something like 'if (sprintf("%.2f", $sum) != 1.0) {...}'
>
> 2) The second error is a little stranger. I have been randomly getting this:
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Baum-Welch Training
> ===================
> S should be monotonic increasing!
> make: *** [test_dynamic] Error 255
>
> When I add strict and warnings pragmas to Bio::Tools::HMM (with a little additional cleanup to get things running), I get an additional warning (arrow):
>
> pyrimidine1:HMM cjfields$ make test
> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
> Baum-Welch Training
> ===================
> S should be monotonic increasing!
> make: *** [test_dynamic] Error 255
>
> So something is not being converted as expected.
>
> chris
>
> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote:
>
> > When are you going to release 1.6? Maybe let me work on it before it releases. If it doesn't resolve the problem, then we can think about other alternatives.
> >
> > Also, please show me the latest errors you have for 5.10.0.
> >
> > Thanks
> > Yee Man
> >
> > --- On Sat, 8/15/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Yee Man Chan"
> >> Cc: "Robert Buels", "BioPerl List"
> >> Date: Saturday, August 15, 2009, 7:05 PM
> >> I'm still seeing the same errors on Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as well as perl 5.8.8 on dev.open-bio.org).
> >>
> >> I'm wondering if this is a problem with my local perl build. I'm very tempted to push the HMM-related code into a separate distribution (bioperl-hmm) and make a CPAN release out of it so it gets wider testing via CPAN testers; it would just require a minimum bioperl 1.6 installation for Bio::Tools::HMM and any related modules. Yee, would that be okay with you?
> >>
> >> chris
> >>
> >> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:
> >>
> >>> I just committed HMM.xs and typemap to SVN. Can you test it to confirm it works in 64-bit machines?
> >>>
> >>> Thanks
> >>> Yee Man
> >>>
> >>> --- On Sat, 8/15/09, Chris Fields wrote:
> >>>
> >>>> From: Chris Fields
> >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>> To: "Robert Buels"
> >>>> Cc: "Yee Man Chan", "BioPerl List"
> >>>> Date: Saturday, August 15, 2009, 12:11 PM
> >>>> I'm not sure, but it makes more sense to commit these changes directly. Yee, need us to set you up with a commit bit? If so, fill out the information on this page:
> >>>>
> >>>> http://www.bioperl.org/wiki/SVN_Account_Request
> >>>>
> >>>> and forward it to support at open-bio.org. I'll sponsor you.
> >>>>
> >>>> chris
> >>>>
> >>>> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
> >>>>
> >>>>> The usual procedure for developing code is to exchange code via commits to a version control system. Yee, do you know how to use Subversion? Does Yee need a commit bit?
> >>>>>
> >>>>> Rob
> >>>>>
> >>>>> Yee Man Chan wrote:
> >>>>>> Hi Chris
> >>>>>>    I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..)
> >>>>>>    Please let me know if it works for you.
> >>>>>> Sorry for the bug...
> >>>>>> Yee Man
> >>>>>> --- On Fri, 8/14/09, Chris Fields wrote:
> >>>>>>> From: Chris Fields
> >>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>>> To: "Yee Man Chan"
> >>>>>>> Cc: "Robert Buels", "Jonny Dalzell", "BioPerl List"
> >>>>>>> Date: Friday, August 14, 2009, 8:31 AM
> >>>>>>> Yee Man,
> >>>>>>>
> >>>>>>> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 64-bit) and on dev.open-bio.org (which is perl 5.8.8, appears to be 32-bit). The patch results in cleaning up warnings for 5.10.0 but results in similar warnings for 5.8.8 (linux or OS X).
> >>>>>>>
> >>>>>>> On OS X perl 5.8.8, this sometimes passes (note the first attempt fails, the second succeeds), so it's not entirely a 32-bit issue:
> >>>>>>>
> >>>>>>> http://gist.github.com/167860
> >>>>>>>
> >>>>>>> OS X and perl 5.10.0, this always fails as the previous gist shows, but demonstrates similar behavior (multiple attempts to test get different responses):
> >>>>>>>
> >>>>>>> http://gist.github.com/167542
> >>>>>>>
> >>>>>>> On linux, everything passes with or w/o the patched files (patched files have warnings as indicated above):
> >>>>>>>
> >>>>>>> Specs for all three perl executables (they vary a bit):
> >>>>>>>
> >>>>>>> http://gist.github.com/167883
> >>>>>>>
> >>>>>>> chris
> >>>>>>>
> >>>>>>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote:
> >>>>>>>
> >>>>>>>> Ah.. I find that the typemap can become as simple as this
> >>>>>>>> =====================
> >>>>>>>> TYPEMAP
> >>>>>>>> HMM *    T_PTROBJ
> >>>>>>>> =====================
> >>>>>>>>
> >>>>>>>> Then the generated HMM.c will have a function called INT2PTR to do the pointer conversion. I believe this should solve the warnings.
> >>>>>>>> Attached are the updated HMM.xs and typemap.
> >>>>>>>> Can someone with a 64-bit machine give it a try?
> >>>>>>>> Thank you
> >>>>>>>> Yee Man
> >>>>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
> >>>>>>>>> From: Chris Fields
> >>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>>>>> To: "Yee Man Chan"
> >>>>>>>>> Cc: "Robert Buels", "Jonny Dalzell", "BioPerl List"
> >>>>>>>>> Date: Thursday, August 13, 2009, 5:31 PM
> >>>>>>>>> (just to point out to everyone, Yee Man's contact information was in the POD)
> >>>>>>>>>
> >>>>>>>>> Yee Man,
> >>>>>>>>>
> >>>>>>>>> I have the output in the below link:
> >>>>>>>>>
> >>>>>>>>> http://gist.github.com/167542
> >>>>>>>>>
> >>>>>>>>> There are similar problems popping up on 32- and 64-bit perl 5.10.0, Mac OS X 10.5. Haven't had time to debug it unfortunately.
> >>>>>>>>>
> >>>>>>>>> I think we should seriously consider spinning this code off into its own distribution for CPAN. It's unfortunately bit-rotting away in bioperl-ext. If you want to continue supporting it I can help set that up.
> >>>>>>>>> chris
> >>>>>>>>>
> >>>>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi
> >>>>>>>>>>
> >>>>>>>>>>     So is this an HMM only problem? Or does it apply to other bioperl-ext modules?
> >>>>>>>>>>     What exactly are the compilation errors for HMM? I believe my implementation is just a simple one based on Rabiner's paper.
> >>>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg
> >>>>>>>>>>
> >>>>>>>>>>     I don't think I did anything fancy that makes it machine dependent or non-ANSI C.
> >>>>>>>>>> Yee Man
> >>>>>>>>>>
> >>>>>>>>>> --- On Thu, 8/13/09, Chris Fields wrote:
> >>>>>>>>>>> From: Chris Fields
> >>>>>>>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>>>>>>> To: "Robert Buels"
> >>>>>>>>>>> Cc: "Jonny Dalzell", "BioPerl List", "Yee Man Chan"
> >>>>>>>>>>> Date: Thursday, August 13, 2009, 3:18 PM
> >>>>>>>>>>>
> >>>>>>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote:
> >>>>>>>>>>>> Jonny Dalzell wrote:
> >>>>>>>>>>>>> Is it ridiculous of me to expect ubuntu to take care of this for me? How do I go about compiling the HMM?
> >>>>>>>>>>>> Yes. This is a very specialized thing that you're doing, and Ubuntu does not have the resources to package every single thing.
> >>>>>>>>>>>> Unfortunately, it looks like bioperl-ext package is not installable under Ubuntu 9.04 anyway, which is what I'm running. For others on this list, if somebody is interested in maintaining it, I'd be happy to help out by testing on Debian-based Linux platforms. We need to clarify this package's maintenance status: if there is nobody interested in maintaining it, I would recommend that bioperl-ext be removed from distribution. It's not in anybody's interest to have unmaintained software out there causing confusion.
> >>>>>>>>>>>
> >>>>>>>>>>> I have cc'd Yee Man Chan for this. If there isn't a response or the message bounces, we do one of two things:
> >>>>>>>>>>> 1) consider it deprecated (probably safest).
> >>>>>>>>>>> 2) spin it out into a separate module.
> >>>>>>>>>>>
> >>>>>>>>>>> Just tried to compile it myself and am getting errors (using 64bit perl 5.10), so I think, unless someone wants to take this on, option #1 is best.
> >>>>>>>>>>>
> >>>>>>>>>>>> So Jonny, in short, I would say "do not use bioperl-ext".
> >>>>>>>>>>>
> >>>>>>>>>>> In general, that's a safe bet. We're moving most of our C/C++ bindings to BioLib.
> >>>>>>>>>>>
> >>>>>>>>>>>> Step back. What are you trying to accomplish? Chris already recommended some alternative methods in his email of 8/11 on this subject. Perhaps we can guide you to some software that is actively maintained and will meet your needs.
> >>>>>>>>>>>> Rob
> >>>>>>>>>>> Exactly. Lots of other (better supported!)
> >>>>>>>>>>> options out there.  HMMER, SeqAn, and others.
> >>>>>>>>>>>
> >>>>>>>>>>> chris
> >>>>>
> >>>>> --Robert Buels
> >>>>> Bioinformatics Analyst, Sol Genomics Network
> >>>>> Boyce Thompson Institute for Plant Research
> >>>>> Tower Rd
> >>>>> Ithaca, NY  14853
> >>>>> Tel: 503-889-8539
> >>>>> rmb32 at cornell.edu
> >>>>> http://www.sgn.cornell.edu

From ymc at yahoo.com Mon Aug 17 03:34:24 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 16 Aug 2009 20:34:24 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <05D89C95-261C-47B5-A4C6-794D36DD5FB8@illinois.edu>
Message-ID: <474354.59886.qm@web30408.mail.mud.yahoo.com>

Hi Chris

Good to hear that it is working and thanks for testing.

As to the release, I do understand your desire to maintain a high level of quality in the BioPerl code base. So if the HMM code doesn't meet that standard, I am OK with it being spun off.

So please pass around the updated code and test it extensively. If no one complains about the new code by the time of release, I would think it should go into the next bioperl-ext release. If people uncover new errors with the new code and the errors can't be fixed in time, then it should be spun off.

What do you think?

Best Regards,
Yee Man

--- On Sun, 8/16/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels", "BioPerl List"
> Date: Sunday, August 16, 2009, 5:53 AM
> That worked!  Thanks Yee Man!
>
> chris
>
> ps - let me know how you want to deal with a release.
>
> On Aug 16, 2009, at 4:36 AM, Yee Man Chan wrote:
>
> > Hi Chris
> >
> >    Thanks for your suggestions. I think it is indeed better to check that
> > the sum is 1.0 using sprintf. I fixed this in the newly committed HMM.pm.
> >
> >    I also fixed code that would lead to warnings under "use warnings".
> >
> >    So now the only problem left is that "monotonic increasing" error.
> > For that part of the code, I was trying to perform an expectation
> > maximization step. Theoretically, the expectation should monotonically
> > increase in every step. But I suppose this is not necessarily true when
> > double precision floating point numbers are involved. I don't know why I
> > used a 1e-100 tolerance for this. Therefore I "fixed" it by using the same
> > tolerance that is used to terminate the maximization step (i.e. .000001).
> > I suppose this "fix" will make it much less likely to throw an exception
> > with your 5.10.0 perl.
> >
> >    Can you give that a try again and see if it works now?
> >
> > Thank you
> > Yee Man
> >
> > --- On Sat, 8/15/09, Chris Fields wrote:
> >
> >> From: Chris Fields
> >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >> To: "Yee Man Chan"
> >> Cc: "Robert Buels", "BioPerl List"
> >> Date: Saturday, August 15, 2009, 10:38 PM
> >> Yee,
> >>
> >> I took the liberty of making a few simple changes to Bio::Tools::HMM in
> >> svn to point out the problem and possible solutions.  Feel free to revert
> >> these as needed.
> >>
> >> I'm seeing two errors, which appear randomly when running 'make test'.
> >> The first is easily fixable, the second, I'm not so sure.  I'll let you
> >> make the decisions on both.
> >>
> >> 1)  There is an assumption in the module that, when adding floating
> >> points, you will always get 1.0.  You may run into problems: see
> >> 'perldoc -q long decimals'.  Lines like this (two places in the module):
> >>
> >>    ...
> >>    if ($sum != 1.0) {
> >>       $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
> >>    }
> >>    ...
> >>
> >> won't work as expected (note I added a simple diagnostic, just print out
> >> the 'bad' sum).  With perl 5.8.8, this appears to work fine, but this is
> >> what I get with perl 5.10 (64-bit):
> >>
> >> pyrimidine1:HMM cjfields$ make test
> >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> >> Baum-Welch Training
> >> ===================
> >> Initial Probability Array:
> >> 0.499978    0.500022
> >> Transition Probability Matrix:
> >> 0.499978    0.500022
> >> 0.499978    0.500022
> >> Emission Probability Matrix:
> >> 0.133333    0.143333
> >> 0.163333    0.123333
> >> 0.143333    0.293333
> >> 0.133333    0.143333
> >> 0.163333    0.123333
> >> 0.143333    0.293333
> >>
> >> Log Probability of sequence 1: -521.808
> >> Log Probability of sequence 2: -426.057
> >>
> >> Statistical Training
> >> ====================
> >> Initial Probability Array:
> >> 1    0
> >> Transition Probability Matrix:
> >>
> >> ------------- EXCEPTION -------------
> >> MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
> >>
> >> STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
> >> STACK toplevel test.pl:82
> >> -------------------------------------
> >>
> >> make: *** [test_dynamic] Error 255
> >>
> >> I'm assuming this needs to simply be rounded up to 1.0.  That could be
> >> accomplished with something like
> >> 'if (sprintf("%.2f", $sum) != 1.0) {...}'
> >>
> >> 2) The second error is a little stranger.  I have been randomly getting
> >> this:
> >>
> >> pyrimidine1:HMM cjfields$ make test
> >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> >> Baum-Welch Training
> >> ===================
> >> S should be monotonic increasing!
> >> make: *** [test_dynamic] Error 255
> >>
> >> When I add strict and warnings pragmas to Bio::Tools::HMM (with a little
> >> additional cleanup to get things running), I get an additional warning
> >> (arrow):
> >>
> >> pyrimidine1:HMM cjfields$ make test
> >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
> >> Argument "FL" isn't numeric in numeric lt (<) at
> >> /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188. <----
> >> Baum-Welch Training
> >> ===================
> >> S should be monotonic increasing!
> >> make: *** [test_dynamic] Error 255
> >>
> >> So something is not being converted as expected.
> >>
> >> chris
> >>
> >> On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote:
> >>
> >>> When are you going to release 1.6? Maybe let me work on it before it
> >>> releases. If it doesn't resolve the problem, then we can think about
> >>> other alternatives.
> >>>
> >>> Also, please show me the latest errors you have for 5.10.0.
> >>>
> >>> Thanks
> >>> Yee Man
> >>>
> >>> --- On Sat, 8/15/09, Chris Fields wrote:
> >>>
> >>>> From: Chris Fields
> >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>> To: "Yee Man Chan"
> >>>> Cc: "Robert Buels", "BioPerl List"
> >>>> Date: Saturday, August 15, 2009, 7:05 PM
> >>>> I'm still seeing the same errors on Mac OS X for 64-bit perl 5.10.0.
> >>>> Mac OS X, native perl (v5.8.8) passes fine now (as well as perl 5.8.8
> >>>> on dev.open-bio.org).
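
The two numeric checks discussed in the quoted messages above (probabilities summing to 1.0, and a score that should not decrease between training steps) can both be written as comparisons within a tolerance. The following is only a minimal sketch and not the code in Bio::Tools::HMM: the helper names sums_to_one and is_non_decreasing are hypothetical, and the default tolerance of 1e-6 is an assumption borrowed from the ".000001" termination tolerance mentioned in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::HMM): true if the values in
# the array reference sum to 1.0 within an absolute tolerance $eps.
sub sums_to_one {
    my ($probs, $eps) = @_;
    $eps = 1e-6 unless defined $eps;    # assumed default tolerance
    my $sum = 0;
    $sum += $_ for @$probs;
    return abs($sum - 1.0) <= $eps ? 1 : 0;
}

# Hypothetical helper: true if no value in the series drops below its
# predecessor by more than $eps (i.e. "monotonic increasing" up to
# floating-point noise).
sub is_non_decreasing {
    my ($values, $eps) = @_;
    $eps = 1e-6 unless defined $eps;    # assumed default tolerance
    for my $i (1 .. $#$values) {
        return 0 if $values->[$i] < $values->[$i - 1] - $eps;
    }
    return 1;
}

# Repeated addition of 0.1 is a standard case where an exact '==' test
# against 1.0 is unreliable, while the tolerant test is not affected.
my @row = (0.1) x 10;
my $sum = 0;
$sum += $_ for @row;
print "exact compare:    ", ($sum == 1.0 ? "equal" : "not equal"), "\n";
print "tolerant compare: ", (sums_to_one(\@row) ? "ok" : "fail"), "\n";
```

An explicit tolerance is used here instead of the sprintf("%.2f", ...) idea because rounding to two decimals would also accept a sum such as 0.996; which tolerance is appropriate for the module is a decision for its maintainer.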
> >>>>
> >>>> I'm wondering if this is a problem with my local perl build.  I'm very
> >>>> tempted to push the HMM-related code into a separate distribution
> >>>> (bioperl-hmm) and make a CPAN release out of it so it gets wider testing
> >>>> via CPAN testers; it would just require a minimum bioperl 1.6
> >>>> installation for Bio::Tools::HMM and any related modules.  Yee, would
> >>>> that be okay with you?
> >>>>
> >>>> chris
> >>>>
> >>>> On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote:
> >>>>
> >>>>>
> >>>>> I just committed HMM.xs and typemap to SVN. Can you test it to confirm
> >>>>> it works in 64-bit machines?
> >>>>>
> >>>>> Thanks
> >>>>> Yee Man
> >>>>>
> >>>>> --- On Sat, 8/15/09, Chris Fields wrote:
> >>>>>
> >>>>>> From: Chris Fields
> >>>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> >>>>>> To: "Robert Buels"
> >>>>>> Cc: "Yee Man Chan", "BioPerl List"
> >>>>>> Date: Saturday, August 15, 2009, 12:11 PM
> >>>>>> I'm not sure, but it makes more sense to commit these changes
> >>>>>> directly.  Yee, need us to set you up with a commit bit?  If so, fill
> >>>>>> out the information on this page:
> >>>>>>
> >>>>>> http://www.bioperl.org/wiki/SVN_Account_Request
> >>>>>>
> >>>>>> and forward it to support at open-bio.org.  I'll sponsor you.
> >>>>>>
> >>>>>> chris
> >>>>>>
> >>>>>> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote:
> >>>>>>
> >>>>>>> The usual procedure for developing code is to exchange code via
> >>>>>>> commits to a version control system.  Yee, do you know how to use
> >>>>>>> Subversion? Does Yee need a commit bit?
> >>>>>>>
> >>>>>>> Rob
> >>>>>>>
> >>>>>>> Yee Man Chan wrote:
> >>>>>>>> Hi Chris
> >>>>>>>>       I find that there is a memory access bug in my code. Attached
> >>>>>>>> is the fixed HMM.xs.
> >>>>>>>> This file together with the simpler typemap should fix all
> >>>>>>>> problems. (I hope..)
> >>>>>>>>       Please let me know if it works for you.
> >>>>>>>> Sorry for the bug...
> >>>>>>>> Yee Man

From ymc at yahoo.com Mon Aug 17 22:19:27 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Mon, 17 Aug 2009 15:19:27 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu>
Message-ID: <419432.62970.qm@web30403.mail.mud.yahoo.com>

I believe these warnings should have been fixed with the latest Bio/Tools/HMM.pm. Are you sure you are using the latest Bio/Tools/HMM.pm? I noticed that there are two pairs of "use strict" and "use warnings" in this version. :P

Yee Man

--- On Mon, 8/17/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Robert Buels"
> Cc: "BioPerl List", "Yee Man Chan"
> Date: Monday, August 17, 2009, 2:22 PM
> Still seeing that odd warning popping up:
>
> cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose
> t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at
> /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185.
>
> Have you tried using Yee Man's original Makefile.PL to see if it works
> better?  There appear to be some differences in the compilation, including
> a linking warning popping up.
>
> chris
>
> On Aug 17, 2009, at 3:32 PM, Robert Buels wrote:
>
> > OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro at
> > Bio-Tools-HMM in the repo.  The tests are not passing; I think that some
> > bugs need to be fixed in the logic of things.
> >
> > Yee Man, could you have a look?  To download the newly repackaged code:
> >
> > svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM
> >
> > perl Build.PL; ./Build test
> >
> > Please check that things are compiling OK, check the test logic, upgrade
> > the tests to use Test::More, and get the tests to the point where they
> > are passing.
> >
> > At that point, it should be ready for CPAN, but we need to decide how we
> > want to coordinate that with releases of bioperl-live and bioperl-ext.
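
The request above to upgrade the tests to use Test::More could start from something like the sketch below. It is an illustration only, not the actual t/001_basics.t: the array @init_prob stands in for values that a real test would obtain from Bio::Tools::HMM (the numbers are taken from the test output quoted earlier in this thread), and approx_equal is a hypothetical helper. It also assumes a Test::More recent enough to provide done_testing().

```perl
use strict;
use warnings;
use Test::More;

# Stand-in data: in a real test these values would come from the module
# under test; here they are copied from the output quoted in this thread.
my @init_prob = (0.499978, 0.500022);

# Hypothetical helper: absolute-tolerance comparison for floating-point
# values, so the test does not depend on exact equality with 1.0.
sub approx_equal {
    my ($got, $expected, $eps) = @_;
    $eps = 1e-6 unless defined $eps;    # assumed default tolerance
    return abs($got - $expected) <= $eps ? 1 : 0;
}

my $sum = 0;
$sum += $_ for @init_prob;

is(scalar @init_prob, 2, 'one initial probability per state');
ok(approx_equal($sum, 1.0), 'initial probabilities sum to 1.0 within tolerance');
cmp_ok($init_prob[0], '>=', 0, 'probabilities are non-negative');

done_testing();
```

With assertions like these, './Build test --verbose' reports per-test ok/not ok lines instead of only printing the training output.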
> >
> > Rob

From ymc at yahoo.com Mon Aug 17 22:28:50 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Mon, 17 Aug 2009 15:28:50 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <45F9C6D1-7DD7-4227-B7B9-3FBAF7513B35@illinois.edu>
Message-ID: <360578.66990.qm@web30403.mail.mud.yahoo.com>

I noticed that Bio/Tools/HMM.pm was removed from the trunk. So I added it back in. I think you shouldn't get the warnings with this version.

Yee Man

--- On Mon, 8/17/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Chris Fields"
> Cc: "Robert Buels", "BioPerl List", "Yee Man Chan"
> Date: Monday, August 17, 2009, 2:28 PM
> Take that back.  Yes, the 'FL' warning is still there, but no tests are run
> b/c (simply put) there are no regression tests (no use of Test or
> Test::More).  If you run './Build test --verbose' you can see the run, but
> no test output.  That should be easy to fix, though.
>
> chris

From ymc at yahoo.com Tue Aug 18 00:24:24 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Mon, 17 Aug 2009 17:24:24 -0700 (PDT)
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A89E0F0.8010307@cornell.edu>
Message-ID: <62126.74727.qm@web30401.mail.mud.yahoo.com>

I get it now. So it is now spun off. Anyway, I updated the HMM.pm in Bio-Tools-HMM with the latest version. I think it should work.

Yee Man

--- On Mon, 8/17/09, Robert Buels wrote:

> From: Robert Buels
> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Chris Fields", "BioPerl List"
> Date: Monday, August 17, 2009, 4:00 PM
> Yee Man Chan wrote:
> > I noticed that Bio/Tools/HMM.pm was removed from the trunk. So I added it
> > back in. I think you shouldn't get the warnings with this version.
>
> Please read my email above with instructions for checking out the new
> Bio-Tools-HMM component, where Bio::Tools::HMM has been moved.
> Please do not add the Bio::Tools::HMM module back into bioperl-live.
>
> I think you might be confused about the functions of 'svn add', 'svn
> commit', etc., because I don't see any actual addition of the module in the
> commit logs.  Please read through the SVN manual at
> http://svnbook.red-bean.com/ if you need clarification.
>
> Rob

From whs at eaglegenomics.com Tue Aug 18 09:14:48 2009
From: whs at eaglegenomics.com (Will Spooner)
Date: Tue, 18 Aug 2009 10:14:48 +0100
Subject: [Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers
In-Reply-To:
References:
Message-ID:

Hi Robert,

Speaking for Ensembl, the GeneTree display code is deeply embedded in the API and web code, and refactoring it as a standalone package would be exceedingly difficult.

Jalview (http://www.jalview.org) may be a good alternative, albeit a Java one. There is code available for driving Jalview from the Ensembl database, and something similar for BioPerl seems reasonable.

Will

On 17 Aug 2009, at 18:14, Robert Bradbury wrote:

> One of the questions facing people working in bioinformatics is "How do we
> present information so that it can be effectively interpreted by
> non-informatics specialists?"
>
> Now, my expertise lies in computer science (esp. O.S. & databases) and as a
> second vocation the biology of aging (DNA damage & repair, to a lesser
> extent cancer and pathologies of aging, etc.).  Now by my estimate there are
> perhaps 5 people in the world who are able to effectively discuss computer
> science X aging (gerontology) [3].  There are perhaps several dozen people
> where those areas, esp. aging, may overlap with DNA damage & repair.  But
> then there is a wider audience of perhaps a few hundred members of AGE, and
> maybe a thousand or so who are members of the scientific subgroup of GSA.
> But most of those individuals are "old school" scientists who know
> relatively little about bioinformatics.  So one has barriers to presenting
> bioinformatics information in ways that they can make use of.
>
> I have found in my limited experience that homology graphs of conserved
> protein domains, such as those displayed in HomoloGene or those in Ensembl
> (including phylogeny graphs), can be quite useful in reaching interesting
> conclusions.  For example, double strand break repair processes, which may
> involve 8-10 relatively conserved proteins, may have a critical role in the
> mechanisms of aging.  In particular two of those proteins, WRN & DCLRE1C
> (Artemis), contain complementary exonuclease activities which chew up the
> DNA in order to prepare the strands for ligation.  Of course, programmers
> may appreciate better than gerontologists the significance of deleting
> random bytes from instruction sequences in one's code.  At the recent AGE
> meeting in June several discussions arose as to possible differences in
> "aging" in yeast, *C. elegans* and mammals [1].  A quick database search
> showed that *C. elegans* seems to be lacking the exonuclease domain on the
> WRN homologue and may be missing a DCLRE1C homologue entirely (which if
> true would lead to conclusions that aging in *C. elegans* may be
> fundamentally different from aging in vertebrates).  Explaining this to
> researchers can best be done using pictures.
>
> I've been through PubMed and have several papers (NAR / BMC Bioinformatics)
> regarding programs to do homology comparisons and phylogeny trees.  However
> these seem to lean towards producing less condensed bioinformatics-ish
> information.  I do not know however whether the outputs from databases like
> PubMed HomoloGene or Ensembl have been packaged in tools that might be part
> of BioPerl.  I am interested in programs that can be run on a regular basis
> to draw "pretty pictures" that can be used for publication and/or internet
> browsing.  In particular I'm interested in running such programs on species
> of interest to various gerontological communities [2], which involves
> subsets of databases which seem to be scattered around the world.
>
> Thanks.
>
> 1. Of course there has been lots of discussion and rationalization over the
> last 15+ years about how "aging" is largely the same in more complex and
> simpler organisms -- in part to justify sequencing some organisms and in
> part to justify funding research at certain laboratories.  A closer
> examination based on some of the complete and emerging genome sequences may
> suggest this is a very swampy discussion.
> 2. For example, nematode DNA repair gene comparisons would be interesting to
> nematode researchers, insect DNA repair gene comparisons to insect
> researchers, both to invertebrate researchers, etc.
> 3. The recently published textbooks *Aging of the Genome* by Jan Vijg and
> the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg *et al*
> go a long way towards moving these areas from the stacks of research
> libraries into areas for more general discussion.  Both volumes deal
> extensively with the ~150 DNA repair genes.
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- William Spooner whs at eaglegenomics.com http://www.eaglegenomics.com From cjfields at illinois.edu Tue Aug 18 14:35:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 09:35:49 -0500 Subject: [Bioperl-l] bioperl capability In-Reply-To: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> Message-ID: <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> I think I already answered this: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/20302/focus=20305 chris On Aug 14, 2009, at 2:02 PM, David Quan wrote: > Hello, > > I've been browsing around bioperl documentation and have used > a blast parser, but am wondering if it is possible to use the start > and end information for a hit to trace back to a gene in genbank and > extract the sequence for that gene? I have not been able to find > elements that would work in such a way. Recommendations for elements > that would be capable of behaving in such a way would be greatly > appreciated. Thanks very much. > > David N. Quan > -- > Love of country is, at heart, trust in a nation's people, faith in > their better nature, esteem for their best hopes, understanding for > the magnificence and the distinctiveness and the huge, infinitely > shaded cultural palette of their simple humanity. 
--Bradley Burston > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Tue Aug 18 14:42:09 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 18 Aug 2009 16:42:09 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> <1A4207F8295607498283FE9E93B775B4062D1EF7@EX02.asurite.ad.asu.edu> Message-ID: <628aabb70908180742o4bf93d21tab0b90c328323efa@mail.gmail.com> On Tue, Aug 18, 2009 at 02:36, Kevin Brown wrote: > The obfuscator does help, but even it is a little sparse on data for > modules. Especially information on the realities of the returned data > from a method call. Yep, sorry about that, Kevin. I'm way overdue in devoting a little attention to cleaning up those Deobfuscator bugs and -- just maybe -- putting a prettier face on it. Hoping to find some time in the near future for that. Dave From cjfields at illinois.edu Tue Aug 18 15:04:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 10:04:40 -0500 Subject: [Bioperl-l] code reuse with moose In-Reply-To: <20090818110102.GA27010@seinfeld> References: <20090812022753.GA815@Macintosh-74.local> <20090818110102.GA27010@seinfeld> Message-ID: On Aug 18, 2009, at 6:01 AM, Siddhartha Basu wrote: > Putting it in the bioperl list, makes more sense here, > > On Wed, 12 Aug 2009, Chris Fields wrote: > >> (BTW, this is re: the reimplementation of major chunks of BioPerl >> using >> Moose, Biome: http://github.com/cjfields/biome/tree/) >> >> Locations should use a Role (specifically, Biome::Role::Range), so >> start/end/strand should be attributes, not methods. 
>> With attributes the best way to do this is probably with a builder, and lazily (start requires end, and vice versa). Factor out the common code as Tomas indicates. BTW, the $self->throw() is akin to BioPerl's $self->throw() exception handling; it simply catches any exceptions and passes them to the metaclass exception handling.
>>
>> I've been thinking about making the Range role abstract for this very reason (or defining very basic attributes); something like:
>>
>> ----------------------------
>>
>> package Bio::Role::Range;
>>
>> requires qw(_build_start _build_end _build_strand);
>>
>> # also require other methods which need to be defined in implementation
>>
>> has 'start' => (
>>     isa     => 'Int',
>>     is      => 'rw',
>>     builder => '_build_start',
>>     lazy    => 1
>> );
>>
>> # same for end, strand (except strand has a different isa via MooseX::Types)
>> ....
>>
>> package Bio::Location::Foo;
>>
>> with 'Bio::Role::Range';
>>
>> sub _build_start {
>>     # for location-specific start
>> }
>>
>> sub _build_end {
>>     # for location-specific end
>> }
>>
>> sub _build_strand {
>>     # for location-specific strand
>> }
>>
>> sub _common_build_method {
>>     # factor out common code here, call from other builders
>> }
>>
>> ----------------------------
>
> This plan makes things much clearer. Currently BioMe::Role::Location has a 'requires' keyword and the rest of the location modules consume that role to provide their own implementation. At this point only BioMe::Location::Atomic has an attribute-based 'start' and 'end' implementation. I got a bit confused because in current bioperl 'Bio::Location::Simple' inherits from 'Bio::Location::Atomic', and when I try to follow that path in BioMe it has to override that method.
> So, my question is: do all the location modules really need to inherit from each other? I am totally aware of the original design ideas, but it would be better to have a flattened hierarchy if possible.
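
The flattened, role-based layout discussed above could look roughly like the following. This is only a minimal sketch, assuming Moose is installed; the My:: package names, the span method and the constant builders are placeholders made up for illustration and are not Biome or BioPerl code.

```perl
# Sketch only: the My:: names are placeholders, not Biome or BioPerl modules.
package My::Role::Range;
use Moose::Role;

# Every consuming class must say how its own coordinates are computed.
requires qw(_build_start _build_end);

# Lazy attributes: the builder runs the first time the value is read.
has start => (is => 'rw', isa => 'Int', lazy => 1, builder => '_build_start');
has end   => (is => 'rw', isa => 'Int', lazy => 1, builder => '_build_end');

# Behaviour shared by all locations lives in the role, not in a base class.
sub span {
    my $self = shift;
    return $self->end - $self->start + 1;
}

package My::Location::Simple;
use Moose;
with 'My::Role::Range';          # no inheritance between location classes
sub _build_start { 1 }
sub _build_end   { 10 }

package My::Location::Fuzzy;
use Moose;
with 'My::Role::Range';          # same role, different builders
sub _build_start { 5 }
sub _build_end   { 25 }

package main;

for my $class (qw(My::Location::Simple My::Location::Fuzzy)) {
    my $loc = $class->new;
    printf "%s: %d..%d (span %d)\n",
        $class, $loc->start, $loc->end, $loc->span;
}
```

Neither location class inherits from the other here; both simply consume the same role, so a class never has to override a sibling's builder.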
Flattening with roles is always a good idea, yes. I wouldn't worry as much about the way it was originally implemented as the general API (and ways in which we can simplify it). > One more thing, what about putting the 'start', 'end' and the other > common base attributes in BioMe::Role::Location instead of > BioMe::Role::Range. I am not sure which would be correct from bioperl > stand of view, just throwing out an idea. That's a possibility. To me Locations are just Ranges with different behavior (hence the below comment...) >> Also, I think the Coordinate-related stuff should be simplified >> down to a >> trait or an attribute; they bring in way too much overhead in >> bioperl w/o >> much added value. > > You mean instead of having 'builder' method, having a specialized > traits handling those. That sounds like even better. > > -siddhartha Yes, that's essentially it. Location behavior could be changed by having CoordinatePolicy as a trait. Similarly, fuzziness for start/ end could also be thought of as a trait. In essence, you could probably role most behavior into attribute traits (which, in Moose, are just roles that are composed into the attribute meta class, Moose::Meta::Attribute). I had started up a Biome::Meta::Attribute class in case we were to go down this path, then we could start registering specific traits within that namespace. Just to note, it might be easier to try the simplest approach first and get tests passing, then layer in traits to see how they act performance-wise. My guess is they will speed things up, but you never know. Locations will be a performance bottleneck as they are used in generic Features. chris From cjfields at illinois.edu Tue Aug 18 15:10:08 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 10:10:08 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? 
In-Reply-To: <62126.74727.qm@web30401.mail.mud.yahoo.com> References: <62126.74727.qm@web30401.mail.mud.yahoo.com> Message-ID: Yee Man, Robert, All tests are passing; there was a small change in the expected floating point, but no warning now. Re: passing this on to CPAN, I think it needs a distinct version from BioPerl (something that should probably happen with any spinoffs). I foresee two options (and a possible conflict): 1) Use the same versioning scheme, starting with 1.6.1. 2) Use a simpler scheme a'la Bio::Graphics, which I suggest. Tripartite versions are a PITA, we'll only need to keep that in core. Conflict: Bio::Tools::HMM is currently part of the 1.6 branch (in 1.6.0). If this stays in 1.6.1 then we have two versions of the module floating out there. I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, and I could attempt to push the initial Bio-Ext-HMM release after core 1.6.1 is out. After that, I could then add Yee Man as PAUSE co- maintainer for those modules (which means Yee Man needs to sign up for a PAUSE account). Any objections to that? chris On Aug 17, 2009, at 7:24 PM, Yee Man Chan wrote: > I get it now. So it is now spinned off. Anyway, I updated the HMM.pm > in Bio-Tools-HMM with the latest version. I think it should work. > > Yee Man > > --- On Mon, 8/17/09, Robert Buels wrote: > >> From: Robert Buels >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Chris Fields" , "BioPerl List" > > >> Date: Monday, August 17, 2009, 4:00 PM >> Yee Man Chan wrote: >>> I noticed that Bio/Tools/HMM.pm was removed from the >> trunk. So I added it back in. I think you shouldn't get the >> warnings with this version. >> >> Please read my email above with instructions for checkout >> out the new Bio-Tools-HMM component, where Bio::Tools::HMM >> has been moved. Please do not add the Bio::Tools::HMM >> module back into bioperl-live. 
>> >> I think you might be confused about the functions of 'svn >> add', 'svn commit', etc, because I don't see any actual >> addition of the module in the commit logs. Please read >> through the SVN manual at http://svnbook.red-bean.com/ if you need >> clarification. >> >> Rob >> >> > > > From hlapp at gmx.net Tue Aug 18 15:46:55 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 18 Aug 2009 11:46:55 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A89EADD.9050509@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> Message-ID: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > I can see how this might be a good idea, or it might be overkill. > Anybody have thoughts on having feature _sources_ strongly typed > with ontology terms? It's how BioSQL and Chado would store it anyway. I'm not sure whether GFF3 requires it, possibly not. But when you make everything else ontology-typed, why exempt one property that also stands to benefit from more predictable values? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From rmb32 at cornell.edu Tue Aug 18 15:49:32 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 08:49:32 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <62126.74727.qm@web30401.mail.mud.yahoo.com> Message-ID: <4A8ACD8C.1060908@cornell.edu> Chris Fields wrote: > I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, and I > could attempt to push the initial Bio-Ext-HMM release after core 1.6.1 > is out. 
After that, I could then add Yee Man as PAUSE co-maintainer for > those modules (which means Yee Man needs to sign up for a PAUSE > account). Any objections to that? Sounds like a good plan to me, if Yee Man agreed with it. He would be the primary CPAN maintainer of the package. Maybe he should actually be the first uploader too? Then, it would show up under his PAUSE account at the outset, and he would get better attribution and visibility. Rob From cjfields at illinois.edu Tue Aug 18 16:34:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 11:34:00 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: On Aug 18, 2009, at 10:46 AM, Hilmar Lapp wrote: > > On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > >> I can see how this might be a good idea, or it might be overkill. >> Anybody have thoughts on having feature _sources_ strongly typed >> with ontology terms? > > It's how BioSQL and Chado would store it anyway. I'm not sure > whether GFF3 requires it, possibly not. Might be worth bringing up with Lincoln to get his thoughts. > But when you make everything else ontology-typed, why exempt one > property that also stands to benefit from more predictable values? > > -hilmar What I'm thinking as well. You can always implement it that way, and if we deem it too heavy-weight then revert back. Or have it evaluated lazily and get the benefits of both. That's the magic of doing this on a branch, it gives you much more latitude to try things out. 
chris From cain.cshl at gmail.com Tue Aug 18 18:28:05 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 18 Aug 2009 14:28:05 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: Hi Hilmar and all, Actually, Chado stores sources as a dbxref for the feature (where the db.name is "GFF_source") and the source can be any string, which is what the GFF3 spec indicates. I think the source was intended to be free text to allow the creator maximum flexibility when making the GFF; it also allows lots of flexibility when defining what features go into a particular track in GBrowse: you can have lots of gene features in your GFF, but you can segregate them according to what their source attributes are. Additionally, some applications (SynBrowse comes to mind) overload the source value and require them to conform to a certain syntax. So, what I'm trying to say is, source should probably just stay a simple string. Scott On Aug 18, 2009, at 11:46 AM, Hilmar Lapp wrote: > > On Aug 17, 2009, at 7:42 PM, Robert Buels wrote: > >> I can see how this might be a good idea, or it might be overkill. >> Anybody have thoughts on having feature _sources_ strongly typed >> with ontology terms? > > > It's how BioSQL and Chado would store it anyway. I'm not sure > whether GFF3 requires it, possibly not. > > But when you make everything else ontology-typed, why exempt one > property that also stands to benefit from more predictable values? 
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From marcelo011982 at gmail.com Tue Aug 18 18:34:17 2009
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Tue, 18 Aug 2009 15:34:17 -0300
Subject: [Bioperl-l] Genbank code from Blast results
Message-ID: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>

Hi all,

I am writing a script that takes some information from the results of blastn files. Everything was OK, but I am having some difficulty picking out the GenBank code number (the 'gb' below). I tried

$obj->each_accession_number
$hit->name

and some variations of these.

------------------------------
>gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 gmrtDrNS01 Glycine max cDNA 3', mRNA sequence /clone_end=3' /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853
Length = 853

Score = 1336 bits (674), Expect = 0.0
Identities = 793/832 (95%), Gaps = 8/832 (0%)
Strand = Plus / Minus

Query: 294858 aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt 294917
              |||||||||||| |||||| |||||||||||||||||  ||||||||||||||||||||
Sbjct: 853    aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc 794
----------------------------------------

But I still don't get it.
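
One way to get at the 'gb' value is to treat the defline text in the report above as a plain string and pull the /gb=... pair out with a regular expression. The sketch below assumes that string is available to the script (in BioPerl the hit's description method is the usual place to look, but check that against your own report). extract_tag is a hypothetical helper written for this example, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the value of a
# "/tag=value" pair found in a UniGene-style description string.
sub extract_tag {
    my ($description, $tag) = @_;
    return unless defined $description;
    # \Q...\E quotes the tag name; the value runs up to the next whitespace.
    my ($value) = $description =~ m{/\Q$tag\E=(\S+)};
    return $value;    # undef when the tag is absent
}

# The description text from the report above, joined into one string.
my $desc = q{gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 }
         . q{gmrtDrNS01 Glycine max cDNA 3', mRNA sequence /clone_end=3' }
         . q{/gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853};

my $gb = extract_tag($desc, 'gb');
print "gb = $gb\n";    # prints "gb = CX702616"
```

Inside a report-parsing loop the same helper would be called once per hit, for example as extract_tag($hit->description, 'gb'), assuming the description holds the text shown.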
thank you with regards Miwata From hlapp at gmx.net Tue Aug 18 20:01:18 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 18 Aug 2009 16:01:18 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: > Additionally, some applications (SynBrowse comes to mind) overload > the source value and require them to conform to a certain syntax. > > So, what I'm trying to say is, source should probably just stay a > simple string. I would rephrase that to source should probably retain the possibility of using made-up strings. You mention one example yourself, and there have been others in a recent thread on BioSQL [1], for why the option to have predictable, structured values with attached semantics could be very useful. -hilmar [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue Aug 18 21:46:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 18 Aug 2009 16:46:25 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8ACD8C.1060908@cornell.edu> References: <62126.74727.qm@web30401.mail.mud.yahoo.com> <4A8ACD8C.1060908@cornell.edu> Message-ID: On Aug 18, 2009, at 10:49 AM, Robert Buels wrote: > Chris Fields wrote: >> I think we should go ahead and remove Bio::Tools::HMM from 1.6.1, >> and I could attempt to push the initial Bio-Ext-HMM release after >> core 1.6.1 is out. 
After that, I could then add Yee Man as PAUSE >> co-maintainer for those modules (which means Yee Man needs to sign >> up for a PAUSE account). Any objections to that? > > > Sounds like a good plan to me, if Yee Man agreed with it. He would > be the primary CPAN maintainer of the package. Maybe he should > actually be the first uploader too? Then, it would show up under > his PAUSE account at the outset, and he would get better attribution > and visibility. > > Rob At the moment BIOPERLML is the primary maintainer. It's an 'umbrella' account for the bioperl group; a few others exist for stuff like DBI, Catalyst, etc I think. Anyone who's designated a co-maintainer can release code onto CPAN. Several of us can assign new co-maintainer status for modules, so the code doesn't get locked up if someone decides to abandon it. We simply designate another co-maintainer if someone decides to take it over. In fact, that's half the reason I would like to get the ext code out there again; either designate it as abandonware or set it up so that it can be reimplemented by someone with the tuits (maybe using biolib, for instance). We have recently moved Bio::Graphics over to LDS as the primary, though, so this is all a point up for debate. chris From rmb32 at cornell.edu Tue Aug 18 21:56:19 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 14:56:19 -0700 Subject: [Bioperl-l] BioPerl at YAPC::2010 In-Reply-To: <20090818174053.3f379c5elembark@wrkhors.com@wrkhors.com> References: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net> <4A45383E.40207@cornell.edu> <20090818174053.3f379c5elembark@wrkhors.com@wrkhors.com> Message-ID: <4A8B2383.1030207@cornell.edu> Steven, Could you CC Heath Bair on this? He's the YAPC::NA 2010 coordinator that started this thread. 
Rob Steven Lembark wrote: > On Fri, 26 Jun 2009 14:06:06 -0700 > Robert Buels wrote: > >> This is a really giant opportunity to expose some of the best >> technologists in the world to what we do in bioinformatics, and possibly >> to entice some of them to help us the heck out! ;-) > > OK, so I'm a few months behind on my email... > > One suggestion: Have them add a BioPerl track to the > conference in advance of getting any submissions for > it. The gent I spoke to in Pittsburgh seemed open to > the idea of a Bioinformatcs/BioPerl track in 2010. > > Opening things up a bit to include Bioinformatics > even beyond BioPerl would give people who are > marginally interested a chance to see what the > whole area is about (e.g., adapting the W-Curve > for use with Perl or how we analyzed Clostridia > using Perl for the bookkeeping). > > In the meantime you might want to see how many > people would be willing to give talks in the > track -- even recycled ones -- before the conference > submission period begins. And, yes, I'd volunteer to > give 1-2 talks. > > enjoi > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From jncline at gmail.com Wed Aug 19 03:06:19 2009 From: jncline at gmail.com (Jonathan Cline) Date: Tue, 18 Aug 2009 22:06:19 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> Message-ID: <4A8B6C2B.9030101@gmail.com> Chris Fields wrote: > > Your modules may or may not need the Bio* namespace (that's up to you, > actually); there are several non-bioperl modules that also share the > Bio* namespace, and I believe there are modules that aren't Bio* that > use BioPerl (Gbrowse comes to mind). 
If you're focusing on > interaction with robotics, Robotics::Bio::X might be a better > namespace for instance (b/c you could expand later into other possibly > non-bio robotics interfaces). Based on your & other opinions I have received, I am creating: Robotics.pm (high level hardware abstraction layer) Robotics::Tecan Robotics::Tecan::Genesis I'll post a release note when it's reached an interesting level of maturity (estimate a couple weeks from now) so anyone with the hardware can play with the package. It's currently working great, and I am adding functionality on a daily basis. ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## >> >> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >>>> Sent: Thursday, 30 July 2009 2:07 p.m. >>>> To: bioperl-l at lists.open-bio.org >>>> Cc: Jonathan Cline >>>> Subject: [Bioperl-l] Bio::Robotics namespace discussion >>>> >>>> I am writing a module for communication with biology robotics, as >>>> discussed recently on #bioperl, and I invite your comments. >>>> >>>> >>>> On Namespace: >>>> >>>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many >>>> s/w modules already called 'robots' (web spider robots, chat bots, www >>>> automate, etc) so I chose the longer name "robotics" to differentiate >>>> this module as manipulating real hardware. Bio::Robotics is the >>>> abstraction for generic robotics and Bio::Robotics::(vendor) is the >>>> manufacturer-specific implementation. Robot control is made more >>>> complex due to the very configurable nature of the work table >>>> (placement >>>> of equipment, type of equipment, type of attached arm, etc). The >>>> abstraction has to be careful not to generalize or assume too >>>> much. 
In >>>> some cases, the Bio::Robotics modules may expand to arbitrary >>>> equipment >>>> such as thermocyclers, tray holders, imagers, etc - that could be a >>>> future roadmap plan. From rmb32 at cornell.edu Wed Aug 19 04:13:53 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 21:13:53 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <829996.94283.qm@web30404.mail.mud.yahoo.com> References: <829996.94283.qm@web30404.mail.mud.yahoo.com> Message-ID: <4A8B7C01.5060502@cornell.edu> Yee Man Chan wrote: > I think it is better to keep Bio-Tools-HMM within the Bio-Ext package and then spin this whole Bio-Ext package out to CPAN. I am ok with Robert's arrangement to move the related pm files under Bio/Tools/ to the new Bio-Ext package. The long-term development plan is to factor *ALL* of Bioperl into individual distributions similar to Bio-Tools-HMM. It is actually much easier to maintain and release code in this "broken up" way. This means that the Bio-Ext package is going to go away, so it doesn't make sense to keep Bio-Tools-HMM in it. Chris, other core devs, do you agree with this? > I have a PAUSE already due to my other CPAN contributions. So there is no need to create a new one. My PAUSE account is UMVUE. Oh good, the next step would just be to coordinate when to do the release in concert with Bioperl 1.6.1, right? Rob From rmb32 at cornell.edu Wed Aug 19 04:37:49 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 18 Aug 2009 21:37:49 -0700 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <190221.61009.qm@web30408.mail.mud.yahoo.com> References: <190221.61009.qm@web30408.mail.mud.yahoo.com> Message-ID: <4A8B819D.9070309@cornell.edu> Yee Man Chan wrote: > Is it going to be an arrangement similar to bioconductor? If so, I suppose then it makes sense. 
But you might want to develop scripts to automatically download and install new modules to make it user friendly. Yes, we are probably going to make a Task::BioPerl or something similar. > What do you mean by Bio-Ext is going away? I notice quite many people using dpAlign. So if Bio-Ext is going away, then at least dpAlign should become another spin off. By going away, I meant that everything in there is going to be spinned off. Except modules that are no longer maintainable, if there are any in there. Rob From deequan at gmail.com Wed Aug 19 04:39:35 2009 From: deequan at gmail.com (deequan) Date: Tue, 18 Aug 2009 21:39:35 -0700 (PDT) Subject: [Bioperl-l] bioperl capability In-Reply-To: <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> Message-ID: <25037707.post@talk.nabble.com> Howdy there, Yes, quite right. I apologize for the double posting. Moreover, I appreciate your assistance in trying to sort out what can and cannot be done with bioperl. 
To address the problem previously stated, I put together a remarkably misbehaving script that has the following parts:

#Some parsing:
$q_start = $hsp->query->start;
$q_end = $hsp->query->end;
$h_start = $hsp->hit->start;
$h_end = $hsp->hit->end;
$length = $hsp->query->seqlength();
$id = $hit->accession;
print OUT "$id\t";
my $seq;
if($h_start<$h_end){
    #the bit per your recommendation
    my $begin = $h_start-$q_start+1;
    my $cease = ($length - $q_end) + $h_end;
    my $strand = 1;
    my $factory = Bio::DB::GenBank->new(-format=> 'genbank',
                                        -seq_start =>$begin,
                                        -seq_stop =>$cease,
                                        -strand => $strand, #1 = plus, 2 = minus
                                        );
    $seq = $factory->get_Seq_by_acc($id);
}else{
    #else assume backward, code not shown
}
#and some stuff to retrieve the sequence
my $len = $seq->length();
my $string = $seq->subseq(1, $len);
print OUT "length = $len\t";
print OUT "seq = $string\n";

In your previous reply, you said the code accessing the seq object created by get_Seq_by_acc would have to pass that obj (here $seq) to a seqIO for basic IO purposes. Not seeing exactly how to go about that, I tried some other functions in combination that seemed as though they should work (length() and subseq()). Unfortunately, the program does not even run to that point, as the script throws an exception:

------------- EXCEPTION -------------
MSG: acc CP000948 does not exist
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:182
STACK toplevel test.pl:36
-------------------------------------

Oddly, the record corresponding to this accession number can be found here:
http://www.ncbi.nlm.nih.gov/nuccore/169887498

Perhaps you'd be willing to offer another hint. Thank you for your assistance thus far. And on behalf of all posters, thank you for sharing your knowledge. 'Preciate.

David Q.
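
The coordinate arithmetic in the script above can be checked on its own, without touching GenBank. The sketch below assumes a plus-strand, ungapped match; gaps in the alignment would shift the mapping. hit_window and its argument names are hypothetical, and the numbers are made up for the example. It says nothing about the cause of the exception.

```perl
use strict;
use warnings;

# Hypothetical helper: given HSP coordinates on the plus strand, return the
# (start, stop) window on the hit that would cover the whole query.
sub hit_window {
    my (%arg) = @_;

    # Query base 1 lines up with hit base h_start - (q_start - 1).
    my $begin = $arg{h_start} - $arg{q_start} + 1;

    # The last query base lies (q_length - q_end) bases past h_end.
    my $cease = $arg{h_end} + ($arg{q_length} - $arg{q_end});

    # Clamp to the record: never before base 1, never past the hit's end
    # (the upper clamp only applies when h_length is supplied).
    $begin = 1 if $begin < 1;
    $cease = $arg{h_length}
        if defined $arg{h_length} && $cease > $arg{h_length};

    return ($begin, $cease);
}

# Made-up numbers: a 100-base query whose bases 11..90 match hit 501..580.
my ($begin, $cease) = hit_window(
    q_start => 11,  q_end => 90,  q_length => 100,
    h_start => 501, h_end => 580,
);
print "fetch $begin..$cease\n";    # prints "fetch 491..590"
```

The two values returned are the ones that would be handed to -seq_start and -seq_stop in the script above. The clamp at 1 matters when the query overhangs the start of the hit record, where the unclamped formula gives zero or a negative start.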
Chris Fields-5 wrote: > > I think I already answered this: > > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/20302/focus=20305 > > chris > > -- View this message in context: http://www.nabble.com/bioperl-capability-tp25024929p25037707.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Wed Aug 19 05:28:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:29 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B819D.9070309@cornell.edu> References: <190221.61009.qm@web30408.mail.mud.yahoo.com> <4A8B819D.9070309@cornell.edu> Message-ID: <6ADF16A9-3D14-45F3-B972-98134B0A0DB1@illinois.edu> On Aug 18, 2009, at 11:37 PM, Robert Buels wrote: > Yee Man Chan wrote: >> Is it going to be an arrangement similar to bioconductor? If so, I >> suppose then it makes sense. But you might want to develop scripts >> to automatically download and install new modules to make it user >> friendly. > Yes, we are probably going to make a Task::BioPerl or something > similar. > >> What do you mean by Bio-Ext is going away? I notice quite many >> people using dpAlign. So if Bio-Ext is going away, then at least >> dpAlign should become another spin off. > By going away, I meant that everything in there is going to be > spinned off. Except modules that are no longer maintainable, if > there are any in there. > > Rob dpAlign could become another spinoff, yes, if it's used (and works fine). 
The problematic code dealt with pSW, alignment statistics, and staden io_lib support (the last of which is fairly bit-rotted now):

http://bugzilla.open-bio.org/show_bug.cgi?id=2668
http://bugzilla.open-bio.org/show_bug.cgi?id=1857
http://bugzilla.open-bio.org/show_bug.cgi?id=2069
http://bugzilla.open-bio.org/show_bug.cgi?id=2074
http://bugzilla.open-bio.org/show_bug.cgi?id=2329

dpAlign has its own bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2384

chris

From cjfields at illinois.edu Wed Aug 19 05:28:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 19 Aug 2009 00:28:39 -0500
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A8B7C01.5060502@cornell.edu>
References: <829996.94283.qm@web30404.mail.mud.yahoo.com> <4A8B7C01.5060502@cornell.edu>
Message-ID: <1DA73AAB-EC4F-4F44-BBF2-CFF7B3E4A0BE@illinois.edu>

On Aug 18, 2009, at 11:13 PM, Robert Buels wrote:

> Yee Man Chan wrote:
>> I think it is better to keep Bio-Tools-HMM within the Bio-Ext
>> package and then spin this whole Bio-Ext package out to CPAN. I am
>> ok with Robert's arrangement to move the related pm files under
>> Bio/Tools/ to the new Bio-Ext package.
>
> The long-term development plan is to factor *ALL* of Bioperl into
> individual distributions similar to Bio-Tools-HMM. It is actually
> much easier to maintain and release code in this "broken up" way.
>
> This means that the Bio-Ext package is going to go away, so it
> doesn't make sense to keep Bio-Tools-HMM in it. Chris, other core
> devs, do you agree with this?

In general, though there will be a limit as to how small we can split these off. For instance, Bio::Tree/TreeIO will be messy to split up and makes sense to keep together. Others could be more easily split off. YMMV.

>> I have a PAUSE already due to my other CPAN contributions. So there
>> is no need to create a new one. My PAUSE account is UMVUE.
> Oh good, the next step would just be to coordinate when to do the > release in concert with Bioperl 1.6.1, right? > > Rob Yes. That should be easy enough to do; basically Bio::Tools::HMM will be removed from 1.6.1, then core will be released along with Bio::Ext::HMM (or Bio::Tools::HMM, either way it would double as the distribution name). chris From cjfields at illinois.edu Wed Aug 19 05:28:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 00:28:48 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A8B6C2B.9030101@gmail.com> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> <4A8B6C2B.9030101@gmail.com> Message-ID: <2F5111BE-A1F3-437F-AC6C-4AC3BE05E9EB@illinois.edu> On Aug 18, 2009, at 10:06 PM, Jonathan Cline wrote: > Chris Fields wrote: >> >> Your modules may or may not need the Bio* namespace (that's up to >> you, >> actually); there are several non-bioperl modules that also share the >> Bio* namespace, and I believe there are modules that aren't Bio* that >> use BioPerl (Gbrowse comes to mind). If you're focusing on >> interaction with robotics, Robotics::Bio::X might be a better >> namespace for instance (b/c you could expand later into other >> possibly >> non-bio robotics interfaces). > > Based on your & other opinions I have received, I am creating: > > Robotics.pm (high level hardware abstraction layer) > Robotics::Tecan > Robotics::Tecan::Genesis > > > I'll post a release note when it's reached an interesting level of > maturity (estimate a couple weeks from now) so anyone with the > hardware > can play with the package. It's currently working great, and I am > adding functionality on a daily basis. > > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## That's great to hear! Keep us updated, I'm sure there are a few potential users lurking about here. 
chris From scott at scottcain.net Wed Aug 19 13:15:12 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 19 Aug 2009 09:15:12 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> Message-ID: <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net> Hilmar, The examples in that thread ought to go in the ninth column; using the Dbxref tag for references back to GenBank for example. The provenience stuff should go in the ninth column as well, though I don't know exactly how would be best. Scott On Aug 18, 2009, at 4:01 PM, Hilmar Lapp wrote: > > On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: > >> Additionally, some applications (SynBrowse comes to mind) overload >> the source value and require them to conform to a certain syntax. >> >> So, what I'm trying to say is, source should probably just stay a >> simple string. > > > I would rephrase that to source should probably retain the > possibility of using made-up strings. > > You mention one example yourself, and there have been others in a > recent thread on BioSQL [1], for why the option to have predictable, > structured values with attached semantics could be very useful. > > -hilmar > > [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From saikari78 at gmail.com Wed Aug 19 13:30:07 2009 From: saikari78 at gmail.com (saikari keitele) Date: Wed, 19 Aug 2009 14:30:07 +0100 Subject: [Bioperl-l] Pipeline for generating phylogenetic trees from list of species names Message-ID: Hi, Does anyone know of a simple pipeline for generating a phylogenetic tree from a list of species with bioperl? I've had a look at http://www.bioperl.org/wiki/HOWTO:PhylogeneticAnalysisPipeline#Distance_Distance_in_PHYLIP_.2B_NJ_Tree_in_PHYLIPbut it isn't explicit for the crucial steps (at least given my level of knowledge) For each species, should I extract the longest sequence available for every protein and align it with the same protein sequences of the other species in the list? Would anyone have an example pipeline of the different steps to perform? Thank you very much. Saikari From ymc at yahoo.com Wed Aug 19 02:50:57 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Tue, 18 Aug 2009 19:50:57 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <829996.94283.qm@web30404.mail.mud.yahoo.com> I think it is better to keep Bio-Tools-HMM within the Bio-Ext package and then spin this whole Bio-Ext package out to CPAN. I am ok with Robert's arrangement to move the related pm files under Bio/Tools/ to the new Bio-Ext package. There aren't that many modules in Bio-Ext. Plus, based on Chris and Robert's comments, modules other than my dpAlign and HMM appear to be abandoned. Moving HMM out only makes users less likely to try it out. If need be, I can also be a co-maintainer of this spinned off Bio-Ext package. I have a PAUSE already due to my other CPAN contributions. So there is no need to create a new one. My PAUSE account is UMVUE. 
Yee Man --- On Tue, 8/18/09, Chris Fields wrote: > From: Chris Fields > Subject: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Robert Buels" , "BioPerl List" > Date: Tuesday, August 18, 2009, 8:10 AM > Yee Man, Robert, > > All tests are passing; there was a small change in the > expected floating point, but no warning now. > > Re: passing this on to CPAN, I think it needs a distinct > version from BioPerl (something that should probably happen > with any spinoffs).? I foresee two options (and a > possible conflict): > > 1) Use the same versioning scheme, starting with 1.6.1. > 2) Use a simpler scheme a'la Bio::Graphics, which I > suggest.? Tripartite versions are a PITA, we'll only > need to keep that in core. > > Conflict: Bio::Tools::HMM is currently part of the 1.6 > branch (in 1.6.0).? If this stays in 1.6.1 then we have > two versions of the module floating out there. > > I think we should go ahead and remove Bio::Tools::HMM from > 1.6.1, and I could attempt to push the initial Bio-Ext-HMM > release after core 1.6.1 is out.? After that, I could > then add Yee Man as PAUSE co-maintainer for those modules > (which means Yee Man needs to sign up for a PAUSE > account).? Any objections to that? > > chris > > On Aug 17, 2009, at 7:24 PM, Yee Man Chan wrote: > > > I get it now. So it is now spinned off. Anyway, I > updated the HMM.pm in Bio-Tools-HMM with the latest version. > I think it should work. > > > > Yee Man > > > > --- On Mon, 8/17/09, Robert Buels > wrote: > > > >> From: Robert Buels > >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext > package on WinVista? > >> To: "Yee Man Chan" > >> Cc: "Chris Fields" , > "BioPerl List" > >> Date: Monday, August 17, 2009, 4:00 PM > >> Yee Man Chan wrote: > >>> I noticed that Bio/Tools/HMM.pm was removed > from the > >> trunk. So I added it back in. I think you > shouldn't get the > >> warnings with this version. 
> >> > >> Please read my email above with instructions for > checkout > >> out the new Bio-Tools-HMM component, where > Bio::Tools::HMM > >> has been moved.? Please do not add the > Bio::Tools::HMM > >> module back into bioperl-live. > >> > >> I think you might be confused about the functions > of 'svn > >> add', 'svn commit', etc, because I don't see any > actual > >> addition of the module in the commit logs.? > Please read > >> through the SVN manual at http://svnbook.red-bean.com/ if you need > >> clarification. > >> > >> Rob > >> > >> > > > > > > > > From ymc at yahoo.com Wed Aug 19 04:24:05 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Tue, 18 Aug 2009 21:24:05 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A8B7C01.5060502@cornell.edu> Message-ID: <190221.61009.qm@web30408.mail.mud.yahoo.com> Is it going to be an arrangement similar to bioconductor? If so, I suppose then it makes sense. But you might want to develop scripts to automatically download and install new modules to make it user friendly. What do you mean by Bio-Ext is going away? I notice quite many people using dpAlign. So if Bio-Ext is going away, then at least dpAlign should become another spin off. Yee Man --- On Tue, 8/18/09, Robert Buels wrote: > From: Robert Buels > Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Yee Man Chan" > Cc: "Chris Fields" , "BioPerl List" > Date: Tuesday, August 18, 2009, 9:13 PM > Yee Man Chan wrote: > > I think it is better to keep Bio-Tools-HMM within the > Bio-Ext package and then spin this whole Bio-Ext package out > to CPAN. I am ok with Robert's arrangement to move the > related pm files under Bio/Tools/ to the new Bio-Ext > package. > > The long-term development plan is to factor *ALL* of > Bioperl into individual distributions similar to > Bio-Tools-HMM.? 
It is actually much easier to maintain and release code in this "broken up" way.
>
> This means that the Bio-Ext package is going to go away, so
> it doesn't make sense to keep Bio-Tools-HMM in it.
> Chris, other core devs, do you agree with this?
>
> > I have a PAUSE already due to my other CPAN
> > contributions. So there is no need to create a new one. My
> > PAUSE account is UMVUE.
> Oh good, the next step would just be to coordinate when to
> do the release in concert with Bioperl 1.6.1, right?
>
> Rob
>

From ymc at yahoo.com Wed Aug 19 04:49:18 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Tue, 18 Aug 2009 21:49:18 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A8B819D.9070309@cornell.edu>
Message-ID: <184595.94226.qm@web30407.mail.mud.yahoo.com>

Good. That makes sense then. Please update me when all is set.

Yee Man

--- On Tue, 8/18/09, Robert Buels wrote:

> From: Robert Buels
> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Chris Fields" , "BioPerl List"
> Date: Tuesday, August 18, 2009, 9:37 PM
> Yee Man Chan wrote:
> > Is it going to be an arrangement similar to
> > bioconductor? If so, I suppose then it makes sense. But you
> > might want to develop scripts to automatically download and
> > install new modules to make it user friendly.
> Yes, we are probably going to make a Task::BioPerl or
> something similar.
>
> > What do you mean by Bio-Ext is going away? I notice
> > quite many people using dpAlign. So if Bio-Ext is going
> > away, then at least dpAlign should become another spin off.
> By going away, I meant that everything in there is going to
> be spinned off. Except modules that are no longer
> maintainable, if there are any in there.
> > Rob > > From ymc at yahoo.com Wed Aug 19 09:01:39 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Wed, 19 Aug 2009 02:01:39 -0700 (PDT) Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <6ADF16A9-3D14-45F3-B972-98134B0A0DB1@illinois.edu> Message-ID: <884845.92813.qm@web30408.mail.mud.yahoo.com> I tried that sample script that reportedly caused the dpAlign "bug" but I can't reproduced it. All I get is a warning from LocatableSeq. ------------------------------------------- [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "-Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl --------------------- WARNING --------------------- MSG: In sequence ABC|9944760 residue count gives end value 101. Overriding value [104] with value 101 for Bio::LocatableSeq::end(). TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA --------------------------------------------------- Getting score for ABC|9944760 -> ABC|9986984 = 300 Getting score for ABC|9986984 -> ABC|9944760 = 303 ------------------------------------------ Does the test script crash in your machine? Yee Man --- On Tue, 8/18/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "Yee Man Chan" , "BioPerl List" > Date: Tuesday, August 18, 2009, 10:28 PM > On Aug 18, 2009, at 11:37 PM, Robert > Buels wrote: > > > Yee Man Chan wrote: > >> Is it going to be an arrangement similar to > bioconductor? If so, I suppose then it makes sense. But you > might want to develop scripts to automatically download and > install new modules to make it user friendly. > > Yes, we are probably going to make a Task::BioPerl or > something similar. > > > >> What do you mean by Bio-Ext is going away? I > notice quite many people using dpAlign. 
So if Bio-Ext is > going away, then at least dpAlign should become another spin > off. > > By going away, I meant that everything in there is > going to be spinned off.? Except modules that are no > longer maintainable, if there are any in there. > > > > Rob > > dpAlign could become another spinoff, yes, if it's used > (and works fine).? The problematic code dealt with pSW, > alignment statistics, and staden io_lib support (the latter > which is fairly bit rotted now): > > http://bugzilla.open-bio.org/show_bug.cgi?id=2668 > http://bugzilla.open-bio.org/show_bug.cgi?id=1857 > http://bugzilla.open-bio.org/show_bug.cgi?id=2069 > http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > http://bugzilla.open-bio.org/show_bug.cgi?id=2329 > > dpAlign has it's own bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2384 > > chris > From cjfields at illinois.edu Wed Aug 19 14:49:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 19 Aug 2009 09:49:15 -0500 Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista? In-Reply-To: <884845.92813.qm@web30408.mail.mud.yahoo.com> References: <884845.92813.qm@web30408.mail.mud.yahoo.com> Message-ID: I'll have a look. It's probably something that hasn't been updated to deal with LocatableSeq's pathological end point checking. chris On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote: > > I tried that sample script that reportedly caused the dpAlign "bug" > but I can't reproduced it. All I get is a warning from LocatableSeq. > ------------------------------------------- > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "- > Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl > > --------------------- WARNING --------------------- > MSG: In sequence ABC|9944760 residue count gives end value 101. > Overriding value [104] with value 101 for Bio::LocatableSeq::end(). 
> TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT > -GGG-CCGGCCC-AA > --------------------------------------------------- > Getting score for ABC|9944760 -> ABC|9986984 > = 300 > Getting score for ABC|9986984 -> ABC|9944760 > = 303 > ------------------------------------------ > > Does the test script crash in your machine? > > Yee Man > > --- On Tue, 8/18/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] >> Problems with Bioperl-ext package on WinVista? >> To: "Robert Buels" >> Cc: "Yee Man Chan" , "BioPerl List" > > >> Date: Tuesday, August 18, 2009, 10:28 PM >> On Aug 18, 2009, at 11:37 PM, Robert >> Buels wrote: >> >>> Yee Man Chan wrote: >>>> Is it going to be an arrangement similar to >> bioconductor? If so, I suppose then it makes sense. But you >> might want to develop scripts to automatically download and >> install new modules to make it user friendly. >>> Yes, we are probably going to make a Task::BioPerl or >> something similar. >>> >>>> What do you mean by Bio-Ext is going away? I >> notice quite many people using dpAlign. So if Bio-Ext is >> going away, then at least dpAlign should become another spin >> off. >>> By going away, I meant that everything in there is >> going to be spinned off. Except modules that are no >> longer maintainable, if there are any in there. >>> >>> Rob >> >> dpAlign could become another spinoff, yes, if it's used >> (and works fine). 
The problematic code dealt with pSW, >> alignment statistics, and staden io_lib support (the latter >> which is fairly bit rotted now): >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668 >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329 >> >> dpAlign has it's own bug: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384 >> >> chris >> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Aug 19 22:19:25 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 19 Aug 2009 18:19:25 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> <4A87275C.5040300@cornell.edu> <4A89EADD.9050509@cornell.edu> <43793C53-C9C8-4854-9F4B-1C7D0A34C53F@gmx.net> <2EE85EA9-1732-4E82-B1B2-4F3150C8845B@scottcain.net> Message-ID: <4907C3F4-C503-4019-BBDA-153ED777276C@gmx.net> Putting it into the 9nth column is the equivalent of storing it in the {seqfeature,bioentry}_qualifier_value tables in BioSQL. -hilmar On Aug 19, 2009, at 9:15 AM, Scott Cain wrote: > Hilmar, > > The examples in that thread ought to go in the ninth column; using > the Dbxref tag for references back to GenBank for example. The > provenience stuff should go in the ninth column as well, though I > don't know exactly how would be best. > > Scott > > > > On Aug 18, 2009, at 4:01 PM, Hilmar Lapp wrote: > >> >> On Aug 18, 2009, at 2:28 PM, Scott Cain wrote: >> >>> Additionally, some applications (SynBrowse comes to mind) overload >>> the source value and require them to conform to a certain syntax. 
>>> >>> So, what I'm trying to say is, source should probably just stay a >>> simple string. >> >> >> I would rephrase that to source should probably retain the >> possibility of using made-up strings. >> >> You mention one example yourself, and there have been others in a >> recent thread on BioSQL [1], for why the option to have >> predictable, structured values with attached semantics could be >> very useful. >> >> -hilmar >> >> [1] http://lists.open-bio.org/pipermail/biosql-l/2009-August/001602.html >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jason at bioperl.org Thu Aug 20 00:55:22 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 19 Aug 2009 20:55:22 -0400 Subject: [Bioperl-l] Hi In-Reply-To: <3bc6bb240908191147j1c707206r4bd290addd2cd2f@mail.gmail.com> References: <3bc6bb240908191147j1c707206r4bd290addd2cd2f@mail.gmail.com> Message-ID: Please ask on the mailing list for these things, I am not really sure what you mean by subtract all taxonomy -- I suspect you mean extract all IDs, I think you should take a look at the example like http://bioperl.org/wiki/Module:Bio::DB::Taxonomy I think the example is basically what you want to do, except replace the nodeid with 7742 instead of 33090 -jason On Aug 19, 2009, at 2:47 PM, 
JingtaoLiu(TSU) wrote: > Hi Sir, > > Thank you for reading this. > I am working for BioChem Dept Texastate university. > I encounter a problem. > I need subtract all taxonomy IDs from vertebrates(taxon id is 7742) > how I can get all the leaf node of these? > > I referenced Bio::DB::Taxonomy, > but i have no clue about it. > Very appreciate for your help. > > Jingtao Liu -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From yannick.wurm at unil.ch Wed Aug 19 19:25:11 2009 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Wed, 19 Aug 2009 21:25:11 +0200 Subject: [Bioperl-l] Programmer job in Lausanne Switzerland Message-ID: <1D1F031E-29F1-4AE4-A225-D9B434ACE070@unil.ch> Dear list, my apologies if this is inappropriate for the list, but I thought it would be a good way to reach the kind of people we're looking for. We have a job opening for assembly and annotation of ant genomes in Lausanne Switzerland. http://www.isb-sib.ch/about-sib/jobs/details/91-sib-bioinformatician-at-sib--unil.html http://fourmidable.unil.ch/BioinformaticsEngineerLausanneAnts.pdf Kind regards, Yannick http://yannick.poulet.org From sidd.basu at gmail.com Thu Aug 20 10:03:07 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Thu, 20 Aug 2009 05:03:07 -0500 Subject: [Bioperl-l] Re: code reuse with moose In-Reply-To: References: <20090812022753.GA815@Macintosh-74.local> <20090818110102.GA27010@seinfeld> Message-ID: <20090820100304.GA1884@seinfeld> On Tue, 18 Aug 2009, Chris Fields wrote: > > On Aug 18, 2009, at 6:01 AM, Siddhartha Basu wrote: > > > Putting it in the bioperl list, makes more sense here, > > > > On Wed, 12 Aug 2009, Chris Fields wrote: > > > >> (BTW, this is re: the reimplementation of major chunks of BioPerl > >> using > >> Moose, Biome: http://github.com/cjfields/biome/tree/) > >> > >> Locations should use a Role (specifically, Biome::Role::Range), so > >> start/end/strand should be attributes, not methods. 
With attributes the
> >> best way to do this is probably with a builder, and lazily (start
> >> requires end, and vice versa). Factor out the common code as Tomas
> >> indicates. BTW, the $self->throw() is akin to BioPerl's $self->throw()
> >> exception handling; it simply catches any exceptions and passes them to
> >> the metaclass exception handling.
> >>
> >> I've been thinking about making the Range role abstract for this very
> >> reason (or defining very basic attributes); something like:
> >>
> >> ----------------------------
> >>
> >> package Bio::Role::Range;
> >>
> >> use Moose::Role;
> >>
> >> requires qw(_build_start _build_end _build_strand);
> >>
> >> # also require other methods which need to be defined in implementation
> >>
> >> has 'start' => (
> >>     isa     => 'Int',
> >>     is      => 'rw',
> >>     builder => '_build_start',
> >>     lazy    => 1
> >> );
> >>
> >> # same for end, strand (except strand has a different isa via MooseX::Types)
> >> ....
> >>
> >> package Bio::Location::Foo;
> >>
> >> use Moose;
> >>
> >> with 'Bio::Role::Range';
> >>
> >> sub _build_start {
> >>     # for location-specific start
> >> }
> >>
> >> sub _build_end {
> >>     # for location-specific end
> >> }
> >>
> >> sub _build_strand {
> >>     # for location-specific strand
> >> }
> >>
> >> sub _common_build_method {
> >>     # factor out common code here, call from other builders
> >> }
> >>
> >> ----------------------------
> >
> > This plan makes things much clearer. Currently
> > BioMe::Role::Location has a 'requires' keyword, and the rest of the
> > location modules consume that role to provide their own implementation.
> > At this point only BioMe::Location::Atomic has an attribute-based
> > 'start' and 'end' implementation. I got a bit confused because in
> > current bioperl 'Bio::Location::Simple' inherits from
> > 'Bio::Location::Atomic', and when I try to follow that path in BioMe
> > it has to override that method.
> > So, my question is: do all the location modules really need to
> > inherit from each other?
I am totally aware of the original design ideas, but
> > it would be better to have a flattened hierarchy if possible.
>
> Flattening with roles is always a good idea, yes. I wouldn't worry as
> much about the way it was originally implemented as the general API (and
> ways in which we can simplify it).

Thanks for clarifying that.

>
> > One more thing, what about putting the 'start', 'end' and the other
> > common base attributes in BioMe::Role::Location instead of
> > BioMe::Role::Range. I am not sure which would be correct from the
> > bioperl standpoint, just throwing out an idea.
>
> That's a possibility. To me Locations are just Ranges with different
> behavior (hence the below comment...)
>
> >> Also, I think the Coordinate-related stuff should be simplified down to a
> >> trait or an attribute; they bring in way too much overhead in bioperl w/o
> >> much added value.
> >
> > You mean instead of having a 'builder' method, having specialized
> > traits handling those. That sounds even better.
> >
> > -siddhartha
>
> Yes, that's essentially it. Location behavior could be changed by
> having CoordinatePolicy as a trait. Similarly, fuzziness for start/end
> could also be thought of as a trait. In essence, you could probably roll
> most behavior into attribute traits (which, in Moose, are just roles that
> are composed into the attribute meta class, Moose::Meta::Attribute). I
> had started up a Biome::Meta::Attribute class in case we were to go down
> this path, then we could start registering specific traits within that
> namespace.
>
> Just to note, it might be easier to try the simplest approach first and
> get tests passing, then layer in traits to see how they act
> performance-wise. My guess is they will speed things up, but you never
> know. Locations will be a performance bottleneck as they are used in
> generic Features.

That seems to be a saner approach. Will play around with the builder approach and get the tests passing at least.

thanks,
-siddhartha

>
> chris

From ymc at yahoo.com Thu Aug 20 03:01:28 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Wed, 19 Aug 2009 20:01:28 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
In-Reply-To:
Message-ID: <191324.76414.qm@web30403.mail.mud.yahoo.com>

I noticed that $qalseq is a LocatableSeq with gaps. I don't think my program was written to support a LocatableSeq with gaps. If I remove the gaps, the two scores agree with each other, which should be the desired outcome.

--------------------- WARNING ---------------------
MSG: In sequence ABC|9986984 residue count gives end value 104.
Overriding value [101] with value 104 for Bio::LocatableSeq::end().
TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTTCGGGTCCGGCCCGAA
---------------------------------------------------
Getting score for ABC|9944760 -> ABC|9986984
= 291
Getting score for ABC|9986984 -> ABC|9944760
= 291

Do you think I should check for the LocatableSeq type and give an error, or should I remove the gaps if the input is a LocatableSeq?

Yee Man

--- On Wed, 8/19/09, Chris Fields wrote:

> From: Chris Fields
> Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
> To: "Yee Man Chan"
> Cc: "Robert Buels" , "BioPerl List"
> Date: Wednesday, August 19, 2009, 7:49 AM
> I'll have a look. It's probably
> something that hasn't been updated to deal with
> LocatableSeq's pathological end point checking.
>
> chris
>
> On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote:
>
> >
> > I tried that sample script that reportedly caused the
> > dpAlign "bug" but I can't reproduce it. All I get is a
> > warning from LocatableSeq.
> > ------------------------------------------- > > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl > "-Iblib/lib" "-Iblib/arch" > "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl > > > > --------------------- WARNING --------------------- > > MSG: In sequence ABC|9944760 residue count gives end > value 101. > > Overriding value [104] with value 101 for > Bio::LocatableSeq::end(). > > > TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA > > --------------------------------------------------- > > Getting score for ABC|9944760 -> ABC|9986984 > > = 300 > > Getting score for ABC|9986984 -> ABC|9944760 > > = 303 > > ------------------------------------------ > > > > Does the test script crash in your machine? > > > > Yee Man > > > > --- On Tue, 8/18/09, Chris Fields > wrote: > > > >> From: Chris Fields > >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was > Re: [Bioperl-l] Problems with Bioperl-ext package on > WinVista? > >> To: "Robert Buels" > >> Cc: "Yee Man Chan" , > "BioPerl List" > >> Date: Tuesday, August 18, 2009, 10:28 PM > >> On Aug 18, 2009, at 11:37 PM, Robert > >> Buels wrote: > >> > >>> Yee Man Chan wrote: > >>>> Is it going to be an arrangement similar > to > >> bioconductor? If so, I suppose then it makes > sense. But you > >> might want to develop scripts to automatically > download and > >> install new modules to make it user friendly. > >>> Yes, we are probably going to make a > Task::BioPerl or > >> something similar. > >>> > >>>> What do you mean by Bio-Ext is going away? > I > >> notice quite many people using dpAlign. So if > Bio-Ext is > >> going away, then at least dpAlign should become > another spin > >> off. > >>> By going away, I meant that everything in > there is > >> going to be spinned off.? Except modules that > are no > >> longer maintainable, if there are any in there. > >>> > >>> Rob > >> > >> dpAlign could become another spinoff, yes, if it's > used > >> (and works fine).? 
The problematic code dealt > with pSW, > >> alignment statistics, and staden io_lib support > (the latter > >> which is fairly bit rotted now): > >> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074 > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329 > >> > >> dpAlign has it's own bug: > >> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384 > >> > >> chris > >> > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bernd.jagla at gmail.com Thu Aug 20 08:46:52 2009 From: bernd.jagla at gmail.com (Bernd Jagla) Date: Thu, 20 Aug 2009 10:46:52 +0200 Subject: [Bioperl-l] SCF installation Message-ID: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> Hi, I am trying to install SCF (a prerequisite to samtools). I installed libread and the compilation seems to be working, only test is failing: zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL Checking if your kit is complete... 
Looks good
Writing Makefile for Bio::SCF

zoppel:Bio-SCF-1.01 bernd$ make
cp SCF.pm blib/lib/Bio/SCF.pm
cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
/opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
Please specify prototyping behavior for SCF.xs (see perlxs manual)
/usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c
Running Mkbootstrap for Bio::SCF ()
chmod 644 SCF.bs
rm -f blib/arch/auto/Bio/SCF/SCF.bundle
LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3 /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \
   -lread -lz \

chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle
cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs
chmod 644 blib/arch/auto/Bio/SCF/SCF.bs
Manifying blib/man3/Bio::SCF.3pm

zoppel:Bio-SCF-1.01 bernd$ make test
PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf)
t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 18/18 subtests

Test Summary Report
-------------------
t/scf.t (Wstat: 512 Tests: 0 Failed: 0)
  Non-zero exit status: 2
  Parse errors: Bad plan. You planned 18 tests but ran 0.
Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr 0.01 csys = 0.11 CPU)
Result: FAIL
Failed 1/1 test programs. 0/0 subtests failed.
make: *** [test_dynamic] Error 2

Any idea what might be going wrong?
Please note that some of the files in the directory are empty:

ls -ltr
-rw-r--r--   1 bernd  staff  167468 23 sep  1999 test.scf
-rw-r--r--   1 bernd  staff    1131 31 jan  2006 DISCLAIMER
-rw-r--r--   1 bernd  staff     532 17 mai  2006 README
-rw-r--r--   1 bernd  staff     525 17 mai  2006 INSTALL
-rw-r--r--   1 bernd  staff     396 17 mai  2006 Makefile.PL
-rw-r--r--   1 bernd  staff    9308 17 mai  2006 SCF.xs
-rw-r--r--   1 bernd  staff   12438 17 mai  2006 SCF.pm
drwxr-xr-x   3 bernd  staff     102 17 mai  2006 t
drwxr-xr-x   6 bernd  staff     204 17 mai  2006 eg
drwxr-xr-x   3 bernd  staff     102 17 mai  2006 SCF
-rw-r--r--   1 bernd  staff     290 17 mai  2006 META.yml
-rw-r--r--   1 bernd  staff     255 17 mai  2006 MANIFEST
drwxr-xr-x   4 bernd  staff     136 20 aoû 10:12 ..
-rw-r--r--   1 bernd  staff   27915 20 aoû 10:13 Makefile.old
-rw-r--r--   1 bernd  staff   27915 20 aoû 10:16 Makefile
-rw-r--r--   1 bernd  staff       0 20 aoû 10:17 pm_to_blib
drwxr-xr-x   8 bernd  staff     272 20 aoû 10:17 blib
-rw-r--r--   1 bernd  staff       0 20 aoû 10:17 SCF.bs
-rw-r--r--   1 bernd  staff   14580 20 aoû 10:18 SCF.o
-rw-r--r--   1 bernd  staff   15125 20 aoû 10:18 SCF.c
drwxr-xr-x  21 bernd  staff     714 20 aoû 10:18 .

Thanks,
Bernd

From cain.cshl at gmail.com Thu Aug 20 14:30:33 2009
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 20 Aug 2009 10:30:33 -0400
Subject: [Bioperl-l] SCF installation
In-Reply-To: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
Message-ID:

Hi Bernd,

Bio::SCF isn't technically part of BioPerl, but I have installed it before so I'll take a shot: do you have the Staden io-lib installed? It is a prereq for Bio::SCF. If you did install it, is it in a normal library path, and did you run ldconfig (if appropriate for your system) after installing it?

io-lib can be obtained here:

http://staden.sourceforge.net/

If you do have all of those things in place, what version of io-lib are you using? I wonder if there is an incompatibility between Bio::SCF and your version.
The INSTALL doc for Bio::SCF indicates that you should have version 0.9, but io-lib is now at 1.11.5. That jump to a whole number may have broken an API call that Bio::SCF depends on.

Scott

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From dan.bolser at gmail.com Thu Aug 20 15:00:41 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 20 Aug 2009 16:00:41 +0100
Subject: [Bioperl-l] Creating a MSA from a set of pairwise alignments with a common reference sequence?
Message-ID: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com>

Hi,

Quick version: How do I get a column of Bio::SimpleAlign using ungapped 'reference' sequence coordinates?

Longer version:

I have a set of pairwise alignments that I would like to process into a 'multiple sequence alignment' (MSA). All the alignments are short sequence 'contigs' aligned to a 'reference' sequence, so one sequence in all the pairwise alignments is constant (making the resulting MSA unambiguous).

I came up with the following pseudo-code to create a MSA (Bio::SimpleAlign) from the set of pairwise alignments...

initialise:
  Create an 'empty' Bio::SimpleAlign from the REFERENCE sequence.

for each pairwise alignment:
  Create a Bio::LocatableSeq from the given fragment of the
  REFERENCE sequence (using ungapped REFERENCE coordinates).

  for each gap in the REFERENCE sequence:
    Take the position of the gap (in ungapped REFERENCE
    coordinates) and look up the corresponding column of the MSA
    (in ungapped REFERENCE coordinates).

    for each sequence in the column:
      Check if there is a gap-character at this position.

    if any sequence has a non gap-character at this position:
      Stick a gap in the MSA just before this position.

  Create a Bio::LocatableSeq from the CONTIG sequence (using
  ungapped REFERENCE coordinates) and add it to the
  Bio::SimpleAlign.

done.

I would very much appreciate, 1) feedback on the correctness of the above algorithm (it could be horribly wrong), and 2) advice on how to get a column of the alignment using ungapped REFERENCE coordinates?

Sorry if this is a solved problem (where is it solved?).
If not, and if I can get it working, I'll try to write a generic function to merge two MSAs when they have a reference sequence in common.

For your reference, the pairwise alignments come from the show-aligns command in the MUMmer sequence alignment package, and have the following format:

my.reference.fasta my.contigs.multi.fasta

============================================================
-- Alignments between REFERENCE and CONTIG00012

-- BEGIN alignment [ +1 29237 - 45714 | +1 1 - 16441 ]


29237     aataacctctttaag.taatatttttctctggtcccaacttgcgccaat
1         aataa.ctctttaagataatatttttctctggtcccgacttgggccaat
               ^         ^                    ^     ^

29286     ggaaaaaaatcacttattcgataa.ataataagataaatatattttcta
49        ggaaaaaaatcactatttcgataagataataagata.atatattttcaa
                        ^^        ^           ^          ^

29335     aagacccctacataaatatatggtcccattaatattataaattaataat
97        aagacccctatataaatatatggtctcattaatattataaattaataat
                    ^              ^

...

For further reference:

This thread:
http://bioperl.org/pipermail/bioperl-l/2009-July/030643.html

http://www.bioperl.org/wiki/Align_Refactor

http://www.bioperl.org/wiki/Alignment_object

All the best,
Dan.

From lincoln.stein at gmail.com Thu Aug 20 16:07:16 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 20 Aug 2009 12:07:16 -0400
Subject: [Bioperl-l] SCF installation
In-Reply-To:
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
Message-ID: <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>

It is all a bit confusing. On the download page for Staden, there is a release 1.12, but the home page hasn't been updated and still reads 1.11. If you download and install Staden 1.12, you'll get a library named libstaden-read rather than libread; Bio::SCF hasn't been updated for the name change, and so you will have to open up the Makefile.PL and change "-lread" to "-lstaden-read" in order for it to compile.

This being said, your log indicates that Bio::SCF compiled and linked just fine, but the test failed, so it may be more of a problem than just getting the staden library installed.
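The Makefile.PL edit described above could be scripted roughly as follows. This is only a sketch: the helper name use_staden_read is made up for illustration, and it assumes the flag appears literally as "-lread" in Makefile.PL, as the link line in Bernd's log suggests. It keeps a backup copy of the file before rewriting it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SCF): rewrite the linker flag
# "-lread" to "-lstaden-read" in the text of a Makefile.PL.
sub use_staden_read {
    my ($text) = @_;
    # Match "-lread" only as a whole flag, so that "-lreadline" and an
    # already patched "-lstaden-read" are left untouched.
    $text =~ s/(?<![\w-])-lread(?![\w-])/-lstaden-read/g;
    return $text;
}

# Usage: perl patch_makefile_pl.pl Makefile.PL
# The original file is kept as Makefile.PL.orig.
if (@ARGV) {
    my $path = shift @ARGV;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;

    rename $path, "$path.orig" or die "Cannot back up $path: $!";
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} use_staden_read($text);
    close $out or die "Cannot close $path: $!";
}
```

After patching, rerun "perl Makefile.PL" and "make" so the new flag reaches the link step.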
Lincoln

--
Lincoln D.
Stein Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From j_martin at lbl.gov Thu Aug 20 16:41:16 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Thu, 20 Aug 2009 09:41:16 -0700
Subject: [Bioperl-l] SCF installation
In-Reply-To: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina>
Message-ID: <20090820164115.GA10681@eniac.jgi-psf.org>

Hello,

Bio::SCF isn't a pre-requisite of samtools or Bio::Samtools, and neither is actually related to Bioperl. samtools has a pretty active mailing list at sourceforge, you might try asking there.

http://sourceforge.net/mailarchive/forum.php?forum_name=samtools-help

I use samtools all the time w/o either of those modules.

Joel

From roy.chaudhuri at gmail.com Thu Aug 20 16:42:23 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Thu, 20 Aug 2009 17:42:23 +0100
Subject: [Bioperl-l] Creating a MSA from a set of pairwise alignments with a common reference sequence?
In-Reply-To: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com>
References: <2c8757af0908200800r6689470bo9d9e7b634397e969@mail.gmail.com>
Message-ID: <4A8D7CEF.4080002@gmail.com>

Hi Dan,

I think you want the Bio::LocatableSeq method "column_from_residue_number".
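As a minimal sketch of that call, assuming BioPerl is installed: the sequence below is the first 19 residues of the REFERENCE row from Dan's show-aligns excerpt, with the MUMmer gap character '.' written as '-'. The start and end values are the ungapped REFERENCE coordinates of that fragment; everything else is chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::LocatableSeq;

# A gapped fragment of the REFERENCE sequence: 15 residues, one gap,
# then 4 more residues.
my $ref = Bio::LocatableSeq->new(
    -id    => 'REFERENCE',
    -seq   => 'aataacctctttaag-taat',
    -start => 29237,    # ungapped coordinate of the first residue
    -end   => 29255,    # 29237 + 19 residues - 1
);

# Map an ungapped REFERENCE coordinate to a 1-based alignment column.
# Residue 29252 is the first base after the gap.
my $col = $ref->column_from_residue_number(29252);
print "residue 29252 is in column $col\n";
```

Bio::SimpleAlign has a method of the same name that takes the sequence's display id as an extra first argument, which may be more convenient once the rows are already in an alignment object.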
You might also try combining your pairwise alignments using the profile alignment option in ClustalW.

Cheers.
Roy.

From lsbrath at gmail.com Thu Aug 20 20:31:20 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Thu, 20 Aug 2009 16:31:20 -0400
Subject: [Bioperl-l] genbank to fasta conversion
Message-ID: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>

Hello,

I have previously converted multiple genbank files to fasta. For some reason I am having trouble with this simple script.
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

open (my $inFile, "C:/Documents and Settings/mydir/Desktop/TARGETING.gb");
open (my $outfile, ">C:/Documents and Settings/mydir/Desktop/TARGET.fa");
my $in = Bio::SeqIO->new('-file' => "$inFile" ,
                         '-format' => 'GenBank');
my $out = Bio::SeqIO->new('-file' => "$outfile" ,'-format' => 'Fasta');
print $out $_ while <$in>;

I keep getting the error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open GLOB(0x36a214): No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::IO::_initialize_io C:/Perl/site/lib/Bio/Root/IO.pm:310
STACK: Bio::SeqIO::_initialize C:/Perl/site/lib/Bio/SeqIO.pm:454
STACK: Bio::SeqIO::genbank::_initialize C:/Perl/site/lib/Bio\SeqIO\genbank.pm:202
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:351
STACK: C:/Perl/site/lib/Bio/SeqIO.pm:377
-----------------------------------------------------------

I am probably missing something simple, but would appreciate any help.

M

From cjfields at illinois.edu Thu Aug 20 20:38:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 20 Aug 2009 15:38:03 -0500
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID: <7868B105-53AD-4C87-8B21-2E4D4A7781B5@illinois.edu>

You are passing filehandles in, not file names. Switch the '-file' parameter to '-fh'.

chris

From rmb32 at cornell.edu Thu Aug 20 20:43:06 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 20 Aug 2009 13:43:06 -0700
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID: <4A8DB55A.6060605@cornell.edu>

The error is that you are opening a filehandle called $outfile, and then you are stringifying it (resulting in a string containing "GLOB(...)") and telling Bio::SeqIO to write to a file named "GLOB(...)", which it can't open.
You probably want to use the -fh arguments for your two uses of Bio::SeqIO, or else remove your open() calls and pass the filenames to the SeqIO objects directly, like:

my $in = Bio::SeqIO->new
    ('-file'   => "C:/Documents and Settings/mydir/Desktop/TARGETING.gb",
     '-format' => 'GenBank',
    );
my $out = Bio::SeqIO->new
    ('-file'   => ">C:/Documents and Settings/mydir/Desktop/TARGET.fa",
     '-format' => 'fasta',
    );

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From sharpton at berkeley.edu Thu Aug 20 20:40:34 2009
From: sharpton at berkeley.edu (Thomas Sharpton)
Date: Thu, 20 Aug 2009 13:40:34 -0700
Subject: [Bioperl-l] genbank to fasta conversion
In-Reply-To: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
References: <69367b8f0908201331g4c20e2a7yfac69a9ae1a9c7c0@mail.gmail.com>
Message-ID:

This is a problem I think I can solve, so I'm chiming in for once.

Looks to me like you're trying to pass a file handle to the -file setting in your SeqIO object. One of the excellent things about using SeqIO is that you don't need to worry about file handles; it's all taken care of under the hood. Try the following adaptation of your script:

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $inFile = "C:/Documents and Settings/mydir/Desktop/TARGETING.gb";
my $outfile = "C:/Documents and Settings/mydir/Desktop/TARGET.fa";

#OPEN A SEQUENCE FILE OF INTEREST ($inFile) AND CREATE A SEQUENCE STREAM ($in)
my $in = Bio::SeqIO->new(-file => "$inFile" , '-format' => 'GenBank');

#OPEN AN OUTPUT FILE OF INTEREST ($outfile) AND CREATE AN OUTPUT SEQUENCE STREAM ($out)
#NOTICE HOW WE SET -file FOR OUTPUT WITH THE > SYMBOL HERE:
my $out = Bio::SeqIO->new(-file => ">$outfile" ,'-format' => 'Fasta');

#NOW LET'S DO THE CONVERSION AND DUMP THE OUTPUT
#INSTEAD OF DOING THIS
#print $out $_ while <$in>;
#TRY THIS
while(my $seq = $in->next_seq() ){
    $out->write_seq($seq)
}

The above is pretty much what you'll find here:

http://www.bioperl.org/wiki/HOWTO:SeqIO

which you should definitely look over to better understand what's happening with the SeqIO object.

Good luck!
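For completeness, the other route mentioned in this thread (keeping the explicit open() calls and switching '-file' to '-fh') might look roughly like the sketch below. The helper name genbank_to_fasta_fh is made up for illustration, and the paths are whatever you pass in; it assumes BioPerl is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: convert a GenBank file to FASTA while keeping
# explicit open() calls, handing the filehandles to Bio::SeqIO via -fh.
# Returns the number of sequences written.
sub genbank_to_fasta_fh {
    my ($in_path, $out_path) = @_;

    open my $in_fh,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out_fh, '>', $out_path or die "Cannot write $out_path: $!";

    # -fh takes an open filehandle; -file would expect a file name.
    my $in  = Bio::SeqIO->new(-fh => $in_fh,  -format => 'genbank');
    my $out = Bio::SeqIO->new(-fh => $out_fh, -format => 'fasta');

    my $count = 0;
    while (my $seq = $in->next_seq) {
        $out->write_seq($seq);
        $count++;
    }

    # Close both streams so the output is flushed to disk.
    $out->close;
    $in->close;
    return $count;
}

# Usage: perl gb2fasta.pl input.gb output.fa
if (@ARGV == 2) {
    my $n = genbank_to_fasta_fh(@ARGV);
    print "wrote $n sequence(s)\n";
}
```

Either way, the sequences still have to go through next_seq() and write_seq(); printing the stream object itself does not convert anything.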

Tom

On Aug 20, 2009, at 1:31 PM, Mgavi Brathwaite wrote:

> Hello,
>
> I have previously converted multiple genbank files to fasta. For
> some reason I am having trouble with this simple script.
> [...]

From ghai.rohit at gmail.com Fri Aug 21 11:34:49 2009
From: ghai.rohit at gmail.com (Rohit Ghai)
Date: Fri, 21 Aug 2009 13:34:49 +0200
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotide database
Message-ID: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com>

Hello all

I would like to download the WGS sequences of the unfinished genomes
(genomes in progress) from NCBI, listed at
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

here's an example accession

NZ_ACVD00000000

and here's the link to the accession at genbank

http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000

This record lists the accessions that belong to it in the following line
of the genbank output

WGS         NZ_ACVD01000001-NZ_ACVD01000139

NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession numbers covered
by this record.

here's a link

http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC]

The bioperl related question is...

Since these are unassembled genomes, there are several contigs for each one,
and they are all available in this record.

Is it possible to download a range without trying to recreate each accession
number?

On the other hand, it is possible to download each one individually; this
would mean making the following

NZ_ACVD01000001
NZ_ACVD01000002
NZ_ACVD01000003
.
.
.
NZ_ACVD01000139

from NZ_ACVD01000001-NZ_ACVD01000139

I can recreate these numbers and download each one separately. However,
sometimes I get a timeout exception and the whole thing stops.

the code (copied shamelessly from the bioperl website, works great to get
single accessions)

my $id = "NZ_ACVD00000000";
my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -id      => $id,
                                       -rettype => 'gbwithparts');

$factory->get_Response(-file => 'fullcontig.gb');

I did try to catch the exceptions from get_Response, but it's not
working as expected... maybe someone can point out what I'm doing wrong
here. For some reason, the code never seems to reach any print statement
in the catch construct...

$ele = "somecontig id";

try {
    print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n";
    $factory->get_Response(-file => "$genbank_file");

} catch Bio::Root::Exception with {
    my $err = shift;
    if (! defined $err) {
        print "MAY HAVE DOWNLOADED $ele..\n";
    } else {
        print "PROBABLE TIMEOUT ERROR\n";
        print "$err\n";
    }
};

Or is it possible to somehow increase the timeout for the get_Response
method?

thanks in advance!

regards
Rohit

From bernd.jagla at gmail.com Fri Aug 21 09:30:27 2009
From: bernd.jagla at gmail.com (Bernd Jagla)
Date: Fri, 21 Aug 2009 11:30:27 +0200
Subject: [Bioperl-l] SCF installation
In-Reply-To: <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID:

Hi,

I have installed io_lib-1.9.0. This produces libread.a. I am working on
Mac OSX 10.5.7. I just recompiled io-lib and didn't see any error message. I
don't really know how to test that it is working.

I am trying to install Bio-SCF-1.01.

It seems that the test.scf file cannot be read. Is there another way using
some other tools to see if that is working?

(Sorry for misrepresenting samtools. I was actually trying to install
Bio-Graphics, which was asking for Bio::SCF).

Thanks,

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
Sent: Thursday, August 20, 2009 6:07 PM
To: scott at scottcain.net
Cc: bioperl-l at lists.open-bio.org; Bernd Jagla
Subject: Re: [Bioperl-l] SCF installation

It is all a bit confusing. On the download page for Staden, there is a
release 1.12, but the home page hasn't been updated and still reads 1.11. If
you download and install Staden 1.12, you'll get a library named
libstaden-read rather than libread; Bio::SCF hasn't been updated for the
name change, and so you will have to open up the Makefile.PL and change
"-lread" to "-lstaden-read" in order for it to compile.
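
For anyone hitting that rename, the edit can be scripted. What follows is
only a sketch: it assumes the flag appears literally as -lread somewhere in
Makefile.PL (the sample line below is invented, not copied from the real
file), and patch_libs is a name made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: swap the old Staden library flag for the new one.
# The lookahead leaves flags that merely start with -lread (such as
# -lreadline) untouched, and the substitution is safe to apply twice.
sub patch_libs {
    my ($text) = @_;
    $text =~ s/-lread(?![\w-])/-lstaden-read/g;
    return $text;
}

# Assumed example line; the real Makefile.PL may word its LIBS entry differently.
my $before = "LIBS => ['-lread -lz'],";
my $after  = patch_libs($before);

print "$after\n";    # prints: LIBS => ['-lstaden-read -lz'],
```

The same substitution can be applied in place with
perl -pi.bak -e 's/-lread(?![\w-])/-lstaden-read/g' Makefile.PL
which keeps the original as Makefile.PL.bak.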

This being said, your log indicates that Bio::SCF compiled and linked just
fine, but the test failed, so it may be more of a problem than just getting
the staden library installed.

Lincoln

On Thu, Aug 20, 2009 at 10:30 AM, Scott Cain wrote:

> Hi Bernd,
>
> Bio::SCF isn't technically part of BioPerl, but I have installed it before
> so I'll take a shot: do you have the Staden io-lib installed? It is a
> prereq for Bio::SCF. If you did install it, is it in a normal library path,
> and did you run ldconfig (if appropriate for your system) after installing
> it?
>
> io-lib can be obtained here:
>
> http://staden.sourceforge.net/
>
> If you do have all of those things in place, what version of io-lib are you
> using? I wonder if there is an incompatibility between Bio::SCF and your
> version. The INSTALL doc for Bio::SCF indicates that you should have
> version 0.9, but io-lib is now at 1.11.5. That jump to a whole number may
> have broken an api call that Bio::SCF depends on.
>
> Scott
>
>
> On Aug 20, 2009, at 4:46 AM, Bernd Jagla wrote:
>
>> Hi,
>>
>> I am trying to install SCF (a prerequisite to samtools).
>>
>> I installed libread and the compilation seems to be working, only test is
>> failing:
>>
>> zoppel:Bio-SCF-1.01 bernd$ perl Makefile.PL
>> Checking if your kit is complete...
>> Looks good
>> Writing Makefile for Bio::SCF
>>
>> zoppel:Bio-SCF-1.01 bernd$ make
>> cp SCF.pm blib/lib/Bio/SCF.pm
>> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
>> /opt/local/bin/perl /opt/local/lib/perl5/5.8.9/ExtUtils/xsubpp -typemap
>> /opt/local/lib/perl5/5.8.9/ExtUtils/typemap SCF.xs > SCF.xsc && mv
>> SCF.xsc SCF.c
>> Please specify prototyping behavior for SCF.xs (see perlxs manual)
>> /usr/bin/gcc-4.0 -c -fno-common -DPERL_DARWIN -I/opt/local/include
>> -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include
>> -I/opt/local/include -O3 -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\"
>> "-I/opt/local/lib/perl5/5.8.9/darwin-2level/CORE" -DLITTLE_ENDIAN SCF.c
>> Running Mkbootstrap for Bio::SCF ()
>> chmod 644 SCF.bs
>> rm -f blib/arch/auto/Bio/SCF/SCF.bundle
>> LD_RUN_PATH="/opt/local/lib" env MACOSX_DEPLOYMENT_TARGET=10.3
>> /usr/bin/gcc-4.0 -L/opt/local/lib -bundle -undefined dynamic_lookup
>> -L/usr/local/lib SCF.o -o blib/arch/auto/Bio/SCF/SCF.bundle \
>>     -lread -lz \
>>
>> chmod 755 blib/arch/auto/Bio/SCF/SCF.bundle
>> cp SCF.bs blib/arch/auto/Bio/SCF/SCF.bs
>> chmod 644 blib/arch/auto/Bio/SCF/SCF.bs
>> Manifying blib/man3/Bio::SCF.3pm
>>
>> zoppel:Bio-SCF-1.01 bernd$ make test
>> PERL_DL_NONLAZY=1 /opt/local/bin/perl "-MExtUtils::Command::MM" "-e"
>> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>> t/scf.t .. get_scf_pointer(...) : failed on read_scf(./test.scf)
>> t/scf.t .. Dubious, test returned 2 (wstat 512, 0x200)
>> Failed 18/18 subtests
>>
>> Test Summary Report
>> -------------------
>> t/scf.t (Wstat: 512 Tests: 0 Failed: 0)
>>   Non-zero exit status: 2
>>   Parse errors: Bad plan. You planned 18 tests but ran 0.
>> Files=1, Tests=0, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr
>> 0.01 csys = 0.11 CPU)
>> Result: FAIL
>> Failed 1/1 test programs. 0/0 subtests failed.
>> make: *** [test_dynamic] Error 2
>>
>> Any idea what might be going wrong?
>>
>> Please note that some files in the directory are empty:
>>
>> ls -ltr
>> -rw-r--r--  1 bernd staff 167468 23 sep  1999 test.scf
>> -rw-r--r--  1 bernd staff   1131 31 jan  2006 DISCLAIMER
>> -rw-r--r--  1 bernd staff    532 17 mai  2006 README
>> -rw-r--r--  1 bernd staff    525 17 mai  2006 INSTALL
>> -rw-r--r--  1 bernd staff    396 17 mai  2006 Makefile.PL
>> -rw-r--r--  1 bernd staff   9308 17 mai  2006 SCF.xs
>> -rw-r--r--  1 bernd staff  12438 17 mai  2006 SCF.pm
>> drwxr-xr-x  3 bernd staff    102 17 mai  2006 t
>> drwxr-xr-x  6 bernd staff    204 17 mai  2006 eg
>> drwxr-xr-x  3 bernd staff    102 17 mai  2006 SCF
>> -rw-r--r--  1 bernd staff    290 17 mai  2006 META.yml
>> -rw-r--r--  1 bernd staff    255 17 mai  2006 MANIFEST
>> drwxr-xr-x  4 bernd staff    136 20 ao 10:12 ..
>> -rw-r--r--  1 bernd staff  27915 20 ao 10:13 Makefile.old
>> -rw-r--r--  1 bernd staff  27915 20 ao 10:16 Makefile
>> -rw-r--r--  1 bernd staff      0 20 ao 10:17 pm_to_blib
>> drwxr-xr-x  8 bernd staff    272 20 ao 10:17 blib
>> -rw-r--r--  1 bernd staff      0 20 ao 10:17 SCF.bs
>> -rw-r--r--  1 bernd staff  14580 20 ao 10:18 SCF.o
>> -rw-r--r--  1 bernd staff  15125 20 ao 10:18 SCF.c
>> drwxr-xr-x 21 bernd staff    714 20 ao 10:18 .
>>
>> Thanks,
>>
>> Bernd
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> -----------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Fri Aug 21 13:05:25 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 21 Aug 2009 09:05:25 -0400
Subject: [Bioperl-l] SCF installation
In-Reply-To:
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina> <6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID:

Hi Bernd,

Just so you know, you don't need Bio::SCF for Bio::Graphics either,
unless you want to display ABI trace glyphs. It is a suggested
("recommends" in Module::Build parlance) module.

Scott

On Aug 21, 2009, at 5:30 AM, Bernd Jagla wrote:

> Hi,
>
> I have installed io_lib-1.9.0. This produces libread.a.
> [...]

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From maj at fortinbras.us Fri Aug 21 12:50:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 21 Aug 2009 08:50:08 -0400
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase
In-Reply-To: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com>
References: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com>
Message-ID: <71B4268E5B524F719D24088483568870@NewLife>

Hi Rohit-
Re: timeout, you could try
$factory->ua->timeout($number_greater_than_180_sec)
before issuing the request.
cheers MAJ

----- Original Message -----
From: "Rohit Ghai"
To:
Sent: Friday, August 21, 2009 7:34 AM
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase

> Hello all
>
> I would like to download the wgs sequences of the unfinished genomes from
> ncbi.
> [...]

From bernd.jagla at pasteur.fr Fri Aug 21 13:30:38 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Fri, 21 Aug 2009 15:30:38 +0200
Subject: [Bioperl-l] SCF installation
In-Reply-To:
References: <012EFB70792A4AC2A9ED710FEA272C67@zillumina><6dce9a0b0908200907j7c182326ma529f68458da6f1c@mail.gmail.com>
Message-ID: <0D219C72BC5F432BA5CDBBCFCE94AA02@zillumina>

Thanks,

I was confused by the error message of Bio::Graphics. Now I tried make, make
test and was able to install...

Thanks,

Let's forget about the rest then since I don't believe I will need that...

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Scott Cain
Sent: Friday, August 21, 2009 3:05 PM
To: Bernd Jagla
Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] SCF installation

Hi Bernd,

Just so you know, you don't need Bio::SCF for Bio::Graphics either,
unless you want to display ABI trace glyphs. It is a suggested
("recommends" in Module::Build parlance) module.

Scott

[...]

From ghai.rohit at gmail.com Fri Aug 21 13:40:02 2009
From: ghai.rohit at gmail.com (Rohit Ghai)
Date: Fri, 21 Aug 2009 15:40:02 +0200
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotidedatabase
In-Reply-To: <71B4268E5B524F719D24088483568870@NewLife>
References: <94c73820908210434q64471fbcmecafd8bafde03e6a@mail.gmail.com> <71B4268E5B524F719D24088483568870@NewLife>
Message-ID: <94c73820908210640h3b5854fbxe19c259c66cf9ee4@mail.gmail.com>

Thanks! I have made the change... no error yet.. so keeping my fingers
crossed

cheers
Rohit

On Fri, Aug 21, 2009 at 2:50 PM, Mark A. Jensen wrote:

> Hi Rohit-
> Re: timeout, you could try
> $factory->ua->timeout($number_greater_than_180_sec)
> before issuing the request.
> cheers MAJ
> [...]

From rmb32 at cornell.edu Fri Aug 21 19:39:31 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 21 Aug 2009 12:39:31 -0700
Subject: [Bioperl-l] added a perltidy profile file
Message-ID: <4A8EF7F3.0@cornell.edu>

This one is copied from the parrot project. I added it in
maintenance/perltidy.conf. Have a look, tweak as you see fit.

The idea with perltidy profile files is to use them to enforce coding
style rules. So this perltidy profile file would be the place to codify
the BioPerl coding standards, such as indentation, use of cuddled elses,
etc. So here is one, let's customize it for our needs.
The way I usually run perltidy is with -b to modify a file in-place, and with the '-pro=' option to specify a profile file. Example: perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Fri Aug 21 21:03:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 Aug 2009 16:03:07 -0500 Subject: [Bioperl-l] bioperl capability In-Reply-To: <25037707.post@talk.nabble.com> References: <470b4b060908141202v1406814cn832dfcd17488c5ee@mail.gmail.com> <921DE62B-9436-46DB-97DC-E10BF4380FD8@illinois.edu> <25037707.post@talk.nabble.com> Message-ID: On Aug 18, 2009, at 11:39 PM, deequan wrote: > > Howdy there, > > Yes, quite right. I apologize for the double posting. > Moreover, I > appreciate your assistance in trying to sort out what can and cannot > be done > with bioperl. 
To address the problem previously stated, I put > together a > remarkably misbehaving script that has the following parts: > > #Some parsing: > $q_start = $hsp->query->start; > $q_end = $hsp->query->end; > $h_start = $hsp->hit->start; > $h_end = $hsp->hit->end; > $length = $hsp->query->seqlength(); > $id = $hit->accession; > > print OUT "$id\t"; > my $seq; > if($h_start<$h_end){ > > #the bit per your recommendation > my $begin = $h_start-$q_start+1; > my $cease = ($length - $q_end) + $h_end; > my $strand = 1; > my $factory = Bio::DB::GenBank->new(-format=> 'genbank', > -seq_start =>$begin, > -seq_stop =>$cease, > -strand => $strand, #1 = plus, 2 = minus > ); > $seq = $factory->get_Seq_by_acc($id); > }else{#else assume backward, code not shown} > [ > #and some stuff to retrieve the sequence > > my $len = $seq->length(); > my $string = $seq->subseq(1, $len); > print OUT "length = $len\t"; > print OUT "seq = $string\n"; ] Not sure what you are doing with the above sequence. The abve > In your previous reply, you said the code accessing the seq object > created > by get_Seq_by_acc would have to pass that obj (here $seq) to a seqIO > for > basic IO purposes. # create an output seq stream somewhere my $out = Bio::SeqIO->new(-file => '>sequences.gb', -format => 'genbank'); .... # take seq object ($seq), write to the stream $out->write_seq($seq); > Not seeing exactly how to go about that, I tried some > other functions in combination that seemed as though they should work > (length() and subseq()). 
Unfortunately, the program does not even > run to > that point, as the script throws an exception: > > ------------- EXCEPTION ------------- > MSG: acc CP000948 does not exist > STACK Bio::DB::WebDBSeqI::get_Seq_by_acc > C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:182 > STACK toplevel test.pl:36 > ------------------------------------- > > > Oddly, the record corresponding to this accession number can be > found here: > http://www.ncbi.nlm.nih.gov/nuccore/169887498 That's probably something to do with NCBI unfortunately; I'll have to look into it. The best alternative is if you have BLAST reports that include the GI (or UID). That's the most reliable number (using that in coordination with get_Seq_by_id), but it's not on by default; you have to indicate its inclusion. More recent versions of Bio::SearchIO::blast parse out the GI from the descriptor if it's present. > Perhaps you'd be willing to offer another hint. Thank you for your > assistance thus far. And on behalf of all posters, thank you for > sharing > your knowledge. 'Preciate. > > David Q. No problem. chris From dan.bolser at gmail.com Fri Aug 21 21:55:37 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 21 Aug 2009 22:55:37 +0100 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: <4A8EF7F3.0@cornell.edu> References: <4A8EF7F3.0@cornell.edu> Message-ID: <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Cheers Rob, Whatever objections may arise from style x or style y, I think it's a great idea to at least have one style or another recognized as being 'standard'. I know TMTOWTDI, but on a project like this, with so many contributors and users, it's essential to at least have a recommendation. I'll try to use this on any contribs. As you pointed out [1], it's probably best to provide two patches for any change involving a formatting cleanup: one to change the format to the standard and one to commit the actual code changes. All the best, Dan.
[1] irc://irc.freenode.net/#bioperl 2009/8/21 Robert Buels : > This one is copied from the parrot project. I added it in > maintenance/perltidy.conf. > Have a look, tweak as you see fit. > > The idea with perltidy profile files is to use them to enforce coding style > rules. So this perltidy profile file would be the place to codify the > BioPerl coding standards, such as indentation, use of cuddled elses, etc. > > So here is one, let's customize it for our needs. The way I usually run > perltidy is with -b to modify a file in-place, and with the '-pro=' option > to specify a profile file. > > Example: > perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Sat Aug 22 03:12:55 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 21 Aug 2009 23:12:55 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <86486D3736614E6A81AF9521B5BB796A@NewLife> Thanks to all (six, seven including Rob and his perltidy) who responded to this thread. (Lurkers, you are not volunteering by responding, honest.) I'm preparing a wiki page (of course) with the major points, some further comments, and an action plan for your consideration. Watch this space. cheers, MAJ ----- Original Message ----- From: "Mark A.
Jensen" To: "BioPerl List" Cc: "Chris Fields" Sent: Friday, August 14, 2009 10:32 PM Subject: [Bioperl-l] on BP documentation > Hi All -- > > Off-list, an old colleague of mine had this insightful, if damning, > comment: > >>I guess that from my perspective, after doing this stuff for >>about 10 years, I personally would prefer to see a "summer of >>documentation" for the bio* languages (or at least bioperl, as that is >>the only one I ever look at). From my own experiences, and from those >>of many colleagues, the documentation for bioperl has gone from >>mediocre to quite poor in the last few years. I largely think the >>wikification of the docs are to blame for this. Even SeqIO is hard >>to figure out now--it took me an hour the other day to figure out that >>"desc" returns the full Fasta header, and I had to get that from the >>module code + trial-and-error, instead of the online docs. There is >>far too much inside baseball going on in the documentation scheme. > >>So I worry more about the constant adding of features at the expense >>of documenting what is already there. This is just my 2 cents, and it >>is disappointing to see a downward trend for bioperl in this regard. > > I would be really interested in all responses from the list users. I must > agree > that BP docs are rather a rat's nest and of varying quality, but taken in > toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount > of useful and sophisticated information available. I think there are > approaches we can take to reorganize and standardize the accession > of it to make it more useful and inviting. I disagree with my pal about the > wikification, but I wager that the power of the wiki could be leveraged > to greater advantage (right, Dan?). > > I think that what we all as developers love is to code, and detest is to > document. 
Since BP is all-volunteer, and volunteers tend to do what > they like -- the beauty of open source, btw -- documentation reorg > and cleanup probably must devolve to the Core. I am willing to lead > such an effort, which will take some time, and more time the fewer > volunteers there are. First let's hear some thoughts, and 'let it all hang > out', > as they said in my mom's era. > > cheers > Mark > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sat Aug 22 04:11:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 21 Aug 2009 23:11:42 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <86486D3736614E6A81AF9521B5BB796A@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <86486D3736614E6A81AF9521B5BB796A@NewLife> Message-ID: <594EBBA3-5043-4DDF-9157-65195747266D@illinois.edu> Mark, One suggestion that I agree with: we need to add API-specific module documentation to the site somehow (not just links to CPAN/PDOC). There are a few ways to do so; a quick way may be to install something like the MediaWiki SecureHTML extension and create a protected template (this would be for pdoc, cpan, or both). Another one is to write up a pod2wiki converter and create API-specific pages, then have a bot automate the pages. A POD extension also exists, but we would still need to embed code. I much prefer the extensions to anything else. chris On Aug 21, 2009, at 10:12 PM, Mark A. Jensen wrote: > Thanks to all (six, seven including Rob and his perltidy) who > responded to this thread. (Lurkers, you are not volunteering > by responding, honest.) I'm preparing a wiki page (of course) > with the major points, some further comments, and an action > plan for your consideration. Watch this space. > cheers, > MAJ > ----- Original Message ----- From: "Mark A.
Jensen" > > To: "BioPerl List" > Cc: "Chris Fields" > Sent: Friday, August 14, 2009 10:32 PM > Subject: [Bioperl-l] on BP documentation > > >> Hi All -- >> >> Off-list, an old colleague of mine had this insightful, if damning, >> comment: >> >>> I guess that from my perspective, after doing this stuff for >>> about 10 years, I personally would prefer to see a "summer of >>> documentation" for the bio* languages (or at least bioperl, as >>> that is >>> the only one I ever look at). From my own experiences, and from >>> those >>> of many colleagues, the documentation for bioperl has gone from >>> mediocre to quite poor in the last few years. I largely think the >>> wikification of the docs are to blame for this. Even SeqIO is hard >>> to figure out now--it took me an hour the other day to figure out >>> that >>> "desc" returns the full Fasta header, and I had to get that from the >>> module code + trial-and-error, instead of the online docs. There is >>> far too much inside baseball going on in the documentation scheme. >> >>> So I worry more about the constant adding of features at the expense >>> of documenting what is already there. This is just my 2 cents, >>> and it >>> is disappointing to see a downward trend for bioperl in this regard. >> >> I would be really interested in all responses from the list users. >> I must agree >> that BP docs are rather a rat's nest and of varying quality, but >> taken in >> toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount >> of useful and sophisticated information available. I think there are >> approaches we can take to reorganize and standardize the accession >> of it to make it more useful and inviting. I disagree with my pal >> about the >> wikification, but I wager that the power of the wiki could be >> leveraged >> to greater advantage (right, Dan?). >> >> I think that what we all as developers love is to code, and detest >> is to >> document. 
Since BP is all-volunteer, and volunteers tend to do what >> they like -- the beauty of open source, btw -- documentation reorg >> and cleanup probably must devolve to the Core. I am willing to lead >> such an effort, which will take some time, and more time the fewer >> volunteers there are. First let's hear some thoughts, and 'let it >> all hang out', >> as they said in my mom's era. >> >> cheers >> Mark >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.osimo at gmail.com Sat Aug 22 14:55:06 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Sat, 22 Aug 2009 16:55:06 +0200 Subject: [Bioperl-l] Getting genomic coordinates for a list of SNPs Message-ID: <2ac05d0f0908220755y59b029f2u82eede5b29836a1d@mail.gmail.com> Dear list, I'm searching for a script like this http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates to get the genomic position of a SNP, not a Gene. Does it exist? Thanks a lot Emanuele From cjfields at illinois.edu Sat Aug 22 20:17:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 22 Aug 2009 15:17:46 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> Message-ID: <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> Anand, You should always post emails to the bioperl-l mailing list, never to individual developers (you'll get an answer much faster). Keep responses on the list as well. Though I use bioperl-db some, I'm probably not the best person to ask. Does anyone know what's going on with this? Does this have to do with the Species/Taxon refactoring? chris Begin forwarded message: > From: "Anand C. 
Patel" > Date: August 22, 2009 2:57:42 PM CDT > To: cjfields at illinois.edu > Subject: problem with bioperl (where's the Mus?) > > Dr. Fields, > > I'm struggling with what seems to be a strange quirk in Bioperl +/- > Bioperl-db/BioSQL. > > I've successfully loaded in genbank sequences into a biosql database. > > When I try to write a genbank sequence back out, a curious thing > happens -- the Genus is missing from the SOURCE and ORGANISM areas. > > Despite reporting: > primary tag: source > tag: chromosome > value: 3 > > tag: db_xref > value: taxon:10090 > > tag: map > value: 3 74.5 cM > > tag: mol_type > value: mRNA > > tag: organism > value: Mus musculus > The sequence when printed out via SeqIO looks like this: > LOCUS NM_017474 2935 bp dna linear ROD > 13-AUG-2009 > DEFINITION Mus musculus chloride channel calcium activated 3 > (Clca3), mRNA. > ACCESSION NM_017474 XM_978159 > VERSION NM_017474.2 GI:255918210 > KEYWORDS . > SOURCE musculus > ORGANISM musculus > Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; > Bilateria; > Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; > Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; > Tetrapoda; > Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; > Glires; > Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. > Confession -- I have a final project due Monday wherein I boldly > elected to interface Bioperl, MySQL, Perl, and CGI. > (I'm an MD getting my MS in Bioinformatics.) > After many misadventures, I'm getting to the point where I could > actually complete the objectives, but this is bug is rather > problematic. > Thanks, > Anand > Anand C. Patel, MD > Assistant Professor of Pediatrics > Division of Allergy/Pulmonary Medicine > Department of Pediatrics > Washington University School of Medicine > 660 South Euclid Ave, Campus Box 8052 > St. 
Louis, MO 63110 > acpatel at wustl.edu > acpatel at gmail.com > acpatel at jhu.edu > From hlapp at gmx.net Sat Aug 22 21:36:42 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 17:36:42 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> Message-ID: <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> That's a pretty strange bug. Anand, which version of BioPerl and Bioperl-db are you running? Note that the genus *is* actually there in the lineage (and hence does get retrieved from the database). Apparently the Species object fails to pull it out correctly, though? Anand - I suspect there have been some warnings printed to the terminal - can you post these, and otherwise confirm that there haven't been any? -hilmar On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: > Anand, > > You should always post emails to the bioperl-l mailing list, never > to individual developers (you'll get an answer much faster). Keep > responses on the list as well. > > Though I use bioperl-db some, I'm probably not the best person to > ask. Does anyone know what's going on with this? Does this have to > do with the Species/Taxon refactoring? > > chris > > Begin forwarded message: > >> From: "Anand C. Patel" >> Date: August 22, 2009 2:57:42 PM CDT >> To: cjfields at illinois.edu >> Subject: problem with bioperl (where's the Mus?) >> >> Dr. Fields, >> >> I'm struggling with what seems to be a strange quirk in Bioperl +/- >> Bioperl-db/BioSQL. >> >> I've successfully loaded in genbank sequences into a biosql database. >> >> When I try to write a genbank sequence back out, a curious thing >> happens -- the Genus is missing from the SOURCE and ORGANISM areas. 
>> >> Despite reporting: >> primary tag: source >> tag: chromosome >> value: 3 >> >> tag: db_xref >> value: taxon:10090 >> >> tag: map >> value: 3 74.5 cM >> >> tag: mol_type >> value: mRNA >> >> tag: organism >> value: Mus musculus >> The sequence when printed out via SeqIO looks like this: >> LOCUS NM_017474 2935 bp dna linear ROD >> 13-AUG-2009 >> DEFINITION Mus musculus chloride channel calcium activated 3 >> (Clca3), mRNA. >> ACCESSION NM_017474 XM_978159 >> VERSION NM_017474.2 GI:255918210 >> KEYWORDS . >> SOURCE musculus >> ORGANISM musculus >> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >> Bilateria; >> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >> Tetrapoda; >> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >> Glires; >> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >> Confession -- I have a final project due Monday wherein I boldly >> elected to interface Bioperl, MySQL, Perl, and CGI. >> (I'm an MD getting my MS in Bioinformatics.) >> After many misadventures, I'm getting to the point where I could >> actually complete the objectives, but this is bug is rather >> problematic. >> Thanks, >> Anand >> Anand C. Patel, MD >> Assistant Professor of Pediatrics >> Division of Allergy/Pulmonary Medicine >> Department of Pediatrics >> Washington University School of Medicine >> 660 South Euclid Ave, Campus Box 8052 >> St. 
Louis, MO 63110 >> acpatel at wustl.edu >> acpatel at gmail.com >> acpatel at jhu.edu >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 22 21:42:32 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 17:42:32 -0400 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: Consistent coding style is in principle a good thing. It's also worth keeping in mind one of the old BioPerl principles - don't change working code purely to change style. In my interpretation of the rule, however, this has always applied to code writing style, and not code formatting style. I'm assuming the goal here is only to make the formatting consistent. -hilmar On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote: > Cheers Rob, > > Whatever objections may arise from style x or style y, I think it's a > great idea to at least have one style or another recognized as being > 'standard'. I know TMTOWTDI, but on a project like this, with so many > contributors and users, it's essential to at least have a > recommendation. I'll try to use this on any contribs. > > As you pointed out [1], it's probably best to provide two patches for > any change involving a formatting cleanup: one to change the format to > the standard and one to commit the actual code changes. > > > All the best, > Dan. > > [1] irc://irc.freenode.net/#bioperl > > > 2009/8/21 Robert Buels : >> This one is copied from the parrot project. I added it in >> maintenance/perltidy.conf. >> Have a look, tweak as you see fit.
>> >> The idea with perltidy profile files is to use them to enforce >> coding style >> rules. So this perltidy profile file would be the place to codify >> the >> BioPerl coding standards, such as indentation, use of cuddled >> elses, etc. >> >> So here is one, let's customize it for our needs. The way I >> usually run >> perltidy is with -b to modify a file in-place, and with the '-pro=' >> option >> to specify a profile file. >> >> Example: >> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >> >> Rob >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 22 23:21:48 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 22 Aug 2009 19:21:48 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > [...] > I think I know what's broken. Using load_seqdatabases.pl, I'd put a > set of sequences from genbank into a biosql db in mysql. 
> > I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl > script from biosql. Did you load the NCBI taxonomy first, or afterwards? > > When I searched for house (as in house mouse), I found that the name > of the type of taxon class was "genbank common name". > > When I searched for musculus, it does appear as a type of > "scientific name". It is the 'scientific name' class names that Bioperl-db will put onto the lineage array. > [...] > I'm not just getting warnings. I'm getting errors. Tons of them. > It's a wonder it's working at all. I'm not sure what you're referring to, but what you pasted into your email were neither errors nor warnings but a debugging log (and what it prints looks like it's working fine). You triggered that by setting -verbose to a value greater than 0. If you don't want debugging output, then you can just leave off that argument (no debugging output is the default). > > I started with the getentry.cgi script in the cgi-bin folder, and > stripped most of it away. I see - which reminds me that I need to look at that script; I'm afraid it hasn't been updated for a long time (that doesn't mean though that it can't work - the core API has been stable for years). > > Code: > #!/usr/bin/perl > > [...] > if( $@ || !defined $seq) { > print "Got fetch exception of...\n
$@\n
"; > exit(0); > } Wouldn't you want to put that right after the eval() clause? -hilmar > > >> >> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >> >>> Anand, >>> >>> You should always post emails to the bioperl-l mailing list, never >>> to individual developers (you'll get an answer much faster). Keep >>> responses on the list as well. >>> >>> Though I use bioperl-db some, I'm probably not the best person to >>> ask. Does anyone know what's going on with this? Does this have >>> to do with the Species/Taxon refactoring? >>> >>> chris >>> >>> Begin forwarded message: >>> >>>> From: "Anand C. Patel" >>>> Date: August 22, 2009 2:57:42 PM CDT >>>> To: cjfields at illinois.edu >>>> Subject: problem with bioperl (where's the Mus?) >>>> >>>> Dr. Fields, >>>> >>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>> +/- Bioperl-db/BioSQL. >>>> >>>> I've successfully loaded in genbank sequences into a biosql >>>> database. >>>> >>>> When I try to write a genbank sequence back out, a curious thing >>>> happens -- the Genus is missing from the SOURCE and ORGANISM areas. >>>> >>>> Despite reporting: >>>> primary tag: source >>>> tag: chromosome >>>> value: 3 >>>> >>>> tag: db_xref >>>> value: taxon:10090 >>>> >>>> tag: map >>>> value: 3 74.5 cM >>>> >>>> tag: mol_type >>>> value: mRNA >>>> >>>> tag: organism >>>> value: Mus musculus >>>> The sequence when printed out via SeqIO looks like this: >>>> LOCUS NM_017474 2935 bp dna linear >>>> ROD 13-AUG-2009 >>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>> (Clca3), mRNA. >>>> ACCESSION NM_017474 XM_978159 >>>> VERSION NM_017474.2 GI:255918210 >>>> KEYWORDS . 
>>>> SOURCE musculus >>>> ORGANISM musculus >>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>> Bilateria; >>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>> Tetrapoda; >>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>> Glires; >>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>> Confession -- I have a final project due Monday wherein I boldly >>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>> (I'm an MD getting my MS in Bioinformatics.) >>>> After many misadventures, I'm getting to the point where I could >>>> actually complete the objectives, but this is bug is rather >>>> problematic. >>>> Thanks, >>>> Anand >>>> Anand C. Patel, MD >>>> Assistant Professor of Pediatrics >>>> Division of Allergy/Pulmonary Medicine >>>> Department of Pediatrics >>>> Washington University School of Medicine >>>> 660 South Euclid Ave, Campus Box 8052 >>>> St. Louis, MO 63110 >>>> acpatel at wustl.edu >>>> acpatel at gmail.com >>>> acpatel at jhu.edu >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 23 14:38:48 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Aug 2009 10:38:48 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Message-ID: On Aug 22, 2009, at 9:13 PM, Anand C. Patel wrote: > Turns out that using the default namespace bioperl doesn't change > anything. No it shouldn't, so long as you are consistent about it. (And if you're not, all that should happen is that you don't find your sequences any more.) > > Common name -- still "genbank common name" in name_class in the > taxon_name table for "house mouse", which I think the module is > looking for as "common name". If you are loading the NCBI taxonomy first, this is coming from NCBI, not one of the scripts or BioPerl, and hence we have no control over it. Are you saying that there is no designated name of class 'common name' for Mus musculus in the NCBI taxonomy dump? Also, the common name being present or not should have no bearing on the lineage array, where the actual problem is, so I don't understand right now how this would be connected to the problem you are seeing. > > It's not behaving differently despite reloading the sequences. > > I've created a horrible munge that fixes it for cosmetic purposes: > my $species = $seq->species; > my $justspecies = $species->scientific_name(); > my $binspecies = $species->binomial(); > > my $gbstring2 = $gbstring; > > $gbstring2 =~ s/$binspecies/$justspecies/g; > $gbstring2 =~ s/$justspecies/$binspecies/g; I don't understand what you are trying to achieve here - it seems like you are making a substitution and then reverting it? Also, $species- >scientific_name() and $species->binomial() should be identical for Mus musculus - are you finding different values being returned? So in essence, I wouldn't expect your above code snippet to have any effect, for both of these reasons. 
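[Editorial sketch, not part of the original message.] The behaviour of the two substitutions in the quoted snippet can be checked without BioPerl by running them on plain strings. This is a minimal sketch under stated assumptions: the helper name munge is hypothetical, the record text is a shortened stand-in for the GenBank output shown in the thread, and \Q...\E is added so the names are matched literally. The second call uses a hypothetical case in which scientific_name() returned only 'musculus'; whether it actually does so in the poster's setup is not established here.

```perl
use strict;
use warnings;

# Apply the two substitutions from the quoted snippet to a copy of $text.
# \Q...\E is an addition here so that the names are matched literally.
sub munge {
    my ($text, $binomial, $scientific) = @_;
    $text =~ s/\Q$binomial\E/$scientific/g;   # binomial -> scientific name
    $text =~ s/\Q$scientific\E/$binomial/g;   # scientific name -> binomial
    return $text;
}

# Shortened stand-in for the record printed in the thread.
my $record = "DEFINITION  Mus musculus chloride channel\n"
           . "SOURCE      musculus\n";

# Case 1: both accessors return the same string, so nothing changes.
my $same = munge($record, 'Mus musculus', 'Mus musculus');

# Case 2 (hypothetical): scientific_name() returns only 'musculus'.
my $fixed = munge($record, 'Mus musculus', 'musculus');

print $same eq $record ? "identical names: no change\n"
                       : "identical names: changed\n";
print $fixed;
```

With identical names the pair of substitutions is a no-op, as argued above. In the hypothetical second case the net effect is that every bare "musculus" ends up as "Mus musculus", which would account for a cosmetic fix being observed.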
How do you find $gbstring2 to be different from $gbstring at the end of this block of code? -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 23 14:42:58 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 23 Aug 2009 10:42:58 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> Message-ID: <119BC08A-6D3A-4D03-B0D5-7619EDE682AE@gmx.net> On Aug 22, 2009, at 8:13 PM, Anand C. Patel wrote: > Do I need to load ontology before loading sequences? You don't. Especially if you load genbank sequences as they come. Loading ontologies that are used for sequence annotation is useful as it will get your features (or sequences) linked to fully populated (description, synonyms, relationships, etc) terms rather than skeleton term records created on the fly. However, in GenBank format ontology terms are part of the feature table, and require a post-processing (using, e.g., a SeqProcessor class) step to be identified and turned into Bio::Annotation::OntologyTerm objects. 
	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From jorismeys at gmail.com Sun Aug 23 15:08:47 2009
From: jorismeys at gmail.com (joris meys)
Date: Sun, 23 Aug 2009 17:08:47 +0200
Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree
Message-ID:

Hi,

I'm currently exploring the phylogenetic parts of Bio Perl, but I
can't seem to find a quick solution to following problem :
Say you have a tree obtained by a certain method. From this tree, you
want to have the evolutionary distances between species, defined as
the sum of the branch lengths between any 2 species. There is as far
as I know no function for doing that. But is there a possibility to
get a list of some sort of "shortest paths" from one species to
another, allowing to easily calculate that matrix?

From the phylip package, I get following data if I run the neighbor or
fitch program. From there I can easily get an algorithm to calculate
the distances I need. But I also need to do that for maximum
likelihood trees and the like. Is there a way to get this information
in Bio Perl?

From     to      dist
node1    sp1     xxxxx
node2    sp3     xxxxxx
node1    node2   xxxxx
node 1   sp2     xxxxx

Kind regards
Joris

From heikki.lehvaslaiho at gmail.com Mon Aug 24 05:59:22 2009
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Mon, 24 Aug 2009 08:59:22 +0300
Subject: [Bioperl-l] added a perltidy profile file
In-Reply-To:
References: <4A8EF7F3.0@cornell.edu>
 <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com>
Message-ID:

De facto coding style standard for BioPerl has been emacs using cperl
mode and bioperl.list file. As long as this configuration does not
change the conventions used, I see this as great way in helping to
format code from other editors.


   -Heikki

2009/8/23 Hilmar Lapp :
> Consistent coding style is in principle a good thing.
>
> It's also worth to keep in mind one of the old BioPerl principles - don't
> change working code purely to change style. In my interpretation of the
> rule, however, this has always applied to code writing style, and not code
> formatting style. I'm assuming the goal here is only to make the formatting
> consistent.
>
>         -hilmar
>
> On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote:
>
>> Cheers Rob,
>>
>> Whatever objectons may arise from style x or style y, I think it's a
>> great idea to at least have one style or another recognized as being
>> 'standard'. I know TMTOWTDI, but on a project like this, with so many
>> contributors and users, it's essential to at least have a
>> recommendation. I'll try to use this on any contribs.
>>
>> As you pointed out [1], its probably best to provide two patches for
>> any change involving a formating clean up: one to change the fomat to
>> the standard and one to commit the actual code changes.
>>
>>
>> All the best,
>> Dan.
>>
>> [1] irc://irc.freenode.net/#bioperl
>>
>>
>> 2009/8/21 Robert Buels :
>>>
>>> This one is copied from the parrot project. I added it in
>>> maintenance/perltidy.conf.
>>> Have a look, tweak as you see fit.
>>>
>>> The idea with perltidy profile files is to use them to enforce coding
>>> style
>>> rules. So this perltidy profile file would be the place to codify the
>>> BioPerl coding standards, such as indentation, use of cuddled elses, etc.
>>>
>>> So here is one, let's customize it for our needs. The way I usually run
>>> perltidy is with -b to modify a file in-place, and with the '-pro='
>>> option
>>> to specify a profile file.
>>>
>>> Example:
>>>  perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm
>>>
>>> Rob
>>>
>>> --
>>> Robert Buels
>>> Bioinformatics Analyst, Sol Genomics Network
>>> Boyce Thompson Institute for Plant Research
>>> Tower Rd
>>> Ithaca, NY 14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
   -Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849 office: +966 2 808 2429
Building #2, Office #4216
Computational Bioscience Research Centre (CBRC)
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

From geoeco at rambler.ru Mon Aug 24 09:20:13 2009
From: geoeco at rambler.ru (Anna Kostikova)
Date: Mon, 24 Aug 2009 13:20:13 +0400
Subject: [Bioperl-l] extracting ORGANISM line from genbank file
Message-ID: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>

Dear all,

I am trying to extract species taxonomy from ORGANISM line. In fact I only
need a first line under ORGANISM tag (e.i. genus + species). I though that
it would be possible to do with the SeqBuilder object by stating

$builder->add_wanted_slot('display_id','species');

the problem is, however, that I've got an empty file as a result.
What might be wrong with the script (see below)?
Thanks a lot in advance for any ideas,

-------------------------------------------

#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Bio::Seq::SeqBuilder;

my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
my $infile = shift or die $usage;
my $infileformat = 'Genbank' ;
my $outfile = shift or die $usage;
my $outfileformat = 'raw';
my $i = 0;

my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                             '-format' => $infileformat);

my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
                              '-format' => $outfileformat);

my $builder = $seq_in->sequence_builder();

$builder->want_none();
$builder->add_wanted_slot('display_id','species');

while(my $seq = $seq_in->next_seq()) {
    $seq_out->write_seq($seq);
}

exit;

----------------------------------------------------

Anna

From maj at fortinbras.us Mon Aug 24 11:30:27 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 24 Aug 2009 07:30:27 -0400
Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree
In-Reply-To:
References:
Message-ID:

Hi Joris,

AFAIK, there is only one path between any two nodes in a typical
phylogenetic tree, the one passing through the most recent common
ancestor of the nodes. The distance() method in
Bio::Tree::TreeFunctionsI will give you what I think you want:

use Bio::TreeIO;
use Bio::Tree::TreeFunctionsI;

$t = Bio::TreeIO->new(-file=>'t/data/urease.tre.nexus',
                      -format=>'nexus')->next_tree;
$n1 = $t->find_node('Anidulans');
$n2 = $t->find_node('Ncrassa');
$dist = $t->distance(-nodes => [$n1, $n2] );
print $dist;

Use the Bio::TreeIO package to read in the tree in your favorite
format; it will handle many.

cheers,
MAJ

----- Original Message -----
From: "joris meys"
To:
Sent: Sunday, August 23, 2009 11:08 AM
Subject: [Bioperl-l] Getting distance matrix from phylogenetic tree

> Hi,
>
> I'm currently exploring the phylogenetic parts of Bio Perl, but I
> can't seem to find a quick solution to following problem :
> Say you have a tree obtained by a certain method.
From this tree, you > want to have the evolutionary distances between species, defined as > the sum of the branch lengths between any 2 species. There is as far > as I know no function for doing that. But is there a possibility to > get a list of some sort of "shortest paths" from one species to > another, allowing to easily calculate that matrix? > >>From the phylip package, I get following data if I run the neighbor or > fitch program. From there I can easily get an algorithm to calculate > the distances I need. But I also need to do that for maximum > likelihood trees and the like. Is there a way to get this information > in Bio Perl? >>From to dist > node1 sp1 xxxxx > node2 sp3 xxxxxx > node1 node2 xxxxx > node 1 sp2 xxxxx > > Kind regards > Joris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From dan.bolser at gmail.com Mon Aug 24 12:26:13 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 13:26:13 +0100 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: <2c8757af0908240526j1cb0a455x53f7f3dccaceda86@mail.gmail.com> 2009/8/24 Heikki Lehvaslaiho : > De facto coding style standard for BioPerl has been emacs using cperl > mode and bioperl.list file. As long as this configuration does not > change the conventions used, I see this as great way in helping to > format code from other editors. 'bioperl.list' file? I guess you made a typo and you mean bioperl.lisp http://www.bioperl.org/wiki/Emacs_template > 2009/8/23 Hilmar Lapp : >> Consistent coding style is in principle a good thing. >> >> It's also worth to keep in mind one of the old BioPerl principles - don't >> change working code purely to change style. 
In my interpretation of the
>> rule, however, this has always applied to code writing style, and not code
>> formatting style. I'm assuming the goal here is only to make the formatting
>> consistent.

I have changed coding style in the past. IIRC this was in the
Quality.pm file. I made the changes because two different styles were
being used to do (roughly) the same thing at different points in the
script. The two styles were being used interchangeably (at random?).
As a noob, the use of two different styles was very confusing, because
I didn't know if the difference was significant or what the
significance of the difference might be.

I resolved the issue by writing a set of additional tests and then
slowly harmonizing the coding style while confirming that the tests
were still running OK.

In this case I think it was reasonable to try to have a consistent
style at least within the module. Or should I have left the style as
it was?

Cheers,
Dan.

From dan.bolser at gmail.com Mon Aug 24 12:50:46 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 24 Aug 2009 13:50:46 +0100
Subject: [Bioperl-l] Bio::SimpleAlign constructor?
In-Reply-To: <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife>
References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com>
 <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife>
Message-ID: <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com>

I just ran into the same problem described here. Here is my code to
demonstrate what I expected:

#!/usr/bin/perl -w

use strict;

use Bio::SimpleAlign;
use Bio::LocatableSeq;
use Bio::AlignIO;

my $CLUDGE = 0;

## REF   tacattaaagacccg
## SEQ1  taca.taaa......
## SEQ2  .....taaaga.ccg

my $aln = Bio::SimpleAlign->new();

$aln->gap_char('.');

my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' );
my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' );
my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' );

$aln->add_seq( $r );
$aln->add_seq( $s1 );
$aln->add_seq( $s2 );

if($CLUDGE){
  foreach(($r, $s1, $s2)){
    $_->seq( '.' x ($_->start - 1) . $_->seq )
  }
}

## Prepare an 'output stream' for the alignment:
my $aliWriter = Bio::AlignIO->
  new( -fh => \*STDOUT,
       -format => 'clustalw',
     );

warn "\nOUTPUT:\n";
$aliWriter->write_aln($aln);

I was calling the "fill in the gaps yourself" step a CLUDGE because I
had expected the alignment object to take care of this for me. Is
there any reason that it couldn't do this 'CLUDGE' automatically? It
seems strange that it insists on being passed locatable sequence
objects, but then largely ignore the given location.

Would it not be possible to have this happen when the sequences are
written out from the alignment? I think it should still be possible to
index the column number via the (gapless) sequence number... or did I
get confused? There are two levels of confusion here (on my part), 1)
the concepts behind the objects and 2) the implementation details.

Thanks for any hints on how to understand or potentially how to fix
these problems.

Cheers,
Dan.

2009/7/22 Mark A. Jensen :
> Hi Paolo,
> I think I see what you want to do, however, it doesn't quite work
> this way.
I'm supposing you want to specify something like > > s1/3-6 attc > s2/7-10 gaag > > and obtain output like > > s1 --attc---- > s2 ------gaag > > But (and this is why LocatableSeqs are "locatable"), the alignment described > by the former data is always going to be > > s1 attc > s2 gaag > > so that I can query the alignment *column* number 1 and obtain > the residue coordinates of the original sequences in that column: > > $loc = $aln->get_seq_by_pos(1)->location_from_column(1); # 3 > > or vice-versa > > $col = $aln->column_from_residue_number( 's1', 3); # 1 > > As far as I know, you have to fill in the gaps yourself; a good > exercise, since you already have all the information you need, in having set > up the start and end coordinates (which are really > the column coordinates in this model). > If this wasn't what you had in mind, I apologize. > cheers, Mark > > > ----- Original Message ----- From: "Paolo Pavan" > To: > Sent: Thursday, July 16, 2009 6:17 AM > Subject: [Bioperl-l] Bio::SimpleAlign constructor? > > >> Hi, >> I have a brief question: I would like to know if there is a method to >> obtain a valid formatted and flush Bio::SimpleAlign object (i.e. >> properly filled with gaps on the right and on the left side of each >> sequence) given a bounch of Bio::LocatableSeq objects in which I have >> specified the -start and -end properties. >> Can anyone help me? 
Thank you very much,
>>
>> Paolo
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From ghai.rohit at gmail.com Mon Aug 24 12:53:03 2009
From: ghai.rohit at gmail.com (Rohit Ghai)
Date: Mon, 24 Aug 2009 14:53:03 +0200
Subject: [Bioperl-l] extracting ORGANISM line from genbank file
In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
Message-ID: <94c73820908240553m72540519pd86bf78e29041462@mail.gmail.com>

hi

I think you forgot to add the "seq" in the builder.. thats why the file
is empty. Also, the species name, though being parsed, is nowhere in
the output.

Here's a version using fasta output that you can probably customize
further. This also takes the full name of the organism and adds to the
description line in the output.

use strict;
use Bio::SeqIO;
use Bio::Seq::SeqBuilder;

my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
my $infile = shift or die $usage;
my $infileformat = 'Genbank' ;
my $outfile = shift or die $usage;
my $outfileformat = 'fasta';
my $i = 0;

my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                             '-format' => $infileformat);

my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
                              '-format' => $outfileformat);

my $builder = $seq_in->sequence_builder();

$builder->want_none();
$builder->add_wanted_slot('display_id','species','seq','description');

while(my $seq = $seq_in->next_seq()) {
    my $desc = $seq->description();
    my $species_string = $seq->species()->binomial('FULL');
    $desc = $desc . " [$species_string]";
    $seq->description($desc);
    $seq_out->write_seq($seq);
}

exit;

On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova wrote:

>
> Dear all,
>
> I am trying to extract species taxonomy from ORGANISM line. In fact I only
> need a first line under ORGANISM tag (e.i. genus + species). I though that
> it would be possible to do with the SeqBuilder object by stating
>
> $builder->add_wanted_slot('display_id','species');
>
> the problem is, however, that I've got an empty file as a result.
> What might be wrong with the script (see below)?
> Thanks a lot in advance for any ideas,
>
> -------------------------------------------
>
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Seq::SeqBuilder;
>
> my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> my $infile = shift or die $usage;
> my $infileformat = 'Genbank' ;
> my $outfile = shift or die $usage;
> my $outfileformat = 'raw';
> my $i = 0;
>
> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
> '-format' => $infileformat);
>
> my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
> '-format' => $outfileformat);
>
> my $builder = $seq_in->sequence_builder();
>
> $builder->want_none();
> $builder->add_wanted_slot('display_id','species');
>
> while(my $seq = $seq_in->next_seq()) {
> $seq_out->write_seq($seq);
> }
>
> exit;
>
> ----------------------------------------------------
>
> Anna
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Mon Aug 24 12:55:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 24 Aug 2009 07:55:56 -0500
Subject: [Bioperl-l] extracting ORGANISM line from genbank file
In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
Message-ID: <6B4871D9-5DB0-4762-A613-3561B40CE099@illinois.edu>

Anna,

It's stored in the
Bio::Species object. I have to say, though, I think you're using a stick of dynamite for a scalpel here; if you only need ORGANISM parse it out directly (it's much faster). Or am I missing something? chris On Aug 24, 2009, at 4:20 AM, Anna Kostikova wrote: > Dear all, > > I am trying to extract species taxonomy from ORGANISM line. In fact > I only need a first line under ORGANISM tag (e.i. genus + species). > I though that it would be possible to do with the SeqBuilder object > by stating > > $builder->add_wanted_slot('display_id','species'); > > the problem is, however, that I've got an empty file as a result. > What might be wrong with the script (see below)? > Thanks a lot in advance for any ideas, > > ------------------------------------------- > > #!/usr/bin/perl > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'raw'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > $builder->add_wanted_slot('display_id','species'); > > while(my $seq = $seq_in->next_seq()) { > $seq_out->write_seq($seq); > } > > exit; > > ---------------------------------------------------- > > Anna > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 12:56:02 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 07:56:02 -0500 Subject: [Bioperl-l] added a perltidy profile file In-Reply-To: References: <4A8EF7F3.0@cornell.edu> <2c8757af0908211455m59f5a4a8x679cbe42d90d00ae@mail.gmail.com> Message-ID: 
<1E5347D2-A60F-49CB-8F3B-C5E06342417E@illinois.edu> Heikki, perltidy has become the most common way to standardize perl coding style (in a non-text-editor-dependent way). A number of projects have started using it as a means for checking and cleaning up modules prior to release. I think Perl Best Practices reinforced that. chris On Aug 24, 2009, at 12:59 AM, Heikki Lehvaslaiho wrote: > De facto coding style standard for BioPerl has been emacs using cperl > mode and bioperl.list file. As long as this configuration does not > change the conventions used, I see this as great way in helping to > format code from other editors. > > > -Heikki > > 2009/8/23 Hilmar Lapp : >> Consistent coding style is in principle a good thing. >> >> It's also worth to keep in mind one of the old BioPerl principles - >> don't >> change working code purely to change style. In my interpretation of >> the >> rule, however, this has always applied to code writing style, and >> not code >> formatting style. I'm assuming the goal here is only to make the >> formatting >> consistent. >> >> -hilmar >> >> On Aug 21, 2009, at 5:55 PM, Dan Bolser wrote: >> >>> Cheers Rob, >>> >>> Whatever objectons may arise from style x or style y, I think it's a >>> great idea to at least have one style or another recognized as being >>> 'standard'. I know TMTOWTDI, but on a project like this, with so >>> many >>> contributors and users, it's essential to at least have a >>> recommendation. I'll try to use this on any contribs. >>> >>> As you pointed out [1], its probably best to provide two patches for >>> any change involving a formating clean up: one to change the fomat >>> to >>> the standard and one to commit the actual code changes. >>> >>> >>> All the best, >>> Dan. >>> >>> [1] irc://irc.freenode.net/#bioperl >>> >>> >>> 2009/8/21 Robert Buels : >>>> >>>> This one is copied from the parrot project. I added it in >>>> maintenance/perltidy.conf. >>>> Have a look, tweak as you see fit. 
>>>> >>>> The idea with perltidy profile files is to use them to enforce >>>> coding >>>> style >>>> rules. So this perltidy profile file would be the place to >>>> codify the >>>> BioPerl coding standards, such as indentation, use of cuddled >>>> elses, etc. >>>> >>>> So here is one, let's customize it for our needs. The way I >>>> usually run >>>> perltidy is with -b to modify a file in-place, and with the '-pro=' >>>> option >>>> to specify a profile file. >>>> >>>> Example: >>>> perltidy -b -pro=maintenance/perltidy.conf Bio/SimpleAlign.pm >>>> >>>> Rob >>>> >>>> -- >>>> Robert Buels >>>> Bioinformatics Analyst, Sol Genomics Network >>>> Boyce Thompson Institute for Plant Research >>>> Tower Rd >>>> Ithaca, NY 14853 >>>> Tel: 503-889-8539 >>>> rmb32 at cornell.edu >>>> http://www.sgn.cornell.edu >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +966 545 595 849 office: +966 2 808 2429 > Building #2, Office #4216 > Computational Bioscience Research Centre (CBRC) > 4700 King Abdullah University of Science and Technology (KAUST) > Thuwal 23955-6900, Kingdom of Saudi Arabia > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > 
http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 13:36:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 08:36:32 -0500 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> Message-ID: Dan, all, Bio::SimpleAlign doesn't align anything for you. It makes no assumptions about the data being added, beyond possibly checking for the seqs to be flush prior to analyses. Here's the reason why: The object doesn't 'know' the seqs map across from one to the other as below: > ... > ## REF tacattaaagacccg > ## SEQ1 taca.taaa...... > ## SEQ2 .....taaaga.ccg > > my $aln = Bio::SimpleAlign->new(); > > $aln->gap_char('.'); > > my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); > my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, - > seq=>'taca.taaa' ); > my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, - > seq=>'taaaga.ccg' ); > > $aln->add_seq( $r ); > $aln->add_seq( $s1 ); > $aln->add_seq( $s2 ); Above, you are making the assumption that SimpleAlign 'knows' where to match the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does NOT indicate that (the LocatableSeq docs, and their usage, should indicate that). Think about HSP alignments in a BLAST report; the start/end/strand coordinates are where the sequence in the alignment maps to the original query or hit sequence. They don't indicate where the hit maps to the query (the alignment itself does that in a column-wise fashion). I'm not sure, maybe it needs to be more explicit in the documentation, but SimpleAlign does not align the sequences for you (and it shouldn't be expected to). There are much better (faster, more accurate) ways to do that. 
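
For readers who still want the flush layout from the example in this thread, the "fill in the gaps yourself" step can be sketched in plain Perl. This is only a sketch under stated assumptions: pad_fragment is a hypothetical helper, not part of Bio::SimpleAlign or Bio::LocatableSeq, and the column offsets are supplied by the caller, precisely because LocatableSeq::start() does not carry that information.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pad an already-gapped fragment so that it spans $width columns.
# $offset is the number of leading gap columns and must be supplied by
# the caller. Hypothetical helper: not part of any BioPerl module.
sub pad_fragment {
    my ( $frag, $offset, $width, $gap ) = @_;
    $gap = '.' unless defined $gap;
    my $right = $width - $offset - length($frag);
    die "fragment does not fit in $width columns\n" if $right < 0;
    return ( $gap x $offset ) . $frag . ( $gap x $right );
}

# Reference and fragments taken from the example in this thread; the
# offsets (0 and 5) are an assumption read off the intended layout.
my $ref  = 'tacattaaagacccg';
my %rows = (
    s1 => [ 'taca.taaa',  0 ],
    s2 => [ 'taaaga.ccg', 5 ],
);

printf "%-4s %s\n", 'r', $ref;
for my $id ( sort keys %rows ) {
    my ( $frag, $offset ) = @{ $rows{$id} };
    printf "%-4s %s\n", $id, pad_fragment( $frag, $offset, length($ref) );
}
```

The padded strings could then be handed to Bio::LocatableSeq objects in the usual way; whether the start/end coordinates still describe what you intend after padding is something to check against the LocatableSeq documentation.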
> if($CLUDGE){
>   foreach(($r, $s1, $s2)){
>     $_->seq( '.' x ($_->start - 1) . $_->seq )
>   }
> }
>
> ## Prepare an 'output stream' for the alignment:
> my $aliWriter = Bio::AlignIO->
>   new( -fh => \*STDOUT,
>        -format => 'clustalw',
>      );
>
> warn "\nOUTPUT:\n";
> $aliWriter->write_aln($aln);

...

> I was calling the "fill in the gaps yourself" step a CLUDGE because I
> had expected the alignment object to take care of this for me. Is
> there any reason that it couldn't do this 'CLUDGE' automatically? It
> seems strange that it insists on being passed locatable sequence
> objects, but then largely ignore the given location.
>
> Would it not be possible to have this happen when the sequences are
> written out from the alignment? I think it should still be possible to
> index the column number via the (gapless) sequence number... or did I
> get confused? There are two levels of confusion here (on my part), 1)
> the concepts behind the objects and 2) the implementation details.

Mentioned above (no assumptions on how locatableseqs map to one
another). WYSIWYG.

There is nothing precluding you from writing up code to do that,
though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities
for post-processing padding, or Bio::Tools::PurePerlAlign for a pure
perl alignment implementation (there are, believe it or not, pure perl
implementations of Smith-Waterman and Needleman-Wunsch).

> Thanks for any hints on how to understand or potentially how to fix
> these problems.
>
> Cheers,
> Dan.

Not that SimpleAlign and LocatableSeqs don't have their share of
problems. However, I don't think you can expect this behavior to
change with the refactors.

chris

From hlapp at gmx.net Mon Aug 24 13:44:43 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 24 Aug 2009 09:44:43 -0400
Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
In-Reply-To: <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net> Message-ID: On Aug 23, 2009, at 1:25 PM, Anand C. Patel wrote: > The other piece of potentially useful information is below -- output > from > SELECT * FROM `biosql`.`taxon_name` WHERE `taxon_id` = 138; > (taxon_id 138 maps to ncbi_taxon_id 10090) > > taxon_id name name_class > 138 LK3 transgenic mice includes > 138 Mus muscaris misnomer > 138 Mus musculus scientific name > 138 Mus sp. 129SV includes > 138 house mouse genbank common name > 138 mice C57BL/6xCBA/CaJ hybrid misspelling > 138 mouse common name > 138 nude mice includes > 138 transgenic mice includes > > The source from the genbank entry NM_017474 is: > SOURCE Mus musculus (house mouse) > > Which is why I think the issue is that the name_class is "genbank > common name" rather than common name. Note that apparently NCBI has decided that the common name is 'mouse', not 'house mouse'. Why what they report in the genbank record is different from what they decided to be the common name is beyond me. Note also that the common name in parentheses is optional. If it's missing the record is still in valid format. > What does strike me as odd though is that not even "mouse" shows up > -- common_name is empty. Indeed, that's odd. Can you file this as a bug report and assign to the bioperl-db queue? 
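
To make the distinction between the two name classes concrete, here is a minimal sketch that groups the taxon_name rows quoted above by name_class. It is plain Perl over a hard-coded copy of that table, not BioPerl or bioperl-db code, and it says nothing about how Bio::Species actually populates common_name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded copy of the taxon_name rows quoted above (taxon_id 138),
# stored as [ name, name_class ] pairs.
my @rows = (
    [ 'LK3 transgenic mice',         'includes' ],
    [ 'Mus muscaris',                'misnomer' ],
    [ 'Mus musculus',                'scientific name' ],
    [ 'Mus sp. 129SV',               'includes' ],
    [ 'house mouse',                 'genbank common name' ],
    [ 'mice C57BL/6xCBA/CaJ hybrid', 'misspelling' ],
    [ 'mouse',                       'common name' ],
    [ 'nude mice',                   'includes' ],
    [ 'transgenic mice',             'includes' ],
);

# Group the names by their class.
my %by_class;
push @{ $by_class{ $_->[1] } }, $_->[0] for @rows;

# Show the three classes relevant to this thread side by side.
for my $class ( 'scientific name', 'common name', 'genbank common name' ) {
    my $names = $by_class{$class} || [];
    printf "%-20s %s\n", $class, join( ', ', @$names );
}
```

In this data set 'common name' and 'genbank common name' are separate rows with different values, so code that looks up only one class will never see the other.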
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Mon Aug 24 13:50:17 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 09:50:17 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> Message-ID: <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> On Aug 23, 2009, at 1:17 PM, Anand C. Patel wrote: > [...] > Code snippet: > my $species = $seq->species; > print "common name = ",$species->common_name, "\n"; > print "scientific name = ",$species->scientific_name, "\n"; > print "species = ",$species->species, "\n"; > print "genus = ",$species->genus, "\n"; > print "sub_species = ",$species->sub_species, "\n"; > print "binomial = ",$species->binomial, "\n"; > print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; > > Output: > common name = > scientific name = musculus > species = musculus > genus = Mus > sub_species = > binomial = Mus musculus > ncbi_taxid = 10090 This points to a problem in Bio::Species::scientific_name(), given that binomial() is correct. Could you file this as a bug report? > The common name is missing, despite having loaded it from NCBI > taxonomy using the provided script. > It is ONLY present as this "genbank common name". > [...] > I could go through and replace all of the instances of "genbank > common name" with "common name" and see if this fixes it. I think we need to first discuss how we want to treat the 'common name' versus 'genbank common name' classes in BioPerl. 
So question for everyone: do we need to have both available (in which
case we need to add an accessor in Bio::Species), or only 'common
name', or should 'genbank common name' override 'common name' if both
are present and have different values.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From biopython at maubp.freeserve.co.uk Mon Aug 24 14:18:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Aug 2009 15:18:20 +0100
Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS
In-Reply-To:
References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com>
 <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com>
 <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com>
 <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu>
 <320fb6e00907270451i3d40b4ffq607360cfcb6f6282@mail.gmail.com>
Message-ID: <320fb6e00908240718q194afe78j4a05b31aeb33e313@mail.gmail.com>

On Mon, Jul 27, 2009 at 2:06 PM, Chris Fields wrote:
>
> I added this (and the others) to our ticket tracking this. Looks like
> solexa conversion either way is borked, which is very likely an issue
> with conversion.

Hi Chris,

I've been digging into the current SVN code for BioPerl's FASTQ
support - I realised you are doing the Solexa to PHRED mapping twice
when parsing "fastq-solexa" files. Using "qual" output (which shows
the PHRED scores in plain text) makes it very clear something is
wrong:

$ cat solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+slxa_0001_1_0001_01
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;

That is Solexa scores from 40 (h) down to -5 (;), which should map
onto PHRED scores from 40 down to 1 (according to our prior
discussions).
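
The mapping described here can be sketched in plain Perl. The sketch assumes the commonly used conversion Q_phred = 10 * log10(10^(Q_solexa/10) + 1) and the "fastq-solexa" ASCII offset of 64; the function names are made up for illustration and this is not the code in BioPerl's fastq.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert one Solexa score to a fractional PHRED score, assuming
# Q_phred = 10 * log10( 10**(Q_solexa/10) + 1 ).
sub solexa_to_phred {
    my ($q_solexa) = @_;
    return 10 * log( 10**( $q_solexa / 10 ) + 1 ) / log(10);
}

# Decode a "fastq-solexa" quality string (ASCII offset 64) into integer
# PHRED scores, applying the conversion exactly once per character.
# The rounding step is where detail is lost if the integers are later
# converted back to Solexa scores.
sub solexa_string_to_phred {
    my ($qual) = @_;
    return map { int( solexa_to_phred( ord($_) - 64 ) + 0.5 ) }
        split //, $qual;
}

# Quality line from the faked FASTQ record in this message.
my $qual = 'hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;';
print join( ' ', solexa_string_to_phred($qual) ), "\n";
```

Keeping the unrounded value returned by solexa_to_phred, rather than the rounded integer, is one way to avoid collapsing neighbouring low scores onto the same value.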
$ ./bioperl_solexa2qual.pl < solexa_faked.fastq >slxa_0001_1_0001_01 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 10 10 9 8 7 6 6 5 5 5 5 4 4 4 4 For reference, $ python biopython_solexa2qual.py < solexa_faked.fastq >slxa_0001_1_0001_01 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 10 9 8 7 6 5 5 4 4 3 3 2 2 1 1 I can "fix" this in fastq.pm by commenting out one of the log mappings, for example see the patch I've just uploaded to Bug 2857: http://bugzilla.open-bio.org/show_bug.cgi?id=2857 That brings me to another problem, consider the following (with the double conversion fixed): $ ./bioperl_solexa2solexa.pl < solexa_faked.fastq @slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN +slxa_0001_1_0001_01 hgfedcba`_^]\[ZYXWVUTSRQPONMLKJJHGFEDDBB@@>><< If you compare that to the original, you'll notice a loss of detail in the poor quality reads. e.g. Solexa scores 9 (I) and 10 (J) have both been mapped onto 10 (J). I believe this happens because BioPerl is converting the Solexa scores to PHRED scores on loading (which is fine - EMBOSS does this too), but you are also storing them as integers! In order to preserve these details, I think you'll have to hold the converted PHRED scores as floating point numbers (which I think is what EMBOSS does). This has the downside of taking more memory, and may also complicate file output (you may need to round things). Regards, Peter (@Biopython) From acpatel at gmail.com Sat Aug 22 22:44:20 2009 From: acpatel at gmail.com (Anand C. Patel) Date: Sat, 22 Aug 2009 17:44:20 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> Message-ID: <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> On Aug 22, 2009, at 4:36 PM, Hilmar Lapp wrote: > That's a pretty strange bug. Anand, which version of BioPerl and > Bioperl-db are you running? BioPerl is: https://launchpad.net/ubuntu/karmic/+source/bioperl/1.6.0-2ubuntu1 (1.6.0 loaded via apt-get into ubuntu karmic alpha 4) BioPerl-db is version 1.006 (1.6.0) loaded via CPAN. BioSQL is 1.0.1 I think I know what's broken. Using load_seqdatabases.pl, I'd put a set of sequences from genbank into a biosql db in mysql. I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl script from biosql. When I searched for house (as in house mouse), I found that the name of the type of taxon class was "genbank common name". When I searched for musculus, it does appear as a type of "scientific name". > Note that the genus *is* actually there in the lineage (and hence > does get retrieved from the database). Apparently the Species object > fails to pull it out correctly, though? > > Anand - I suspect there have been some warnings printed to the > terminal - can you post these, and otherwise confirm that there > haven't been any? > > -hilmar I'm not just getting warnings. I'm getting errors. Tons of them. It's a wonder it's working at all. I started with the getentry.cgi script in the cgi-bin folder, and stripped most of it away. 
Code:
#!/usr/bin/perl
use DBI;
use CGI::Carp qw( fatalsToBrowser );
use CGI qw/:standard/;
use Bio::DB::BioDB;
use Bio::Seq::RichSeq;
use Bio::SeqIO;
use IO::String;

my $q = new CGI;   # create new CGI object
print $q->header;  # create the HTTP header

my $value = "NM_017474";
my $host = "localhost";
my $dbname = "biosql";
my $driver = "mysql";
my $dbuser = "webuser";
my $dbpass = "wrjFfjjW9y243xvF";
my $biodbname = "genbank";
my $seq;

eval {
    my $db = Bio::DB::BioDB->new(-database => "biosql",
                                 -host     => $host,
                                 -dbname   => $dbname,
                                 -driver   => $driver,
                                 -user     => $dbuser,
                                 -pass     => $dbpass,
                                 -verbose  => 10,
                                 );
    my $seqadaptor = $db->get_object_adaptor('Bio::SeqI');
    $seq = Bio::Seq::RichSeq->new( -accession_number => $value,
                                   -namespace        => $biodbname );
    $seq = $seqadaptor->find_by_unique_key($seq);
};

my $seqfh = IO::String->new($gbstring);
my $ioseq = Bio::SeqIO->new(-fh => $seqfh, -format => 'genbank');
$ioseq->write_seq($seq);

if( $@ || !defined $seq) {
    print "Got fetch exception of...\n
$@\n
"; exit(0); } print "BioSQL display of ". $seq->display_id ."\n"; print "\n"; print "
\n
".$gbstring."\n
\n
\n"; Errors (some but not all): test1.cgi: attempting to load adaptor class for Bio::SeqI test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor test1.cgi: attempting to load adaptor class for BioNamespace test1.cgi: \tattempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor test1.cgi: preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ? test1.cgi: BioNamespaceAdaptor: binding UK column 1 to "genbank" (namespace) test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor test1.cgi: preparing UK select statement: SELECT bioentry.bioentry_id, bioentry.name, bioentry.identifier, bioentry.accession, bioentry.description, bioentry.version, bioentry.division, bioentry.biodatabase_id, bioentry.taxon_id FROM bioentry WHERE biodatabase_id = ? AND accession = ? 
test1.cgi: SeqAdaptor: binding UK column 1 to "1" (bionamespace) test1.cgi: SeqAdaptor: binding UK column 2 to "NM_017474" (accession_number) test1.cgi: attempting to load adaptor class for Bio::PrimarySeq test1.cgi: \tattempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor test1.cgi: preparing PK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE biodatabase_id = ? test1.cgi: BioNamespaceAdaptor: binding PK column to "1" test1.cgi: attempting to load adaptor class for Bio::Species test1.cgi: \tattempting to load module Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor test1.cgi: preparing PK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND taxon_name.name_class = 'scientific name' AND taxon.taxon_id = ? test1.cgi: SpeciesAdaptor: binding PK column to "138" test1.cgi: prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value >= node.left_value AND taxon.left_value <= node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value test1.cgi: preparing SELECT COMMON_NAME: SELECT taxon_name.name FROM taxon_name WHERE taxon_name.taxon_id = ? 
AND taxon_name.name_class = 'common_name' test1.cgi: attempting to load adaptor class for Bio::Tree::Tree test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeAdaptor test1.cgi: attempting to load adaptor class for Bio::Root::Root test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootAdaptor test1.cgi: attempting to load adaptor class for Bio::Root::RootI test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::RootAdaptor test1.cgi: attempting to load adaptor class for Bio::Tree::TreeI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeAdaptor test1.cgi: attempting to load adaptor class for Bio::Tree::TreeFunctionsI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor test1.cgi: no adaptor found for class Bio::Tree::Tree test1.cgi: attempting to load adaptor class for Bio::DB::Taxonomy::list test1.cgi: \tattempting to load module Bio::DB::BioSQL::listAdaptor test1.cgi: attempting to load adaptor class for Bio::DB::Taxonomy test1.cgi: \tattempting to load module Bio::DB::BioSQL::TaxonomyAdaptor test1.cgi: no adaptor found for class Bio::DB::Taxonomy::list test1.cgi: attempting to load adaptor class for Biosequence test1.cgi: \tattempting to load module Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BiosequenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BiosequenceAdaptor test1.cgi: preparing UK select statement: SELECT biosequence.bioentry_id, biosequence.version, biosequence.length, biosequence.alphabet, NULL, NULL, biosequence.bioentry_id FROM biosequence WHERE bioentry_id = ? 
test1.cgi: BiosequenceAdaptor: binding UK column 1 to "1" (primary_seq) test1.cgi: attempting to load adaptor class for Bio::AnnotationCollectionI test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor test1.cgi: attempting to load adaptor class for Bio::Annotation::TypeManager test1.cgi: \tattempting to load module Bio::DB::BioSQL::TypeManagerAdaptor test1.cgi: no adaptor found for class Bio::Annotation::TypeManager test1.cgi: attempting to load adaptor class for Bio::Annotation::Reference test1.cgi: \tattempting to load module Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::ReferenceAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.reference_id, t2.authors, t2.title, t2.location, t2.crc, bioentry_reference.start_pos, bioentry_reference.end_pos, bioentry_reference.rank, t2.dbxref_id FROM bioentry t1, reference t2, bioentry_reference WHERE t1.bioentry_id = bioentry_reference.bioentry_id AND t2.reference_id = bioentry_reference.reference_id AND t1.bioentry_id = ? 
test1.cgi: ReferenceAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: attempting to load adaptor class for Bio::Annotation::DBLink test1.cgi: \tattempting to load module Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::DBLinkAdaptor test1.cgi: preparing PK select statement: SELECT dbxref.dbxref_id, dbxref.dbname, dbxref.accession, dbxref.version, NULL FROM dbxref WHERE dbxref_id = ? test1.cgi: DBLinkAdaptor: binding PK column to "1" test1.cgi: DBLinkAdaptor: binding PK column to "2" test1.cgi: DBLinkAdaptor: binding PK column to "3" test1.cgi: DBLinkAdaptor: binding PK column to "4" test1.cgi: DBLinkAdaptor: binding PK column to "5" test1.cgi: DBLinkAdaptor: binding PK column to "6" test1.cgi: DBLinkAdaptor: binding PK column to "7" test1.cgi: DBLinkAdaptor: binding PK column to "8" test1.cgi: DBLinkAdaptor: binding PK column to "9" test1.cgi: DBLinkAdaptor: binding PK column to "10" test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, bioentry_dbxref.rank FROM bioentry t1, dbxref t2, bioentry_dbxref WHERE t1.bioentry_id = bioentry_dbxref.bioentry_id AND t2.dbxref_id = bioentry_dbxref.dbxref_id AND t1.bioentry_id = ? 
test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: attempting to load adaptor class for Bio::Annotation::SimpleValue test1.cgi: \tattempting to load module Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: attempting to load adaptor class for Bio::Ontology::Ontology test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::OntologyAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::OntologyAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::OntologyAdaptor test1.cgi: preparing UK select statement: SELECT ontology.ontology_id, ontology.name, ontology.definition FROM ontology WHERE name = ? test1.cgi: OntologyAdaptor: binding UK column 1 to "Annotation Tags" (name) test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::TermAdaptorDriver as driver peer for Bio::DB::BioSQL::SimpleValueAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.name, bioentry_qualifier_value.value, bioentry_qualifier_value.rank, t2.ontology_id FROM bioentry t1, term t2, bioentry_qualifier_value WHERE t1.bioentry_id = bioentry_qualifier_value.bioentry_id AND t2.term_id = bioentry_qualifier_value.term_id AND (t1.bioentry_id = ? AND t2.ontology_id = ?) 
test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: attempting to load adaptor class for Bio::Annotation::OntologyTerm test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyTermAdaptor test1.cgi: attempting to load adaptor class for Bio::AnnotationI test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::AnnotationAdaptor test1.cgi: attempting to load adaptor class for Bio::Ontology::TermI test1.cgi: \tattempting to load module Bio::DB::BioSQL::TermIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::TermAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::TermAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::TermAdaptorDriver as driver peer for Bio::DB::BioSQL::TermAdaptor test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.identifier, t2.name, t2.definition, t2.is_obsolete, bioentry_qualifier_value.rank, t2.ontology_id FROM bioentry t1, term t2, bioentry_qualifier_value WHERE t1.bioentry_id = bioentry_qualifier_value.bioentry_id AND t2.term_id = bioentry_qualifier_value.term_id AND (t1.bioentry_id = ? AND t2.ontology_id != ?) 
test1.cgi: TermAdaptor: binding ASSOC column 1 to "1" (FK to Bio::Seq::RichSeq) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: attempting to load adaptor class for Bio::Annotation::Comment test1.cgi: \tattempting to load module Bio::DB::BioSQL::CommentAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::CommentAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::CommentAdaptor test1.cgi: preparing query: SELECT t1.comment_id, t1.comment_text, t1.rank, t1.bioentry_id FROM comment t1 WHERE t1.bioentry_id = ? test1.cgi: Query FIND Bio::Annotation::Comment BY Bio::Seq::RichSeq: binding column 1 to "1" test1.cgi: attempting to load adaptor class for Bio::SeqFeatureI test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqFeatureAdaptor test1.cgi: preparing query: SELECT t1.seqfeature_id, t1.display_name, t1.rank, t1.bioentry_id, t1.type_term_id, t1.source_term_id FROM seqfeature t1 WHERE t1.bioentry_id = ? ORDER BY t1.rank test1.cgi: Query FIND FEATURE BY SEQ: binding column 1 to "1" test1.cgi: preparing PK select statement: SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, NULL, term.ontology_id FROM term WHERE term_id = ? 
test1.cgi: TermAdaptor: binding PK column to "245" test1.cgi: attempting to load adaptor class for Bio::Ontology::OntologyI test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::OntologyAdaptor test1.cgi: preparing PK select statement: SELECT ontology.ontology_id, ontology.name, ontology.definition FROM ontology WHERE ontology_id = ? test1.cgi: OntologyAdaptor: binding PK column to "32" test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, term_dbxref.rank FROM term t1, dbxref t2, term_dbxref WHERE t1.term_id = term_dbxref.term_id AND t2.dbxref_id = term_dbxref.dbxref_id AND t1.term_id = ? test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "245" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: preparing: SELECT synonym FROM term_synonym WHERE term_id = ? test1.cgi: SELECT SYNONYMS: executing with values (245) (FK to Bio::Ontology::Term) test1.cgi: TermAdaptor: binding PK column to "246" test1.cgi: OntologyAdaptor: binding PK column to "33" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "246" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (246) (FK to Bio::Ontology::Term) test1.cgi: attempting to load adaptor class for Bio::LocationI test1.cgi: \tattempting to load module Bio::DB::BioSQL::LocationIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::LocationAdaptor test1.cgi: instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::LocationAdaptor test1.cgi: attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor test1.cgi: Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::LocationAdaptor test1.cgi: preparing query: SELECT t1.location_id, t1.start_pos, t1.end_pos, t1.strand, t1.rank, t1.seqfeature_id, t1.dbxref_id FROM location t1 WHERE 
t1.seqfeature_id = ? test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "1" test1.cgi: attempting to load adaptor class for Bio::DB::Persistent::PersistentObjectFactory test1.cgi: \tattempting to load module Bio::DB::BioSQL::PersistentObjectFactoryAdaptor test1.cgi: attempting to load adaptor class for Bio::Factory::ObjectFactoryI test1.cgi: \tattempting to load module Bio::DB::BioSQL::ObjectFactoryIAdaptor test1.cgi: \tattempting to load module Bio::DB::BioSQL::ObjectFactoryAdaptor test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: preparing SELECT ASSOC query: SELECT t2.dbxref_id, t2.dbname, t2.accession, t2.version, seqfeature_dbxref.rank FROM seqfeature t1, dbxref t2, seqfeature_dbxref WHERE t1.seqfeature_id = seqfeature_dbxref.seqfeature_id AND t2.dbxref_id = seqfeature_dbxref.dbxref_id AND t1.seqfeature_id = ? test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic) test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.name, seqfeature_qualifier_value.value, seqfeature_qualifier_value.rank, t2.ontology_id FROM seqfeature t1, term t2, seqfeature_qualifier_value WHERE t1.seqfeature_id = seqfeature_qualifier_value.seqfeature_id AND t2.term_id = seqfeature_qualifier_value.term_id AND (t1.seqfeature_id = ? AND t2.ontology_id = ?) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: preparing SELECT ASSOC query: SELECT t2.term_id, t2.identifier, t2.name, t2.definition, t2.is_obsolete, seqfeature_qualifier_value.rank, t2.ontology_id FROM seqfeature t1, term t2, seqfeature_qualifier_value WHERE t1.seqfeature_id = seqfeature_qualifier_value.seqfeature_id AND t2.term_id = seqfeature_qualifier_value.term_id AND (t1.seqfeature_id = ? AND t2.ontology_id != ?) 
test1.cgi: TermAdaptor: binding ASSOC column 1 to "1" (FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: preparing query: SELECT t1.comment_id, t1.comment_text, t1.rank, t1.bioentry_id FROM comment t1 WHERE 1 = 1 test1.cgi: Query FIND Bio::Annotation::Comment BY Bio::SeqFeature::Generic: binding column 1 to "1" test1.cgi: TermAdaptor: binding PK column to "260" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "260" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (260) (FK to Bio::Ontology::Term) test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "2" test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: TermAdaptor: binding ASSOC column 1 to "2" (FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: TermAdaptor: binding PK column to "250" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "250" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (250) (FK to Bio::Ontology::Term) test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "3" test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "3" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "3" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: TermAdaptor: binding ASSOC column 1 to "3" 
(FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: TermAdaptor: binding PK column to "264" test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "264" (FK to Bio::Ontology::Term) test1.cgi: SELECT SYNONYMS: executing with values (264) (FK to Bio::Ontology::Term) test1.cgi: Query FIND LOCATION BY FEATURE: binding column 1 to "4" test1.cgi: no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory test1.cgi: DBLinkAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic) test1.cgi: SimpleValueAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::SimpleValue::ontology) test1.cgi: TermAdaptor: binding ASSOC column 1 to "4" (FK to Bio::SeqFeature::Generic) test1.cgi: TermAdaptor: binding ASSOC column 2 to "31" (constraint Bio::Annotation::OntologyTerm::ontology) test1.cgi: preparing SELECT statement: SELECT seq FROM biosequence WHERE bioentry_id = ? > > On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: > >> Anand, >> >> You should always post emails to the bioperl-l mailing list, never >> to individual developers (you'll get an answer much faster). Keep >> responses on the list as well. >> >> Though I use bioperl-db some, I'm probably not the best person to >> ask. Does anyone know what's going on with this? Does this have >> to do with the Species/Taxon refactoring? >> >> chris >> >> Begin forwarded message: >> >>> From: "Anand C. Patel" >>> Date: August 22, 2009 2:57:42 PM CDT >>> To: cjfields at illinois.edu >>> Subject: problem with bioperl (where's the Mus?) >>> >>> Dr. Fields, >>> >>> I'm struggling with what seems to be a strange quirk in Bioperl >>> +/- Bioperl-db/BioSQL. >>> >>> I've successfully loaded in genbank sequences into a biosql >>> database. 
>>> >>> When I try to write a genbank sequence back out, a curious thing >>> happens -- the Genus is missing from the SOURCE and ORGANISM areas. >>> >>> Despite reporting: >>> primary tag: source >>> tag: chromosome >>> value: 3 >>> >>> tag: db_xref >>> value: taxon:10090 >>> >>> tag: map >>> value: 3 74.5 cM >>> >>> tag: mol_type >>> value: mRNA >>> >>> tag: organism >>> value: Mus musculus >>> The sequence when printed out via SeqIO looks like this: >>> LOCUS NM_017474 2935 bp dna linear >>> ROD 13-AUG-2009 >>> DEFINITION Mus musculus chloride channel calcium activated 3 >>> (Clca3), mRNA. >>> ACCESSION NM_017474 XM_978159 >>> VERSION NM_017474.2 GI:255918210 >>> KEYWORDS . >>> SOURCE musculus >>> ORGANISM musculus >>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>> Bilateria; >>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>> Tetrapoda; >>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>> Glires; >>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>> Confession -- I have a final project due Monday wherein I boldly >>> elected to interface Bioperl, MySQL, Perl, and CGI. >>> (I'm an MD getting my MS in Bioinformatics.) >>> After many misadventures, I'm getting to the point where I could >>> actually complete the objectives, but this is bug is rather >>> problematic. >>> Thanks, >>> Anand >>> Anand C. Patel, MD >>> Assistant Professor of Pediatrics >>> Division of Allergy/Pulmonary Medicine >>> Department of Pediatrics >>> Washington University School of Medicine >>> 660 South Euclid Ave, Campus Box 8052 >>> St. 
Louis, MO 63110 >>> acpatel at wustl.edu >>> acpatel at gmail.com >>> acpatel at jhu.edu >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From acpatel at gmail.com Sun Aug 23 00:04:35 2009 From: acpatel at gmail.com (Anand C. Patel) Date: Sat, 22 Aug 2009 19:04:35 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote: > > On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > >> [...] >> I think I know what's broken. Using load_seqdatabases.pl, I'd put >> a set of sequences from genbank into a biosql db in mysql. >> >> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl >> script from biosql. > > Did you load the NCBI taxonomy first, or afterwards? First -- before the sequences. In fact, I'm in the midst of reloading the taxonomy into a clean new database. I used namespace "genbank" instead of namespace "bioperl". Could that be the problem? >> >> When I searched for house (as in house mouse), I found that the >> name of the type of taxon class was "genbank common name". >> >> When I searched for musculus, it does appear as a type of >> "scientific name". > > It is the 'scientific name' class names that Bioperl-db will onto > the lineage array. > >> [...] >> I'm not just getting warnings. I'm getting errors. Tons of them. >> It's a wonder it's working at all. 
> > I'm not sure what you're referring to, but what you pasted into your > email were neither errors nor warnings but a debugging log (and what > it prints looks like it's working fine). You triggered that by > setting -verbose to a value greater than 0. If you don't want > debugging output, then you can just leave off that argument (no > debugging output is the default). I did not know that! They were flagged "error", so I thought those might be the problem. >> >> I started with the getentry.cgi script in the cgi-bin folder, and >> stripped most of it away. > > I see - which reminds me that I need to look at that script; I'm > afraid it hasn't been updated for a long time (that doesn't mean > though that it can't work - the core API has been stable for years). > It works -- I just think I confused the system by not sticking with the default namespace? Thanks, Anand >> >> Code: >> #!/usr/bin/perl >> >> [...] >> if( $@ || !defined $seq) { >> print "Got fetch exception of...\n
$@\n
"; >> exit(0); >> } > > Wouldn't you want to put that right after the eval() clause? > > -hilmar > >> >> >>> >>> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >>> >>>> Anand, >>>> >>>> You should always post emails to the bioperl-l mailing list, >>>> never to individual developers (you'll get an answer much >>>> faster). Keep responses on the list as well. >>>> >>>> Though I use bioperl-db some, I'm probably not the best person to >>>> ask. Does anyone know what's going on with this? Does this have >>>> to do with the Species/Taxon refactoring? >>>> >>>> chris >>>> >>>> Begin forwarded message: >>>> >>>>> From: "Anand C. Patel" >>>>> Date: August 22, 2009 2:57:42 PM CDT >>>>> To: cjfields at illinois.edu >>>>> Subject: problem with bioperl (where's the Mus?) >>>>> >>>>> Dr. Fields, >>>>> >>>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>>> +/- Bioperl-db/BioSQL. >>>>> >>>>> I've successfully loaded in genbank sequences into a biosql >>>>> database. >>>>> >>>>> When I try to write a genbank sequence back out, a curious thing >>>>> happens -- the Genus is missing from the SOURCE and ORGANISM >>>>> areas. >>>>> >>>>> Despite reporting: >>>>> primary tag: source >>>>> tag: chromosome >>>>> value: 3 >>>>> >>>>> tag: db_xref >>>>> value: taxon:10090 >>>>> >>>>> tag: map >>>>> value: 3 74.5 cM >>>>> >>>>> tag: mol_type >>>>> value: mRNA >>>>> >>>>> tag: organism >>>>> value: Mus musculus >>>>> The sequence when printed out via SeqIO looks like this: >>>>> LOCUS NM_017474 2935 bp dna linear >>>>> ROD 13-AUG-2009 >>>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>>> (Clca3), mRNA. >>>>> ACCESSION NM_017474 XM_978159 >>>>> VERSION NM_017474.2 GI:255918210 >>>>> KEYWORDS . 
>>>>> SOURCE musculus >>>>> ORGANISM musculus >>>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>>> Bilateria; >>>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>>> Tetrapoda; >>>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>>> Glires; >>>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>>> Confession -- I have a final project due Monday wherein I boldly >>>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>>> (I'm an MD getting my MS in Bioinformatics.) >>>>> After many misadventures, I'm getting to the point where I could >>>>> actually complete the objectives, but this is bug is rather >>>>> problematic. >>>>> Thanks, >>>>> Anand >>>>> Anand C. Patel, MD >>>>> Assistant Professor of Pediatrics >>>>> Division of Allergy/Pulmonary Medicine >>>>> Department of Pediatrics >>>>> Washington University School of Medicine >>>>> 660 South Euclid Ave, Campus Box 8052 >>>>> St. Louis, MO 63110 >>>>> acpatel at wustl.edu >>>>> acpatel at gmail.com >>>>> acpatel at jhu.edu >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From acpatel at gmail.com Sun Aug 23 00:13:37 2009 From: acpatel at gmail.com (Anand C. Patel) Date: Sat, 22 Aug 2009 19:13:37 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: <2651C1FC-5BE3-4FDF-9325-6AB3BDB55738@gmail.com> Do I need to load ontology before loading sequences? (I promise I've been reading the documentation for days, and could not find a yea or nay on this) Thanks, Anand On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote: > > On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > >> [...] >> I think I know what's broken. Using load_seqdatabases.pl, I'd put >> a set of sequences from genbank into a biosql db in mysql. >> >> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl >> script from biosql. > > Did you load the NCBI taxonomy first, or afterwards? > >> >> When I searched for house (as in house mouse), I found that the >> name of the type of taxon class was "genbank common name". >> >> When I searched for musculus, it does appear as a type of >> "scientific name". > > It is the 'scientific name' class names that Bioperl-db will onto > the lineage array. > >> [...] >> I'm not just getting warnings. I'm getting errors. Tons of them. >> It's a wonder it's working at all. > > I'm not sure what you're referring to, but what you pasted into your > email were neither errors nor warnings but a debugging log (and what > it prints looks like it's working fine). You triggered that by > setting -verbose to a value greater than 0. If you don't want > debugging output, then you can just leave off that argument (no > debugging output is the default). > >> >> I started with the getentry.cgi script in the cgi-bin folder, and >> stripped most of it away. > > I see - which reminds me that I need to look at that script; I'm > afraid it hasn't been updated for a long time (that doesn't mean > though that it can't work - the core API has been stable for years). 
> >> >> Code: >> #!/usr/bin/perl >> >> [...] >> if( $@ || !defined $seq) { >> print "Got fetch exception of...\n
$@\n
"; >> exit(0); >> } > > Wouldn't you want to put that right after the eval() clause? > > -hilmar > >> >> >>> >>> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >>> >>>> Anand, >>>> >>>> You should always post emails to the bioperl-l mailing list, >>>> never to individual developers (you'll get an answer much >>>> faster). Keep responses on the list as well. >>>> >>>> Though I use bioperl-db some, I'm probably not the best person to >>>> ask. Does anyone know what's going on with this? Does this have >>>> to do with the Species/Taxon refactoring? >>>> >>>> chris >>>> >>>> Begin forwarded message: >>>> >>>>> From: "Anand C. Patel" >>>>> Date: August 22, 2009 2:57:42 PM CDT >>>>> To: cjfields at illinois.edu >>>>> Subject: problem with bioperl (where's the Mus?) >>>>> >>>>> Dr. Fields, >>>>> >>>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>>> +/- Bioperl-db/BioSQL. >>>>> >>>>> I've successfully loaded in genbank sequences into a biosql >>>>> database. >>>>> >>>>> When I try to write a genbank sequence back out, a curious thing >>>>> happens -- the Genus is missing from the SOURCE and ORGANISM >>>>> areas. >>>>> >>>>> Despite reporting: >>>>> primary tag: source >>>>> tag: chromosome >>>>> value: 3 >>>>> >>>>> tag: db_xref >>>>> value: taxon:10090 >>>>> >>>>> tag: map >>>>> value: 3 74.5 cM >>>>> >>>>> tag: mol_type >>>>> value: mRNA >>>>> >>>>> tag: organism >>>>> value: Mus musculus >>>>> The sequence when printed out via SeqIO looks like this: >>>>> LOCUS NM_017474 2935 bp dna linear >>>>> ROD 13-AUG-2009 >>>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>>> (Clca3), mRNA. >>>>> ACCESSION NM_017474 XM_978159 >>>>> VERSION NM_017474.2 GI:255918210 >>>>> KEYWORDS . 
>>>>> SOURCE musculus >>>>> ORGANISM musculus >>>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>>> Bilateria; >>>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>>> Tetrapoda; >>>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>>> Glires; >>>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>>> Confession -- I have a final project due Monday wherein I boldly >>>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>>> (I'm an MD getting my MS in Bioinformatics.) >>>>> After many misadventures, I'm getting to the point where I could >>>>> actually complete the objectives, but this is bug is rather >>>>> problematic. >>>>> Thanks, >>>>> Anand >>>>> Anand C. Patel, MD >>>>> Assistant Professor of Pediatrics >>>>> Division of Allergy/Pulmonary Medicine >>>>> Department of Pediatrics >>>>> Washington University School of Medicine >>>>> 660 South Euclid Ave, Campus Box 8052 >>>>> St. Louis, MO 63110 >>>>> acpatel at wustl.edu >>>>> acpatel at gmail.com >>>>> acpatel at jhu.edu >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From acpatel at usa.net Sun Aug 23 01:13:14 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sat, 22 Aug 2009 20:13:14 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> Message-ID: <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Turns out that using the default namespace bioperl doesn't change anything. Common name -- still "genbank common name" in name_class in the taxon_name table for "house mouse", which I think the module is looking for as "common name". It's not behaving differently despite reloading the sequences. I've created a horrible munge that fixes it for cosmetic purposes: my $species = $seq->species; my $justspecies = $species->scientific_name(); my $binspecies = $species->binomial(); my $gbstring2 = $gbstring; $gbstring2 =~ s/$binspecies/$justspecies/g; $gbstring2 =~ s/$justspecies/$binspecies/g; But this does not strike me as a long term solution. Thanks, Anand On Aug 22, 2009, at 6:21 PM, Hilmar Lapp wrote: > > On Aug 22, 2009, at 6:44 PM, Anand C. Patel wrote: > >> [...] >> I think I know what's broken. Using load_seqdatabases.pl, I'd put >> a set of sequences from genbank into a biosql db in mysql. >> >> I'd also loaded the ncbi taxonomy using the load_ncbi_taxonomy.pl >> script from biosql. > > Did you load the NCBI taxonomy first, or afterwards? > >> >> When I searched for house (as in house mouse), I found that the >> name of the type of taxon class was "genbank common name". >> >> When I searched for musculus, it does appear as a type of >> "scientific name". > > It is the 'scientific name' class names that Bioperl-db will onto > the lineage array. > >> [...] >> I'm not just getting warnings. I'm getting errors. Tons of them. >> It's a wonder it's working at all. > > I'm not sure what you're referring to, but what you pasted into your > email were neither errors nor warnings but a debugging log (and what > it prints looks like it's working fine). 
You triggered that by > setting -verbose to a value greater than 0. If you don't want > debugging output, then you can just leave off that argument (no > debugging output is the default). > >> >> I started with the getentry.cgi script in the cgi-bin folder, and >> stripped most of it away. > > I see - which reminds me that I need to look at that script; I'm > afraid it hasn't been updated for a long time (that doesn't mean > though that it can't work - the core API has been stable for years). > >> >> Code: >> #!/usr/bin/perl >> >> [...] >> if( $@ || !defined $seq) { >> print "Got fetch exception of...\n
$@\n
"; >> exit(0); >> } > > Wouldn't you want to put that right after the eval() clause? > > -hilmar > >> >> >>> >>> On Aug 22, 2009, at 4:17 PM, Chris Fields wrote: >>> >>>> Anand, >>>> >>>> You should always post emails to the bioperl-l mailing list, >>>> never to individual developers (you'll get an answer much >>>> faster). Keep responses on the list as well. >>>> >>>> Though I use bioperl-db some, I'm probably not the best person to >>>> ask. Does anyone know what's going on with this? Does this have >>>> to do with the Species/Taxon refactoring? >>>> >>>> chris >>>> >>>> Begin forwarded message: >>>> >>>>> From: "Anand C. Patel" >>>>> Date: August 22, 2009 2:57:42 PM CDT >>>>> To: cjfields at illinois.edu >>>>> Subject: problem with bioperl (where's the Mus?) >>>>> >>>>> Dr. Fields, >>>>> >>>>> I'm struggling with what seems to be a strange quirk in Bioperl >>>>> +/- Bioperl-db/BioSQL. >>>>> >>>>> I've successfully loaded in genbank sequences into a biosql >>>>> database. >>>>> >>>>> When I try to write a genbank sequence back out, a curious thing >>>>> happens -- the Genus is missing from the SOURCE and ORGANISM >>>>> areas. >>>>> >>>>> Despite reporting: >>>>> primary tag: source >>>>> tag: chromosome >>>>> value: 3 >>>>> >>>>> tag: db_xref >>>>> value: taxon:10090 >>>>> >>>>> tag: map >>>>> value: 3 74.5 cM >>>>> >>>>> tag: mol_type >>>>> value: mRNA >>>>> >>>>> tag: organism >>>>> value: Mus musculus >>>>> The sequence when printed out via SeqIO looks like this: >>>>> LOCUS NM_017474 2935 bp dna linear >>>>> ROD 13-AUG-2009 >>>>> DEFINITION Mus musculus chloride channel calcium activated 3 >>>>> (Clca3), mRNA. >>>>> ACCESSION NM_017474 XM_978159 >>>>> VERSION NM_017474.2 GI:255918210 >>>>> KEYWORDS . 
>>>>> SOURCE musculus >>>>> ORGANISM musculus >>>>> Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; >>>>> Bilateria; >>>>> Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; >>>>> Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; >>>>> Tetrapoda; >>>>> Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; >>>>> Glires; >>>>> Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. >>>>> Confession -- I have a final project due Monday wherein I boldly >>>>> elected to interface Bioperl, MySQL, Perl, and CGI. >>>>> (I'm an MD getting my MS in Bioinformatics.) >>>>> After many misadventures, I'm getting to the point where I could >>>>> actually complete the objectives, but this is bug is rather >>>>> problematic. >>>>> Thanks, >>>>> Anand >>>>> Anand C. Patel, MD >>>>> Assistant Professor of Pediatrics >>>>> Division of Allergy/Pulmonary Medicine >>>>> Department of Pediatrics >>>>> Washington University School of Medicine >>>>> 660 South Euclid Ave, Campus Box 8052 >>>>> St. Louis, MO 63110 >>>>> acpatel at wustl.edu >>>>> acpatel at gmail.com >>>>> acpatel at jhu.edu >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From jkb at sanger.ac.uk Mon Aug 24 09:02:34 2009 From: jkb at sanger.ac.uk (James Bonfield) Date: Mon, 24 Aug 2009 10:02:34 +0100 Subject: [Bioperl-l] SCF installation Message-ID: <20090824090234.GB821@sanger.ac.uk> Lincoln Stein wrote: > It is all a bit confusing. 
On the download page for Staden, there is a
> release 1.12, but the home page hasn't been updated and still reads
> 1.11. If you download and install Staden 1.12, you'll get a library
> named libstaden-read rather than libread; Bio::SCF hasn't been updated
> for the name change, and so you will have to open up the Makefile.PL
> and change "-lread" to "-lstaden-read" in order for it to compile.

This post was pointed out to me by one of the Debian maintainers. I'm
mailing the list directly but am not a subscriber, so please keep me
listed in any replies.

The Staden Package home page recently underwent a revamp to use the
RSS feeds, automatically updating it. Unfortunately, within a couple of
weeks of doing that, SourceForge managed to break the file release RSS
and so the site has stopped updating. The News section is still working
though, so I ought to add a news post about io_lib-1.12.1 and it'll at
least appear somewhere on the home page.

Regarding the library name change, this was requested by Debian and
has also already been implemented by Fedora. I agree with it too, as
libread.so is a truly appalling name, so the new name is here to stay.
There shouldn't be a great number of differences compared to the 1.11.x
release set though, with the only incompatibility I can immediately
think of being the change from int to size_t in the Array structs.

James

PS. There have been very few changes to SCF over the years, so it's
likely all working just fine. Most recent io_lib changes have been SRF
support, and a few associated tweaks to ZTR necessitated by SRF.

--
James Bonfield (jkb at sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
                                    | Plurima gyrabant gymbolitare vabo;
A Staden Package developer:         | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/     | Momiferique omnes exgrabure Rathi.
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From acpatel at usa.net Sun Aug 23 17:17:08 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sun, 23 Aug 2009 12:17:08 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> Message-ID: <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> On Aug 23, 2009, at 9:38 AM, Hilmar Lapp wrote: >> Common name -- still "genbank common name" in name_class in the >> taxon_name table for "house mouse", which I think the module is >> looking for as "common name". > > If you are loading the NCBI taxonomy first, this is coming from > NCBI, not one of the scripts or BioPerl, and hence we have no > control over it. Are you saying that there is no designated name of > class 'common name' for Mus musculus in the NCBI taxonomy dump? > > Also, the common name being present or not should have no bearing on > the lineage array, where the actual problem is, so I don't > understand right now how this would be connected to the problem you > are seeing. > >> >> It's not behaving differently despite reloading the sequences. 
>> >> I've created a horrible munge that fixes it for cosmetic purposes: >> my $species = $seq->species; >> my $justspecies = $species->scientific_name(); >> my $binspecies = $species->binomial(); >> >> my $gbstring2 = $gbstring; >> >> $gbstring2 =~ s/$binspecies/$justspecies/g; >> $gbstring2 =~ s/$justspecies/$binspecies/g; > > I don't understand what you are trying to achieve here - it seems > like you are making a substitution and then reverting it? Also, > $species->scientific_name() and $species->binomial() should be > identical for Mus musculus - are you finding different values being > returned? > > So in essence, I wouldn't expect your above code snippet to have any > effect, for both of these reasons. How do you find $gbstring2 to be > different from $gbstring at the end of this block of code? > > -hilmar I should have been clearer. Code snippet: my $species = $seq->species; print "common name = ",$species->common_name, "\n"; print "scientific name = ",$species->scientific_name, "\n"; print "species = ",$species->species, "\n"; print "genus = ",$species->genus, "\n"; print "sub_species = ",$species->sub_species, "\n"; print "binomial = ",$species->binomial, "\n"; print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; Output: common name = scientific name = musculus species = musculus genus = Mus sub_species = binomial = Mus musculus ncbi_taxid = 10090 The common name is missing, despite having loaded it from NCBI taxonomy using the provided script. It is ONLY present as this "genbank common name". So, what I get in $gbstring is: LOCUS NM_017474 2935 bp dna linear ROD 13- AUG-2009 DEFINITION Mus musculus chloride channel calcium activated 3 (Clca3), mRNA. ACCESSION NM_017474 XM_978159 VERSION NM_017474.2 GI:255918210 KEYWORDS . 
SOURCE musculus ORGANISM musculus Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. What I get in $gbstring2 is: LOCUS NM_017474 2935 bp dna linear ROD 13- AUG-2009 DEFINITION Mus musculus chloride channel calcium activated 3 (Clca3), mRNA. ACCESSION NM_017474 XM_978159 VERSION NM_017474.2 GI:255918210 KEYWORDS . SOURCE Mus musculus ORGANISM Mus musculus Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus. Not perfect -- common name is still missing, but better. I could go through and replace all of the instances of "genbank common name" with "common name" and see if this fixes it. Any other thoughts? Thanks, Anand > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > From acpatel at usa.net Sun Aug 23 17:25:16 2009 From: acpatel at usa.net (Anand C. Patel) Date: Sun, 23 Aug 2009 12:25:16 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To:
References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net>
Message-ID: <855B196F-90D5-4170-AC0E-17A8F49A896C@usa.net>

The other piece of potentially useful information is below -- output
from

SELECT * FROM `biosql`.`taxon_name` WHERE `taxon_id` = 138;

(taxon_id 138 maps to ncbi_taxon_id 10090)

taxon_id  name                         name_class
138       LK3 transgenic mice          includes
138       Mus muscaris                 misnomer
138       Mus musculus                 scientific name
138       Mus sp. 129SV                includes
138       house mouse                  genbank common name
138       mice C57BL/6xCBA/CaJ hybrid  misspelling
138       mouse                        common name
138       nude mice                    includes
138       transgenic mice              includes

The source from the genbank entry NM_017474 is:

SOURCE Mus musculus (house mouse)

Which is why I think the issue is that the name_class is "genbank
common name" rather than "common name".

What does strike me as odd, though, is that not even "mouse" shows up
-- common_name is empty.

Thanks again,
Anand

From maj at fortinbras.us Mon Aug 24 14:37:45 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 24 Aug 2009 10:37:45 -0400
Subject: [Bioperl-l] The Documentation Project
Message-ID: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife>

Hi All,
I'm starting this journey of 1000 mi (1609 km) with the following step:
http://www.bioperl.org/wiki/The_Documentation_Project
Please visit and comment.
Thanks,
Mark

From hlapp at gmx.net Mon Aug 24 14:47:34 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 24 Aug 2009 10:47:34 -0400
Subject: [Bioperl-l] extracting ORGANISM line from genbank file
In-Reply-To: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru>
Message-ID: <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net>

Hi Anna,

sequence formats all have some varying amount of information that must
be present or otherwise the syntax is invalid. If what you need is a
two-column table of display_id and species name, then I would simply
write that, and not squeeze it into a standard sequence format.

(Unless you actually do want the sequence too, in which case you need
to add it as a wanted slot; even in that case though, writing a
three-column table might serve you better.)

-hilmar

On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote:

>
> Dear all,
>
> I am trying to extract species taxonomy from the ORGANISM line. In fact
> I only need the first line under the ORGANISM tag (i.e. genus + species).
> I thought that it would be possible to do with the SeqBuilder object
> by stating
>
> $builder->add_wanted_slot('display_id','species');
>
> the problem is, however, that I've got an empty file as a result.
> What might be wrong with the script (see below)?
> Thanks a lot in advance for any ideas,
>
> -------------------------------------------
>
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Seq::SeqBuilder;
>
> my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n";
> my $infile = shift or die $usage;
> my $infileformat = 'Genbank' ;
> my $outfile = shift or die $usage;
> my $outfileformat = 'raw';
> my $i = 0;
>
> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                              '-format' => $infileformat);
>
> my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
>                               '-format' => $outfileformat);
>
> my $builder = $seq_in->sequence_builder();
>
> $builder->want_none();
> $builder->add_wanted_slot('display_id','species');
>
> while(my $seq = $seq_in->next_seq()) {
>     $seq_out->write_seq($seq);
> }
>
> exit;
>
> ----------------------------------------------------
>
> Anna
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at illinois.edu Mon Aug 24 16:50:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 24 Aug 2009 11:50:05 -0500
Subject: [Bioperl-l] The Documentation Project
In-Reply-To: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife>
References: <17B6051D3FFD41E8AE7C10AF182F49B2@NewLife>
Message-ID:

Mark,

We should probably keep some of this discussion on the list, primarily
as I've been running into conflicts with responses on the wiki page.
It's more amenable to discussion. For anyone out there interested, you
should speak up now; this is the best opportunity to do so (we're
considering lack of input assent).

I want to make a few key points on behalf of the devs. It's impossible
to consistently maintain two active copies of any documentation (wiki
vs docs in the distribution).
I have tried keeping up with this, helping with the 1.5.2 release, and
full-on with the 1.6.0 release, and it's an extreme headache.

From the maintenance point of view, this is what I would do:

1) Where possible always link to the official POD (either pdoc or
CPAN) from the distribution. Make the API documentation link very
prominent (I moved it to the docs section in the sidebar). Protect
wiki module pages (in line with the 'one official copy' rule), allow
writable discussion pages for additional, wiki-specific documentation
(which can be added to the official docs as needed).

2) ...or, have a search bar specifically for the module documentation
that links directly to the proper API/PDOC/CPAN page. Not sure how
feasible that is, particularly since we plan on splitting things up a
bit.

3) POD-ify any relevant documentation we intend on including in the
wiki that also comes with the distribution (similar to
Moose::Manual). I do not want to repeatedly edit a plain text
INSTALL/BUGS/DEPENDENCIES file to correspond with the wikified version
for every release (nor vice versa).

Long term: (this is my own personal style, YMMV) move all POD to the
end of the file. Add a 'Status' tag to any method docs indicating
implementation status (virtual, stable, unstable, public, private,
etc). Move method POD to its own section within the main
documentation. Implement a coding style (as mentioned recently on list
using perltidy, but also using proper method names).

HOWTOs are also subject to API changes, but we haven't run into many
issues with those yet, and they're wiki-specific.

chris

On Aug 24, 2009, at 9:37 AM, Mark A. Jensen wrote:

> Hi All,
> I'm starting this journey of 1000 mi (1609 km) with the following
> step:
> http://www.bioperl.org/wiki/The_Documentation_Project
> Please visit and comment.
> Thanks, > Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Aug 24 17:37:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 12:37:39 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92CADD.10901@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> Message-ID: On Aug 24, 2009, at 12:16 PM, Sendu Bala wrote: > Hilmar Lapp wrote: >>> >>> ... >> This points to a problem in Bio::Species::scientific_name(), given >> that binomial() is correct. Could you file this as a bug report? > > What code creates the Bio::Species object here? I suspect this code > isn't aware of changes in Bio::Species since BioPerl 1.5.2. I think it's bioperl-db-related. You've previously pointed out the incongruity bioperl-db has with Bio::Species in a bug report (I indicated that in a separate post to this thread). >>> The common name is missing, despite having loaded it from NCBI >>> taxonomy using the provided script. >>> It is ONLY present as this "genbank common name". >>> [...] >>> I could go through and replace all of the instances of "genbank >>> common name" with "common name" and see if this fixes it. >> I think we need to first discuss how we want to treat the 'common >> name' versus 'genbank common name' classes in BioPerl. 
>> So question for everyone: do we need to have both available (in >> which case we need to add an accessor in Bio::Species), or only >> 'common name', or should 'genbank common name' override 'common >> name' if both are present and have different values. > > Bio::Species (via Bio::Taxon) has the common_names() method, for > which common_name() is an alias that in scalar context returns the > first of possibly many common names, one of which may be the genbank > common name. > > See: > http://www.bioperl.org/wiki/Core_1.5.2_new_features#Implementation_changes Yes, but that method stored names in an array and removes the context, presumed or not. If there are two or more, which names correspond to common_name, which to genbank_common_name (and which should we prefer)? chris From bix at sendu.me.uk Mon Aug 24 17:16:13 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 18:16:13 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> Message-ID: <4A92CADD.10901@sendu.me.uk> Hilmar Lapp wrote: > > On Aug 23, 2009, at 1:17 PM, Anand C. Patel wrote: > >> [...] 
>> Code snippet: >> my $species = $seq->species; >> print "common name = ",$species->common_name, "\n"; >> print "scientific name = ",$species->scientific_name, "\n"; >> print "species = ",$species->species, "\n"; >> print "genus = ",$species->genus, "\n"; >> print "sub_species = ",$species->sub_species, "\n"; >> print "binomial = ",$species->binomial, "\n"; >> print "ncbi_taxid = ",$species->ncbi_taxid, "\n"; >> >> Output: >> common name = >> scientific name = musculus >> species = musculus >> genus = Mus >> sub_species = >> binomial = Mus musculus >> ncbi_taxid = 10090 > > This points to a problem in Bio::Species::scientific_name(), given that > binomial() is correct. Could you file this as a bug report? What code creates the Bio::Species object here? I suspect this code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >> The common name is missing, despite having loaded it from NCBI >> taxonomy using the provided script. >> It is ONLY present as this "genbank common name". >> [...] >> I could go through and replace all of the instances of "genbank common >> name" with "common name" and see if this fixes it. > I think we need to first discuss how we want to treat the 'common name' > versus 'genbank common name' classes in BioPerl. > > So question for everyone: do we need to have both available (in which > case we need to add an accessor in Bio::Species), or only 'common name', > or should 'genbank common name' override 'common name' if both are > present and have different values. Bio::Species (via Bio::Taxon) has the common_names() method, for which common_name() is an alias that in scalar context returns the first of possibly many common names, one of which may be the genbank common name. 
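
A minimal, self-contained sketch of the context-dependent behaviour described above. It does not use or reproduce the real Bio::Taxon code: My::TaxonSketch is a hypothetical stand-in written only for this illustration, and the two names are taken from the taxon_name rows posted earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT the real Bio::Taxon implementation.
package My::TaxonSketch;

sub new {
    my ($class, @names) = @_;
    return bless { common_names => [@names] }, $class;
}

# List context: every stored name. Scalar context: only the first one.
sub common_names {
    my $self  = shift;
    my @names = @{ $self->{common_names} };
    return wantarray ? @names : $names[0];
}

# Alias; the caller's context is passed through to common_names().
sub common_name {
    my $self = shift;
    return $self->common_names(@_);
}

package main;

my $taxon = My::TaxonSketch->new('mouse', 'house mouse');

my $first = $taxon->common_name;    # scalar context
my @all   = $taxon->common_name;    # list context

print "scalar context: $first\n";
print "list context:   @all\n";
```

In this sketch the flat list carries no record of which entry was stored as "common name" and which as "genbank common name", so a caller cannot tell them apart from the return value alone.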
See: http://www.bioperl.org/wiki/Core_1.5.2_new_features#Implementation_changes From hlapp at gmx.net Mon Aug 24 17:54:13 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 13:54:13 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92CADD.10901@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> Message-ID: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >> This points to a problem in Bio::Species::scientific_name(), given >> that binomial() is correct. Could you file this as a bug report? > > What code creates the Bio::Species object here? I suspect this code > isn't aware of changes in Bio::Species since BioPerl 1.5.2. I see. Any pointer to what would tell me what I need to change or is everything in the Bio::Species POD? BTW what the Bioperl-db code does is instantiate the blank object and then populate it through its accessors (mostly the classification() array). If what it has been doing in the past is now considered incorrect, at least it doesn't raise any warning that would alert one to that ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From robert.bradbury at gmail.com Mon Aug 24 18:38:08 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 24 Aug 2009 14:38:08 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> Message-ID: As a really "off-the-wall" suggestion, you might see if somehow the "name" being pulled is the SwissProt name rather than the species name. I run into this when I'm fetching FASTA sequences from SwissProt in that the sequence identifier names are non-standard for some of the early "standard" species, e.g. "HUMAN", # Homo sapiens "MOUSE", # Mus musculus "RAT", # Rattus norvegicus "BOVIN", # Bos taurus "HORSE", # Equus caballus "PIG", # Sus scrofa "RABIT", # Oryctolagus cuniculus "SHEEP", # Ovis aries "YEAST", # Saccharomyces cerevisiae (Baker's yeast) etc. Eventually they largely adopted the 3+2 letter species derived name, but the early "standard" names are anomalies. You might run a test on a newly sequenced species (Gorilla, Opossum, Armadillo, Dog, etc.) to see if you get a "standard" species name. Robert Bradbury From dan.bolser at gmail.com Mon Aug 24 19:13:26 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 24 Aug 2009 20:13:26 +0100 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife> <2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> Message-ID: <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> Thanks for these clarifications Chris. 
Basically I'm looking for an object that will easily let me edit a multiple sequence alignment, including: adding sequences (with given alignments), opening gaps, extracting columns (with linked sequences), transferring features, etc. etc. For example, I may want to analyse a set of short reads aligned against the human genome. Somehow it felt natural to represent the position of the aligned read as a Bio::LocatableSeq (with the alignment details being captured by a sequence string (including gaps) representing the read and the reference sequence - basically because that is what the aligner gives me). Now, you're saying Bio::LocatableSeq is not suitable for that purpose, which is fine. But the question is, how should I be doing this? Adding megabases of gaps to thousands of short reads feels wrong... is there a 'correct' way to do this currently in BioPerl? I think the source of my confusion was that SimpleAlign takes Bio::LocatableSeq as input, and I thought that was 'the way' to represent sequences in the MSA. I'll keep hacking at what I need to get done and I'll post the code. I'm just wondering how much 'alignment editing' could be usefully done by a suitable object within BP? Thanks again for your help, Dan. 2009/8/24 Chris Fields : > Dan, all, > > Bio::SimpleAlign doesn't align anything for you. It makes no assumptions > about the data being added, beyond possibly checking for the seqs to be > flush prior to analyses. > > Here's the reason why: > > The object doesn't 'know' the seqs map across from one to the other as > below: > >> ... >> ## REF tacattaaagacccg >> ## SEQ1 taca.taaa...... 
>> ## SEQ2 .....taaaga.ccg >> >> my $aln = Bio::SimpleAlign->new(); >> >> $aln->gap_char('.'); >> >> my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); >> my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' >> ); >> my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' >> ); >> >> $aln->add_seq( $r ); >> $aln->add_seq( $s1 ); >> $aln->add_seq( $s2 ); > > Above, you are making the assumption that SimpleAlign 'knows' where to match > the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does > NOT indicate that (the LocatableSeq docs, and their usage, should indicate > that). > > Think about HSP alignments in a BLAST report; the start/end/strand > coordinates are where the sequence in the alignment maps to the original > query or hit sequence. They don't indicate where the hit maps to the query > (the alignment itself does that in a column-wise fashion). > > I'm not sure, maybe it needs to be more explicit in the documentation, but > SimpleAlign does not align the sequences for you (and it shouldn't be > expected to). There are much better (faster, more accurate) ways to do > that. > >> if($CLUDGE){ >> foreach(($r, $s1, $s2)){ >> $_->seq( '.' x ($_->start - 1) . $_->seq ) >> } >> } >> >> ## Prepare an 'output stream' for the alignment: >> my $aliWriter = Bio::AlignIO-> >> new( -fh => \*STDOUT, >> -format => 'clustalw', >> ); >> >> warn "\nOUTPUT:\n"; >> $aliWriter->write_aln($aln); > > ... > >> I was calling the "fill in the gaps yourself" step a CLUDGE because I >> had expected the alignment object to take care of this for me. Is >> there any reason that it couldn't do this 'CLUDGE' automatically? It >> seems strange that it insists on being passed locatable sequence >> objects, but then largely ignore the given location. >> >> Would it not be possible to have this happen when the sequences are >> written out from the alignment? 
I think it should still be possible to >> index the column number via the (gapless) sequence number... or did I >> get confused? There are two levels of confusion here (on my part), 1) >> the concepts behind the objects and 2) the implementation details. > > Mentioned above (no assumptions on how locatableseqs map to one another). > WYSIWYG. There is nothing precluding you from writing up code to do that, > though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for > post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl > alignment implementation (there are, believe it or not, pure perl > implementations of Smith-Waterman and Needleman-Wunsch). > >> Thanks for any hints on how to understand or potentially how to fix >> these problems. >> >> Cheers, >> Dan. > > > Not that SimpleAlign and LocatableSeqs don't have their share of problems. > However, I don't think you can expect this behavior to change with the > refactors. > > chris > From bix at sendu.me.uk Mon Aug 24 19:12:05 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 20:12:05 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> Message-ID: <4A92E605.5090706@sendu.me.uk> Hilmar Lapp wrote: > > On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: > >>> This points to a problem in Bio::Species::scientific_name(), given >>> that binomial() is correct. Could you file this as a bug report? >> >> What code creates the Bio::Species object here?
I suspect this code >> isn't aware of changes in Bio::Species since BioPerl 1.5.2. > > I see. Any pointer to what would tell me what I need to change or is > everything in the Bio::Species POD? ... I won't guarantee the perfection of the POD ;) > BTW what the Bioperl-db code does is instantiate the blank object and > then populate it through its accessors (mostly the classification() > array). If what it has been doing in the past is now considered > incorrect, at least it doesn't raise any warning that would alert one to > that ... Yuh... If you point out the code that creates the Bio::Species I can look into it for you and suggest what needs changing and why it doesn't work (or if it's a bug in Bio::Species). I can't remember things clearly right now, though classification() I guess was supposed to be backwards compatible. From cjfields at illinois.edu Mon Aug 24 19:52:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 14:52:56 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92E605.5090706@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> Message-ID: <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: > Hilmar Lapp wrote: >> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>> This points to a problem in Bio::Species::scientific_name(), >>>> given that binomial() is correct. Could you file this as a bug >>>> report? >>> >>> What code creates the Bio::Species object here? 
I suspect this >>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >> I see. Any pointer to what would tell me what I need to change or >> is everything in the Bio::Species POD? > > ... I won't guarantee the perfection of the POD ;) > > >> BTW what the Bioperl-db code does is instantiate the blank object >> and then populate it through its accessors (mostly the >> classification() array). If what it has been doing in the past is >> now considered incorrect, at least it doesn't raise any warning >> that would alert one to that ... > > Yuh... If you point out the code that creates the Bio::Species I can > look into it for you and suggest what needs changing and why it > doesn't work (or if it's a bug in Bio::Species). I can't remember > things clearly right now, though classification() I guess was > supposed to be backwards compatible. Sendu, I think it's related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 Bio::DB::BioSQL::SpeciesAdaptor and Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in question i think. chris From bix at sendu.me.uk Mon Aug 24 20:01:29 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 24 Aug 2009 21:01:29 +0100 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> Message-ID: <4A92F199.2030900@sendu.me.uk> Chris Fields wrote: > > On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: > >> Hilmar Lapp wrote: >>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>> This points to a problem in Bio::Species::scientific_name(), given >>>>> that binomial() is correct. Could you file this as a bug report? >>>> >>>> What code creates the Bio::Species object here? I suspect this code >>>> isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>> I see. Any pointer to what would tell me what I need to change or is >>> everything in the Bio::Species POD? >> >> ... I won't guarantee the perfection of the POD ;) >> >> >>> BTW what the Bioperl-db code does is instantiate the blank object and >>> then populate it through its accessors (mostly the classification() >>> array). If what it has been doing in the past is now considered >>> incorrect, at least it doesn't raise any warning that would alert one >>> to that ... >> >> Yuh... If you point out the code that creates the Bio::Species I can >> look into it for you and suggest what needs changing and why it >> doesn't work (or if it's a bug in Bio::Species). I can't remember >> things clearly right now, though classification() I guess was supposed >> to be backwards compatible. 
> > Sendu, I think it's related to this: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 > > Bio::DB::BioSQL::SpeciesAdaptor and > Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in > question i think. Ah, yes, well there you go then. So it is a classification() issue. Judging by what I said in that bug, looks like the db code needs to be changed to put the full scientific name in the first element it passes to classification. From cjfields at illinois.edu Mon Aug 24 20:27:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 15:27:23 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <4A92F199.2030900@sendu.me.uk> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> Message-ID: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: >>> Hilmar Lapp wrote: >>>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>>> This points to a problem in Bio::Species::scientific_name(), >>>>>> given that binomial() is correct. Could you file this as a bug >>>>>> report? >>>>> >>>>> What code creates the Bio::Species object here? I suspect this >>>>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>>> I see. Any pointer to what would tell me what I need to change or >>>> is everything in the Bio::Species POD? >>> >>> ... 
I won't guarantee the perfection of the POD ;) >>> >>> >>>> BTW what the Bioperl-db code does is instantiate the blank object >>>> and then populate it through its accessors (mostly the >>>> classification() array). If what it has been doing in the past is >>>> now considered incorrect, at least it doesn't raise any warning >>>> that would alert one to that ... >>> >>> Yuh... If you point out the code that creates the Bio::Species I >>> can look into it for you and suggest what needs changing and why >>> it doesn't work (or if it's a bug in Bio::Species). I can't >>> remember things clearly right now, though classification() I guess >>> was supposed to be backwards compatible. >> Sendu, I think it's related to this: >> http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 >> Bio::DB::BioSQL::SpeciesAdaptor and >> Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules in >> question i think. > > Ah, yes, well there you go then. So it is a classification() issue. > Judging by what I said in that bug, looks like the db code needs to > be changed to put the full scientific name in the first element it > passes to classification. Yup. I believe the only blocking issue with implementing it was potential backwards-compat problems with databases loaded using old behavior and then being updated post-1.5.2 (new behavior). I would think this only affects sequence data loaded w/o taxonomy preloaded, but I'm not sure. I suggest, if you can fix it, go ahead make the necessary change. We can then post a big warning to BioSQL and here about the problem, something along the lines of 'bioperl-db in svn may be backwards incompatible with species information loaded in previous versions; it may eat your first born' or similar. It's an absolutely necessary fix, and may effectively kill a bunch of other db/species-related bugs. 
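For anyone following along, the change being discussed (full scientific name as the first classification element) might look roughly like the sketch below. The helper name is hypothetical and is not part of bioperl-db; it assumes the old-style list starts with the species epithet followed by the genus, as the output earlier in the thread suggests.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of bioperl-db: given an old-style
# classification list (species epithet first, then genus, then higher
# ranks), return a list whose first element is the full binomial.
sub with_full_species_name {
    my @class = @_;
    return @class if @class < 2;           # nothing to combine
    return @class if $class[0] =~ /\s/;    # already a multi-word name
    $class[0] = "$class[1] $class[0]";     # genus + epithet
    return @class;
}

my @old = ('musculus', 'Mus', 'Muridae', 'Rodentia');
my @new = with_full_species_name(@old);
print join(', ', @new), "\n";
```

Subspecies and names that legitimately consist of one word are not handled here; a real fix would have to decide what to do with those.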
chris From Kevin.M.Brown at asu.edu Mon Aug 24 21:48:35 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 24 Aug 2009 14:48:35 -0700 Subject: [Bioperl-l] Bio::SimpleAlign constructor? In-Reply-To: <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com><990CEF10B1AD4BD5BE9977FD62DB3437@NewLife><2c8757af0908240550n7242c68era49ce752cf39fd86@mail.gmail.com> <2c8757af0908241213r55ac8799ub41eb885272a13e3@mail.gmail.com> Message-ID: <1A4207F8295607498283FE9E93B775B4062D2655@EX02.asurite.ad.asu.edu> You can use Bio::SimpleAlign for those tasks, but you, the programmer, have to remember that you didn't front pad the sequence and so can't utilize certain functions blindly. I've used SimpleAlign with LocatableSeq objects and wrote a few custom methods that did things like creating slices from the simplealign for each locatableseq. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Bolser Sent: Monday, August 24, 2009 12:13 PM To: Chris Fields Cc: bioperl-l at lists.open-bio.org; Mark A. Jensen; Paolo Pavan Subject: Re: [Bioperl-l] Bio::SimpleAlign constructor? Thanks for these clarifications Chris. Basically I'm looking for an object that will easily let me edit a multiple sequence alignment, including: adding sequences (with given alignments), opening gaps, extracting columns (with linked sequences), transferring features, etc. etc. For example, I may want to analyse a set of short reads aligned against the human genome. Somehow it felt natural to represent the position of the aligned read as a Bio::LocatableSeq (with the alignment details being captured by a sequence string (including gaps) representing the read and the reference sequence - basically because that is what the aligner gives me). Now, you're saying Bio::LocatableSeq is not suitable for that purpose, which is fine. 
But the question is, how should I be doing this? Adding megabases of gaps to thousands of short reads feels wrong... is there a 'correct' way to do this currently in BioPerl? I think the source of my confusion was that SimpleAlign takes Bio::LocatableSeq as input, and I thought that was 'the way' to represent sequences in the MSA. I'll keep hacking at what I need to get done and I'll post the code. I'm just wondering how much 'alignment editing' could be usefully done by a suitable object within BP? Thanks again for your help, Dan. 2009/8/24 Chris Fields : > Dan, all, > > Bio::SimpleAlign doesn't align anything for you. It makes no assumptions > about the data being added, beyond possibly checking for the seqs to be > flush prior to analyses. > > Here's the reason why: > > The object doesn't 'know' the seqs map across from one to the other as > below: > >> ... >> ## REF tacattaaagacccg >> ## SEQ1 taca.taaa...... >> ## SEQ2 .....taaaga.ccg >> >> my $aln = Bio::SimpleAlign->new(); >> >> $aln->gap_char('.'); >> >> my $r = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' ); >> my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' >> ); >> my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' >> ); >> >> $aln->add_seq( $r ); >> $aln->add_seq( $s1 ); >> $aln->add_seq( $s2 ); > > Above, you are making the assumption that SimpleAlign 'knows' where to match > the start of $s1 and $s2 to the ref sequence $r. LocatableSeq::start() does > NOT indicate that (the LocatableSeq docs, and their usage, should indicate > that). > > Think about HSP alignments in a BLAST report; the start/end/strand > coordinates are where the sequence in the alignment maps to the original > query or hit sequence. They don't indicate where the hit maps to the query > (the alignment itself does that in a column-wise fashion). 
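The distinction drawn above (start/end are coordinates in the original sequence, while columns come from the alignment row itself) can be sketched in plain Perl. The function below is a hypothetical helper, not a BioPerl method; it assumes '.' and '-' are the only gap characters.

```perl
use strict;
use warnings;

# Hypothetical helper (plain Perl, no BioPerl): return the 1-based alignment
# column holding residue number $residue of a gapped row whose first
# residue is numbered $start in the original, ungapped sequence.
sub column_of_residue {
    my ($gapped, $start, $residue) = @_;
    my $current = $start - 1;    # number of the last residue seen so far
    my $column  = 0;
    for my $char (split //, $gapped) {
        $column++;
        next if $char eq '.' or $char eq '-';    # gaps use up a column only
        $current++;
        return $column if $current == $residue;
    }
    return;    # residue is not present in this row
}

# SEQ2 row from the thread, already padded to the reference.
my $row = '.....taaaga.ccg';
print "residue 1 is in column ", column_of_residue($row, 1, 1), "\n";
print "residue 7 is in column ", column_of_residue($row, 1, 7), "\n";
```

The residue numbering never says where the row sits relative to the reference; only the leading gaps in the row do.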
> > I'm not sure, maybe it needs to be more explicit in the documentation, but > SimpleAlign does not align the sequences for you (and it shouldn't be > expected to). There are much better (faster, more accurate) ways to do > that. > >> if($CLUDGE){ >> foreach(($r, $s1, $s2)){ >> $_->seq( '.' x ($_->start - 1) . $_->seq ) >> } >> } >> >> ## Prepare an 'output stream' for the alignment: >> my $aliWriter = Bio::AlignIO-> >> new( -fh => \*STDOUT, >> -format => 'clustalw', >> ); >> >> warn "\nOUTPUT:\n"; >> $aliWriter->write_aln($aln); > > ... > >> I was calling the "fill in the gaps yourself" step a CLUDGE because I >> had expected the alignment object to take care of this for me. Is >> there any reason that it couldn't do this 'CLUDGE' automatically? It >> seems strange that it insists on being passed locatable sequence >> objects, but then largely ignore the given location. >> >> Would it not be possible to have this happen when the sequences are >> written out from the alignment? I think it should still be possible to >> index the column number via the (gapless) sequence number... or did I >> get confused? There are two levels of confusion here (on my part), 1) >> the concepts behind the objects and 2) the implementation details. > > Mentioned above (no assumptions on how locatableseqs map to one another). > WYSIWYG. There is nothing precluding you from writing up code to do that, > though it doesn't belong in SimpleAlign. Maybe Bio::Align::Utilities for > post-processing padding, or Bio::Tools::PurePerlAlign for a pure perl > alignment implementation (there are, believe it or not, pure perl > implementations of Smith-Waterman and Needleman-Wunsch). > >> Thanks for any hints on how to understand or potentially how to fix >> these problems. >> >> Cheers, >> Dan. > > > Not that SimpleAlign and LocatableSeqs don't have their share of problems. > However, I don't think you can expect this behavior to change with the > refactors.
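As a rough illustration of the "post-processing padding" idea mentioned above, the sketch below pads each read on both sides so that every row is flush with the reference. It is plain Perl with a hypothetical helper name, and it assumes you already know the 1-based column where each read begins (which, as explained above, is not what LocatableSeq's start() means).

```perl
use strict;
use warnings;

# Hypothetical helper (plain Perl, no BioPerl): pad a gapped read so that it
# spans $width columns, given the 1-based column where the read begins.
sub pad_row {
    my ($gapped, $first_column, $width, $gap) = @_;
    $gap = '.' unless defined $gap;
    my $lead  = $first_column - 1;
    my $trail = $width - $lead - length($gapped);
    die "row does not fit in $width columns\n" if $lead < 0 or $trail < 0;
    return ($gap x $lead) . $gapped . ($gap x $trail);
}

# Reference and reads from the thread; the column numbers are the values
# that were passed as -start in the original snippet.
my $ref   = 'tacattaaagacccg';
my %reads = ( s1 => [ 'taca.taaa', 1 ], s2 => [ 'taaaga.ccg', 6 ] );

print "r   $ref\n";
for my $id (sort keys %reads) {
    my ($seq, $col) = @{ $reads{$id} };
    print "$id  ", pad_row($seq, $col, length($ref)), "\n";
}
```

Unlike the CLUDGE loop quoted above, this also pads the tail of each row. For short reads against a whole genome the padded strings would be huge, so this only makes sense for small windows.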
> > chris > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Tue Aug 25 00:12:18 2009 From: hartzell at alerce.com (George Hartzell) Date: Mon, 24 Aug 2009 17:12:18 -0700 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl Message-ID: <19091.11362.190209.844074@already.dhcp.gene.com> There's a warning at Ensembl about the perl api code depending on an old version of bioperl (1.2.3) http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html Does anyone have current information about that dependency? My quick-n-dirty tests suggest that one can't build an app that uses both new Bioperl and the ensembl api without ensembl picking up the newer bioperl libraries (or your app getting the older ones). It's not clear what parts of the ensembl world depend on the older BioPerl. Anyone have any recipes to make it work? Any info on a possible modernization of the ensembl code? Thanks, g. From cjfields at illinois.edu Tue Aug 25 02:29:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 21:29:38 -0500 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <19091.11362.190209.844074@already.dhcp.gene.com> References: <19091.11362.190209.844074@already.dhcp.gene.com> Message-ID: <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> On Aug 24, 2009, at 7:12 PM, George Hartzell wrote: > > There's a warning at Ensembl about the perl api code depending on an > old version of bioperl (1.2.3) > > http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html > > Does anyone have current information about that dependency? > > My quick-n-dirty tests suggest that one can't build an app that uses > both new Bioperl and the ensembl api without ensembl picking up the > newer bioperl libraries (or your app getting the older ones). It's > not clear what parts of the ensembl world depend on the older BioPerl. 
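One way to see which copy of a library would win is to walk the search path yourself. The sketch below is plain Perl with a hypothetical helper name; it relies only on the fact that a given module name is loaded once per process, from the first matching directory. Bio/Seq.pm is used here merely as a file that any BioPerl install should contain.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: report the first directory in a search path that
# holds a given module file, which is the copy Perl would load.
sub first_hit {
    my ($module_file, @dirs) = @_;
    for my $dir (@dirs) {
        next if ref $dir;    # skip code-ref hooks that may sit in @INC
        my $path = File::Spec->catfile($dir, split m{/}, $module_file);
        return $path if -e $path;
    }
    return;
}

my $hit = first_hit('Bio/Seq.pm', @INC);
print defined $hit
    ? "Bio::Seq would be loaded from $hit\n"
    : "no BioPerl found on \@INC\n";
```

Running this with different PERL5LIB or "use lib" orderings shows which BioPerl tree an application and the ensembl API would both end up sharing.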
I've asked this question several times of the ensembl folk w/o an adequate response. My general feeling is even they may not really know for sure (though I recall ewan saying something about feature/ annotation changes around then, and maybe something about the blastreporter). Saying that, the ensembl perl API worked for me using bioperl-live (and bioperl 1.6) as of a couple months ago. You might eventually run into some issues; if so report them back here and to the ensembl list. > Anyone have any recipes to make it work? > > Any info on a possible modernization of the ensembl code? That is completely up to the ensembl folks. bioperl 1.2.3 is full enough of bugs, and I don't plan on backporting any changes to that branch (seems kind of silly, as that branch is now about six yrs old). > Thanks, > > g. np! -chris From hlapp at gmx.net Tue Aug 25 03:17:29 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 24 Aug 2009 23:17:29 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> Message-ID: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > > On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > >> [...] >> Ah, yes, well there you go then. So it is a classification() issue. 
>> Judging by what I said in that bug, looks like the db code needs to >> be changed to put the full scientific name in the first element it >> passes to classification. > > > Yup. I believe the only blocking issue with implementing it was > potential backwards-compat problems with databases loaded using old > behavior and then being updated post-1.5.2 (new behavior). The code change is for retrieving data, right? So I'm not sure how it would break backwards compatibility, unless one has taxon entries created before the change (i.e., about 3 years ago?) and through loading sequences rather than through loading the NCBI taxonomy. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue Aug 25 04:10:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 23:10:15 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) In-Reply-To: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> Message-ID: On Aug 24, 2009, at 10:17 PM, Hilmar Lapp wrote: > > On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > >> >> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: >> >>> [...] >>> Ah, yes, well there you go then. So it is a classification() >>> issue. 
Judging by what I said in that bug, looks like the db code >>> needs to be changed to put the full scientific name in the first >>> element it passes to classification. >> >> >> Yup. I believe the only blocking issue with implementing it was >> potential backwards-compat problems with databases loaded using old >> behavior and then being updated post-1.5.2 (new behavior). > > The code change is for retrieving data, right? So I'm not sure how > it would break backwards compatibility, unless one has taxon entries > created before the change (i.e., about 3 years ago?) and through > loading sequences rather than through loading the NCBI taxonomy. > > -hilmar Right, that's what I thought as well, but I just wasn't clear on that. So, basically we're saying, as long as the code change is on the retrieving side, everything's okay? Then I'm pretty sure I know how to fix it, at least partly. I can probably squeeze that in unless Sendu's working on it. Sendu? chris From cjfields at illinois.edu Tue Aug 25 04:28:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 24 Aug 2009 23:28:26 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> Message-ID: <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> On Aug 24, 2009, at 10:17 PM, Hilmar Lapp wrote: > > On Aug 24, 2009, at 4:27 PM, Chris Fields wrote: > >> >> On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: >> >>> [...] >>> Ah, yes, well there you go then. So it is a classification() >>> issue. Judging by what I said in that bug, looks like the db code >>> needs to be changed to put the full scientific name in the first >>> element it passes to classification. >> >> >> Yup. I believe the only blocking issue with implementing it was >> potential backwards-compat problems with databases loaded using old >> behavior and then being updated post-1.5.2 (new behavior). > > The code change is for retrieving data, right? So I'm not sure how > it would break backwards compatibility, unless one has taxon entries > created before the change (i.e., about 3 years ago?) and through > loading sequences rather than through loading the NCBI taxonomy. > > -hilmar Okay, if possible I would like you or Sendu to review that last commit I made to bioperl-db. It includes Sendu's patch; I commented out sections that were modifying the genus/species when loaded in, but there are a few TODO's I noted as well (everything is in populate_from_row()). 
02species.t is now failing but I think it's based on the same old behavior; I'll look into it. chris From geoeco at rambler.ru Tue Aug 25 07:01:24 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:01:24 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <94c73820908240553m72540519pd86bf78e29041462@mail.gmail.com> Message-ID: <1074529971.1251183684.50392744.40754@mcgi70.rambler.ru> Hi Rohit, Thanks a lot for your comments, it actually worked well, but in fact i only want to extract species names as I want to have it in a separate file together with a fasta file with sequences. So, thanks a lot again! Anna * Rohit Ghai [Mon, 24 Aug 2009 14:53:03 +0200]: > hi > > I think you forgot to add the "seq" in the builder.. thats why the file > is > empty. > Also, the species name, though being parsed, is nowhere in the output. > Here's a version > using fasta output that you can probably customize further. This also > takes > the full > name of the organism and adds to the description line in the output. > > use strict; > use Bio::SeqIO; > use Bio::Seq::SeqBuilder; > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > my $infile = shift or die $usage; > my $infileformat = 'Genbank' ; > my $outfile = shift or die $usage; > my $outfileformat = 'fasta'; > my $i = 0; > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > '-format' => $outfileformat); > > my $builder = $seq_in->sequence_builder(); > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species','seq','description'); > > while(my $seq = $seq_in->next_seq()) { > > my $desc = $seq->description(); > my $species_string = $seq->species()->binomial('FULL'); > $desc = $desc . 
" [$species_string]"; > $seq->description($desc); > $seq_out->write_seq($seq); > } > > exit; > > > On Mon, Aug 24, 2009 at 11:20 AM, Anna Kostikova > wrote: > > > > > Dear all, > > > > I am trying to extract species taxonomy from ORGANISM line. In fact I > only > > need a first line under ORGANISM tag (e.i. genus + species). I though > that > > it would be possible to do with the SeqBuilder object by stating > > > > $builder->add_wanted_slot('display_id','species'); > > > > the problem is, however, that I've got an empty file as a result. > > What might be wrong with the script (see below)? > > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From geoeco at rambler.ru Tue Aug 25 07:03:56 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:03:56 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> 
<6B4871D9-5DB0-4762-A613-3561B40CE099@illinois.edu> Message-ID: <734135890.1251183836.48962856.71827@mcgi59.rambler.ru> hello Chris, Well, my final aim is to get 2 files: first one is a fasta file with all the sequences, and the seconds one is simply a list of species names extracted from the same Genbank file. So that's why I though it would be a good thing to put all together into one script with bioperl objects. Is there a better way to do it? Thanks, Anna * Chris Fields [Mon, 24 Aug 2009 07:55:56 -0500]: > Anna, > > It's stored in the Bio::Species object. I have to say, though, I > think you're using a stick of dynamite for a scalpel here; if you only > need ORGANISM parse it out directly (it's much faster). Or am I > missing something? > > chris > > On Aug 24, 2009, at 4:20 AM, Anna Kostikova wrote: > > > Dear all, > > > > I am trying to extract species taxonomy from ORGANISM line. In fact > > I only need a first line under ORGANISM tag (e.i. genus + species). > > I though that it would be possible to do with the SeqBuilder object > > by stating > > > > $builder->add_wanted_slot('display_id','species'); > > > > the problem is, however, that I've got an empty file as a result. > > What might be wrong with the script (see below)? 
> > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From geoeco at rambler.ru Tue Aug 25 07:09:43 2009 From: geoeco at rambler.ru (Anna Kostikova) Date: Tue, 25 Aug 2009 11:09:43 +0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> Message-ID: <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> hello Hilmar, Thanks for your comments. Actually, my final aim is to get 2 files: first one is a fasta file with all the sequences, and the seconds one is simply a list of species names extracted from the same Genbank file. So that's why I though it would be a good thing to put all together into one script with bioperl objects. Is there a better way to do it? 
The reason I don't want simple parsing for species names is that I also want to be able to check which gene has been sequenced: while (my $inseq = $seq_in->next_seq) { if ($inseq->desc =~ m/5\.8S ribosomal RNA/) { $seq_out->write_seq($inseq); } } and only if it is 5.8S rRNA do I want to extract the species name and the sequence. I thought that with direct parsing the code would be much longer. Am I wrong? I am a newbie in both BioPerl and bioinformatics, so all comments would be appreciated :) Anna * Hilmar Lapp [Mon, 24 Aug 2009 10:47:34 -0400]: > Hi Anna, > > sequence formats all have some varying amount of information that must > be present or otherwise the syntax is invalid. If what you need is a > two-column table of display_id and species name, then I would simply > write that, and not squeeze it into a standard sequence format. > (Unless you actually do want the sequence too, in which case you need > to add it as a wanted slot; even in that case though, writing a three- > column table might serve you better.) > > -hilmar > > On Aug 24, 2009, at 5:20 AM, Anna Kostikova wrote: > > > > > Dear all, > > > > I am trying to extract species taxonomy from ORGANISM line. In fact > > I only need a first line under ORGANISM tag (e.i. genus + species). > > I though that it would be possible to do with the SeqBuilder object > > by stating > > > > $builder->add_wanted_slot('display_id','species'); > > > > the problem is, however, that I've got an empty file as a result. > > What might be wrong with the script (see below)? 
> > Thanks a lot in advance for any ideas, > > > > ------------------------------------------- > > > > #!/usr/bin/perl > > use strict; > > use Bio::SeqIO; > > use Bio::Seq::SeqBuilder; > > > > my $usage = "genbank_to_fasta_cleaning.pl infile outfile \n"; > > my $infile = shift or die $usage; > > my $infileformat = 'Genbank' ; > > my $outfile = shift or die $usage; > > my $outfileformat = 'raw'; > > my $i = 0; > > > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > > '-format' => $infileformat); > > > > my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", > > '-format' => $outfileformat); > > > > my $builder = $seq_in->sequence_builder(); > > > > $builder->want_none(); > > $builder->add_wanted_slot('display_id','species'); > > > > while(my $seq = $seq_in->next_seq()) { > > $seq_out->write_seq($seq); > > } > > > > exit; > > > > ---------------------------------------------------- > > > > Anna > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Tue Aug 25 11:34:18 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 Aug 2009 07:34:18 -0400 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> <7B75612C-3020-4A79-B318-723F02395E5C@gmx.net> <7F55170D-068F-4752-B89C-5BE156699EF4@illinois.edu> Message-ID: <4A8C2A89-C212-4969-8B01-3DA7D7DE7862@gmx.net> On Aug 25, 2009, at 12:28 AM, Chris Fields wrote: > Okay, if possible I would like you or Sendu to review that last > commit I made to bioperl-db. Will do. > [...] > 02species.t is now failing but I think it's based on the same old > behavior; I'll look into it. I would expect that if the classification array is now different, so the test will need changing to expect the "new" behavior. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Tue Aug 25 11:52:11 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 25 Aug 2009 07:52:11 -0400 Subject: [Bioperl-l] extracting ORGANISM line from genbank file In-Reply-To: <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> References: <818886575.1251105613.158913056.48679@mcgi36.rambler.ru> <958C2D2D-D806-41F4-B8EA-81C1811D68A9@gmx.net> <718902846.1251184183.168806680.60067@mcgi37.rambler.ru> Message-ID: <3B23691B-B165-4CC3-889E-04DE45AB1627@gmx.net> Hi Anna: On Aug 25, 2009, at 3:09 AM, Anna Kostikova wrote: > Actually, my final aim is to get 2 files: first one is a fasta file > with all the sequences, and the seconds one is simply a list of > species names Then I'd change your script to write two files: one with the sequences in FASTA format (you can use Bio::SeqIO for that), and the second one in the format you need it (one species name per line?). (Right now you are writing one file in Genbank format, which is quite unlike the above, right?) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From whs at ebi.ac.uk Tue Aug 25 11:04:23 2009 From: whs at ebi.ac.uk (William Spooner) Date: Tue, 25 Aug 2009 12:04:23 +0100 Subject: [Bioperl-l] Modern BioPerl vs. 
Ensembl In-Reply-To: <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> Message-ID: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> On 25 Aug 2009, at 03:29, Chris Fields wrote: > On Aug 24, 2009, at 7:12 PM, George Hartzell wrote: > >> >> There's a warning at Ensembl about the perl api code depending on an >> old version of bioperl (1.2.3) >> >> http://www.ensembl.org/info/docs/webcode/install/ensembl-code.html >> >> Does anyone have current information about that dependency? >> >> My quick-n-dirty tests suggest that one can't build an app that uses >> both new Bioperl and the ensembl api without ensembl picking up the >> newer bioperl libraries (or your app getting the older ones). It's >> not clear what parts of the ensembl world depend on the older >> BioPerl. > > I've asked this question several times of the ensembl folk w/o an > adequate response. My general feeling is even they may not really > know for sure (though I recall ewan saying something about feature/ > annotation changes around then, and maybe something about the > blastreporter). > > Saying that, the ensembl perl API worked for me using bioperl-live > (and bioperl 1.6) as of a couple months ago. You might eventually > run into some issues; if so report them back here and to the ensembl > list. I'm not sure of the full list of dependencies, but my feeling is that most are related to the Ensembl application/web code; the blast interface in particular. I can support Chris's findings that the API works (AFAIK) with bioperl-live, but this is obviously untested. > >> Anyone have any recipes to make it work? >> >> Any info on a possible modernization of the ensembl code? > > That is completely up to the ensembl folks. bioperl 1.2.3 is full > enough of bugs, and I don't plan on backporting any changes to that > branch (seems kind of silly, as that branch is now about six yrs old). 
It would be nice if someone at Ensembl could compile a list of BioPerl dependencies. At least that would give a feel for the scope of the problem... Will From ak at ebi.ac.uk Tue Aug 25 13:43:19 2009 From: ak at ebi.ac.uk (Andreas =?iso-8859-1?B?S+Ro5HJp?=) Date: Tue, 25 Aug 2009 14:43:19 +0100 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> Message-ID: <20090825134319.GE12422@qux.windows.ebi.ac.uk> [cut] > > It would be nice if someone at Ensembl could compile a list of > BioPerl dependencies. At least that would give a feel for the scope > of the problem... > > Will Hi Will, and list, These are the BioPerl modules that the Ensembl Core API "use" or otherwise directly call (scanned our current HEAD code): Bio::Annotation::DBLink in Bio::EnsEMBL::DBEntry Bio::Tools::CodonTable in Bio::EnsEMBL::Utils::TranscriptAlleles in Bio::EnsEMBL::PredictionTranscript in Bio::EnsEMBL::Transcript.pm Bio::LocatableSeq in Bio::EnsEMBL::DnaDnaAlignFeature Bio::PrimarySeqI in Bio::EnsEMBL::Slice Bio::Root::IO in Bio::EnsEMBL::Utils::Converter Bio::Root::Root in Bio::EnsEMBL::Utils::EasyArgv Bio::Seq in Bio::EnsEMBL::Utils::PolyA in Bio::EnsEMBL::Intron in Bio::EnsEMBL::Exon in Bio::EnsEMBL::Transcript in Bio::EnsEMBL::Translation in Bio::EnsEMBL::Utils::TranscriptAlleles Bio::SeqFeature::FeaturePair in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair Bio::SeqFeature::Generic in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair Bio::SeqFeatureI in Bio::EnsEMBL::SeqFeatureI Bio::SimpleAlign in Bio::EnsEMBL::DnaDnaAlignFeature Bio::Species in Bio::EnsEMBL::DBSQL::MetaContainer I have not looked at the other Ensembl APIs (Variation, FuncGen, Compara, Web, Pipeline, etc.), and I might possibly have missed references to some BioPerl modules. 
I have also not indicated the relative importance of any of these modules (clearly Bio::Seq is central, but I don't know how widely the code that accesses Bio::SeqFeature::Generic is used) or investigated if any of the references to BioPerl modules occur in deprecated code. As far as I know, there are currently no plans to get rid of these dependencies. Or there might be, only they are not very far up the priority list right now. I would be happy to look at conservative patches, but can not promise snappy response times. Regards, Andreas -- Andreas Kähäri, Ensembl Software Developer -{ }- European Bioinformatics Institute (EMBL-EBI) -{ }- Wellcome Trust Genome Campus, Hinxton -{ }- Cambridge CB10 1SD, United Kingdom -{ }- From cjfields at illinois.edu Tue Aug 25 14:07:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 25 Aug 2009 09:07:52 -0500 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <20090825134319.GE12422@qux.windows.ebi.ac.uk> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> <20090825134319.GE12422@qux.windows.ebi.ac.uk> Message-ID: <9D26C8FA-6D74-42C2-A2BD-4EFF529DA05A@illinois.edu> Andreas, Thanks for the response, been waiting for something a bit more official for a while now. We can definitely help you patch these as needed when problems arise, just let us know, or file a bug report listing issues. Scanning through, there will be a couple of future trouble spots: 1) We are very likely deprecating Bio::Species in favor of Bio::Taxon (that may be relatively easy to map, as Bio::Species now delegates to Bio::Taxon and similar anyway). 2) We will be refactoring Bio::SimpleAlign/LocatableSeq. There are too many corner cases where assumptions are made. We'll try to stick with the current API, but there may be a few delegating methods. 
More significantly, we're also planning a significant restructuring of bioperl prior to 1.7, basically splitting it into several (more easily maintainable) parts. The exact nature of these is still a bit fuzzy (we have to sort out dependencies) but we do plan on making a bundle package to assemble a complete old-style 'monolithic' bioperl, just a bit more customizable. It's very likely the versioning scheme will stay the same for the core (root) set of modules, but the others may end up having their own versioning for monitoring dependencies. chris On Aug 25, 2009, at 8:43 AM, Andreas Kähäri wrote: > [cut] >> >> It would be nice if someone at Ensembl could compile a list of >> BioPerl dependencies. At least that would give a feel for the scope >> of the problem... >> >> Will > > Hi Will, and list, > > These are the BioPerl modules that the Ensembl Core API "use" or > otherwise directly call (scanned our current HEAD code): > > Bio::Annotation::DBLink > in Bio::EnsEMBL::DBEntry > > Bio::Tools::CodonTable > in Bio::EnsEMBL::Utils::TranscriptAlleles > in Bio::EnsEMBL::PredictionTranscript > in Bio::EnsEMBL::Transcript.pm > > Bio::LocatableSeq > in Bio::EnsEMBL::DnaDnaAlignFeature > > Bio::PrimarySeqI > in Bio::EnsEMBL::Slice > > Bio::Root::IO > in Bio::EnsEMBL::Utils::Converter > > Bio::Root::Root > in Bio::EnsEMBL::Utils::EasyArgv > > Bio::Seq > in Bio::EnsEMBL::Utils::PolyA > in Bio::EnsEMBL::Intron > in Bio::EnsEMBL::Exon > in Bio::EnsEMBL::Transcript > in Bio::EnsEMBL::Translation > in Bio::EnsEMBL::Utils::TranscriptAlleles > > Bio::SeqFeature::FeaturePair > in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair > > Bio::SeqFeature::Generic > in Bio::EnsEMBL::Utils::Converter::ens_bio_featurePair > > Bio::SeqFeatureI > in Bio::EnsEMBL::SeqFeatureI > > Bio::SimpleAlign > in Bio::EnsEMBL::DnaDnaAlignFeature > > Bio::Species > in Bio::EnsEMBL::DBSQL::MetaContainer > > > I have not looked at the other Ensembl APIs (Variation, FuncGen, > Compara, Web, Pipeline, etc.), 
and I might possibly have missed > references to some BioPerl modules. I have also not indicated > the relative importance of any of these modules (clearly Bio::Seq > is central, but I don't know how widely the code that accesses > Bio::SeqFeature::Generic is used) or investigated if any of the > references to BioPerl modules occur in deprecated code. > > As far as I know, there are currently no plans to get rid of these > dependencies. Or there might be, only they are not very far up the > priority list right now. I would be happy to look at conservative > patches, but can not promise snappy response times. > > > Regards, > Andreas > > -- > Andreas Kähäri, Ensembl Software Developer -{ }- > European Bioinformatics Institute (EMBL-EBI) -{ }- > Wellcome Trust Genome Campus, Hinxton -{ }- > Cambridge CB10 1SD, United Kingdom -{ }- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From acpatel at usa.net Tue Aug 25 03:54:01 2009 From: acpatel at usa.net (Anand C. Patel) Date: Mon, 24 Aug 2009 22:54:01 -0500 Subject: [Bioperl-l] Fwd: problem with bioperl (where's the Mus?) 
In-Reply-To: <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> References: <5E0196C5-A50D-4AB0-8FA8-F1C7B690D5EA@gmail.com> <4B576C38-7BA0-463A-A5B4-DDB2ABE99E86@illinois.edu> <5C140444-750F-4AF2-A60D-2A864FA44250@gmx.net> <0976C391-BFB6-482B-AB88-377D1098CB0A@gmail.com> <5390B21A-10BE-4FB8-9CD8-3A9D5B1B1627@usa.net> <17AE22F0-BBBD-4D8B-86A1-2E2CBAEB9230@usa.net> <4529B2EA-BF03-4463-90A3-D644DF9057FE@gmx.net> <4A92CADD.10901@sendu.me.uk> <227E76A9-CC6D-4006-B4BE-102E62A0B34B@gmx.net> <4A92E605.5090706@sendu.me.uk> <8BF25032-CFB1-48E5-8074-C0EEE1BBFC83@illinois.edu> <4A92F199.2030900@sendu.me.uk> <8F8463C4-4251-42E8-A5DB-A25AFC86CF4A@illinois.edu> Message-ID: <9BA4272D-E7A1-4530-B8D8-B6156823BFDB@usa.net> I preloaded the NCBI taxonomy into the biosql database using the provided script before adding the sequences from genbank format text file (downloaded directly from genbank) using the script provided by bioperl-db, which would be what created the Bio::Species objects (I'd assume) from the text files, prior to inserting them into the database. Hope this helps, Anand On Aug 24, 2009, at 3:27 PM, Chris Fields wrote: > > On Aug 24, 2009, at 3:01 PM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Aug 24, 2009, at 2:12 PM, Sendu Bala wrote: >>>> Hilmar Lapp wrote: >>>>> On Aug 24, 2009, at 1:16 PM, Sendu Bala wrote: >>>>>>> This points to a problem in Bio::Species::scientific_name(), >>>>>>> given that binomial() is correct. Could you file this as a bug >>>>>>> report? >>>>>> >>>>>> What code creates the Bio::Species object here? I suspect this >>>>>> code isn't aware of changes in Bio::Species since BioPerl 1.5.2. >>>>> I see. Any pointer to what would tell me what I need to change >>>>> or is everything in the Bio::Species POD? >>>> >>>> ... I won't guarantee the perfection of the POD ;) >>>> >>>> >>>>> BTW what the Bioperl-db code does is instantiate the blank >>>>> object and then populate it through its accessors (mostly the >>>>> classification() array). 
If what it has been doing in the past >>>>> is now considered incorrect, at least it doesn't raise any >>>>> warning that would alert one to that ... >>>> >>>> Yuh... If you point out the code that creates the Bio::Species I >>>> can look into it for you and suggest what needs changing and why >>>> it doesn't work (or if it's a bug in Bio::Species). I can't >>>> remember things clearly right now, though classification() I >>>> guess was supposed to be backwards compatible. >>> Sendu, I think it's related to this: >>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092#c4 >>> Bio::DB::BioSQL::SpeciesAdaptor and >>> Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver are the two modules >>> in question i think. >> >> Ah, yes, well there you go then. So it is a classification() issue. >> Judging by what I said in that bug, looks like the db code needs to >> be changed to put the full scientific name in the first element it >> passes to classification. > > > Yup. I believe the only blocking issue with implementing it was > potential backwards-compat problems with databases loaded using old > behavior and then being updated post-1.5.2 (new behavior). I would > think this only affects sequence data loaded w/o taxonomy preloaded, > but I'm not sure. > > I suggest, if you can fix it, go ahead make the necessary change. > We can then post a big warning to BioSQL and here about the problem, > something along the lines of 'bioperl-db in svn may be backwards > incompatible with species information loaded in previous versions; > it may eat your first born' or similar. It's an absolutely > necessary fix, and may effectively kill a bunch of other db/species- > related bugs. > > chris > From dan.bolser at gmail.com Tue Aug 25 15:16:14 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 25 Aug 2009 16:16:14 +0100 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? 
Message-ID: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Hi, Can some one set $wgEnableMWSuggest on the BioPerl wiki please? http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest I generally find this a great feature to have on any MW install. Can we also create a page (usually "BioPerl:Configuration" (or '$wgSiteName:Configuration')) to report details of the specific MW configuration settings used on the wiki? This is also a good place for people to request configuration changes to tweak the way the wiki works. Cheers, Dan. From jason at bioperl.org Tue Aug 25 17:17:44 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 25 Aug 2009 10:17:44 -0700 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: Can you send sysadmin request mail to the helpdesk - support at open-bio.org so mauricio or someone can have it in the queue. [aside] I've had to stop doing OBF sysadmin work so we are definitely looking for someone to help with the ALL VOLUNTEER team of now just Mauricio and Chris Dagdigian who do mediawiki and sysadmin support. We've reached a bit of crunch where there are lots of things to tweak and customize for the various flavors of MW installs that the projects want but we don't have enough dedicated admins to really support this. Most of us have gotten into these projects to support our own bioinformatics programming not sysadmin tasks so there is a bit of gap here. Some of us (me) were not trained as sysadmin but jumped in and figured out how to help and do it - and learned valuable life skills... =) We're discussing plans to upgrade the machines in the future which would improve performance and reliability we hope and also use this opportunity to streamline the MW installs to be a more easily maintained wikifarm. 
[/aside] -jason On Aug 25, 2009, at 8:16 AM, Dan Bolser wrote: > Hi, > > Can some one set $wgEnableMWSuggest on the BioPerl wiki please? > > http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest > > > I generally find this a great feature to have on any MW install. Can > we also create a page (usually "BioPerl:Configuration" (or > '$wgSiteName:Configuration')) to report details of the specific MW > configuration settings used on the wiki? This is also a good place for > people to request configuration changes to tweak the way the wiki > works. > > > Cheers, > Dan. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From awitney at sgul.ac.uk Tue Aug 25 13:45:59 2009 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 25 Aug 2009 14:45:59 +0100 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> Message-ID: <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk> > > It would be nice if someone at Ensembl could compile a list of > BioPerl dependencies. At least that would give a feel for the scope > of the problem... I just downloaded - ensembl - ensembl-compara - ensembl-variation - 
ensembl-functgenomics from their website and did a regex on the files for /^use (Bio::.+);/ which reveals (filtering out Bio::EnsEMBL::*): Bio::AlignIO Bio::Annotation::DBLink Bio::Das::ProServer::SourceAdaptor Bio::Das::ProServer::SourceAdaptor::Transport::generic Bio::Index::Fastq Bio::LocatableSeq Bio::Location::Simple Bio::MAGE::Experiment::Experiment Bio::MAGE::XMLUtils Bio::Perl Bio::PrimarySeq Bio::PrimarySeqI Bio::Root::Root Bio::Root::RootI Bio::Search::HSP::EnsemblHSP Bio::Seq Bio::SeqFeature::FeaturePair Bio::SeqFeature::Generic Bio::SeqFeatureI Bio::SeqIO Bio::SimpleAlign Bio::Species Bio::Tools::CodonTable Bio::Tools::Run::Phylo::PAML::Codeml Bio::TreeIO does that help? (I have the list broken down by which module/script contains which if that helps also) cheers adam From hartzell at alerce.com Tue Aug 25 20:22:20 2009 From: hartzell at alerce.com (George Hartzell) Date: Tue, 25 Aug 2009 13:22:20 -0700 Subject: [Bioperl-l] code review on LocatableSeq performance fix. Message-ID: <19092.18428.494334.482303@already.dhcp.gene.com> [For better or worse] I use pairs of locatable seq's to represent alignments between cDNAs (spliced mRNA) and genomic sequence. I end up using column_from_residue_number a lot to map features back and forth between the coordinate system. My sequences tend to be fairly long, and the current implementation of column_from_residue_number (which splits the sequences into arrays of individual characters) performs very badly on them. I've included below a small variation on a patch that I've been using for a while (when I pulled it up to the current bioperl-live I changed a couple of regexps to use $GAP_SYMBOLS and $RESIDUE_SYMBOLS). It passes the t/Seq/LocatableSeq.t tests and Works For Me (tm). Instead of creating whopping big arrays and then looping over them it breaks the sequence down into runs of residues/gaps and strides across them. 
It also unwinds the strandedness test and avoids the cute trick of using an anonymous sub (which saves a couple of lines in the source file but adds *significant* overhead every time around the loop). All hail Devel::NYTProf. Chris et al.'s comments about the mysteries and vagaries of Bio::LocatableSeq make me leery of just committing it. Anyone want to comment on it? g. Index: Bio/LocatableSeq.pm =================================================================== --- Bio/LocatableSeq.pm (revision 16001) +++ Bio/LocatableSeq.pm (working copy) @@ -423,27 +423,47 @@ unless $resnumber =~ /^\d+$/ and $resnumber > 0; if ($resnumber >= $self->start() and $resnumber <= $self->end()) { - my @residues = split //, $self->seq; - my $count = $self->start(); - my $i; - my ($start,$end,$inc,$test); - my $strand = $self->strand || 0; - # the following bit of "magic" allows the main loop logic to be the - # same regardless of the strand of the sequence - ($start,$end,$inc,$test)= ($strand == -1)? - (scalar(@residues-1),0,-1,sub{$i >= $end}) : - (0,scalar(@residues-1),1,sub{$i <= $end}); + my @chunks; + my $column_incr; + my $current_column; + my $current_residue = $self->start - 1; + my $seq = $self->seq; + my $strand = $self->strand || 0; - for ($i=$start; $test->(); $i+= $inc) { - if ($residues[$i] ne '.' and $residues[$i] ne '-') { - $count == $resnumber and last; - $count++; - } - } - # $i now holds the index of the column. 
- # The actual column number is this index + 1 + if ($strand == -1) { +# @chunks = reverse $seq =~ m/[^\.\-]+|[\.\-]+/go; + @chunks = reverse $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; + $column_incr = -1; + $current_column = (CORE::length $seq) + 1; + } + else { +# @chunks = $seq =~ m/[^\.\-]+|[\.\-]+/go; + @chunks = $seq =~ m/[$RESIDUE_SYMBOLS]+|[$GAP_SYMBOLS]+/go; + $column_incr = 1; + $current_column = 0; + } - return $i+1; + while (my $chunk = shift @chunks) { +# if ($chunk =~ m|^[\.\-]|o) { + if ($chunk =~ m|^[$GAP_SYMBOLS]|o) { + $current_column += $column_incr * CORE::length($chunk); + } + else { + if ($current_residue + CORE::length($chunk) < $resnumber) { + $current_column += $column_incr * CORE::length($chunk); + $current_residue += CORE::length($chunk); + } + else { + if ($strand == -1) { + $current_column -= $resnumber - $current_residue; + } + else { + $current_column += $resnumber - $current_residue; + } + return $current_column; + } + } + } } $self->throw("Could not find residue number $resnumber"); From hartzell at alerce.com Tue Aug 25 21:07:43 2009 From: hartzell at alerce.com (George Hartzell) Date: Tue, 25 Aug 2009 14:07:43 -0700 Subject: [Bioperl-l] Modern BioPerl vs. Ensembl In-Reply-To: <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk> References: <19091.11362.190209.844074@already.dhcp.gene.com> <23AD692F-69C0-415C-A14A-F01CCCCFA378@illinois.edu> <33863A38-5673-42A6-B82D-FEB7B2AEF39F@ebi.ac.uk> <1CA4E49D-7093-4C63-AD11-8D72960EE93D@sgul.ac.uk> Message-ID: <19092.21151.457226.192791@already.dhcp.gene.com> Adam Witney writes: > > > > It would be nice if someone at Ensembl could compile a list of > > BioPerl dependencies. At least that would give a feel for the scope > > of the problem... 
>
> I just downloaded
>
> * ensembl
> * ensembl-compara
> * ensembl-variation
> * ensembl-functgenomics
>
> from their website and did a regex on the files for
>
> /^use (Bio::.+);/
>
> which reveals (filtering out Bio::EnsEMBL::*):
>
> Bio::AlignIO
> Bio::Annotation::DBLink
> Bio::Das::ProServer::SourceAdaptor
> Bio::Das::ProServer::SourceAdaptor::Transport::generic
> Bio::Index::Fastq
> Bio::LocatableSeq
> Bio::Location::Simple
> Bio::MAGE::Experiment::Experiment
> Bio::MAGE::XMLUtils
> Bio::Perl
> Bio::PrimarySeq
> Bio::PrimarySeqI
> Bio::Root::Root
> Bio::Root::RootI
> Bio::Search::HSP::EnsemblHSP
> Bio::Seq
> Bio::SeqFeature::FeaturePair
> Bio::SeqFeature::Generic
> Bio::SeqFeatureI
> Bio::SeqIO
> Bio::SimpleAlign
> Bio::Species
> Bio::Tools::CodonTable
> Bio::Tools::Run::Phylo::PAML::Codeml
> Bio::TreeIO
>
> does that help? (I have the list broken down by which module/script
> contains which if that helps also)

What would be most useful to me would be to understand where they
*need* to use release 1.2.3. Is there something magical about their
use of e.g. Bio::Seq?

It's worth noting that your technique won't pick up various modules
that are loaded on demand by e.g. Bio::SearchIO.

g.

From maj at fortinbras.us Wed Aug 26 11:39:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 26 Aug 2009 07:39:40 -0400
Subject: [Bioperl-l] code review on LocatableSeq performance fix.
In-Reply-To: <19092.18428.494334.482303@already.dhcp.gene.com>
References: <19092.18428.494334.482303@already.dhcp.gene.com>
Message-ID: <55514878273F4E3F8D9E438FD2F3AB7D@NewLife>

I think it's great. column_from_residue_number doesn't have any secret
side effects, and the patch preserves nice integer in, nice integer
out, and input and output both are 1-origin indices as far as I can
tell.
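
For readers following the patch, the run-based lookup it describes can be sketched outside BioPerl. This is only a rough Python port for illustration, not the code under review: it assumes gaps are just '.' and '-' (as in the commented-out regexps of the patch, whereas $GAP_SYMBOLS may cover more), and the function signature is made up here.

```python
import re

# Assumption: only '.' and '-' count as gap characters in this sketch.
GAP_CHARS = ".-"
RUN_PATTERN = re.compile(r"[^.\-]+|[.\-]+")


def column_from_residue_number(seq, start, resnumber, strand=1):
    """Return the 1-based alignment column that holds residue `resnumber`.

    `seq` is the gapped sequence string and `start` is the number of its
    first residue. Both names are hypothetical stand-ins for the
    LocatableSeq accessors used in the patch.
    """
    if resnumber < start:
        raise ValueError(f"Could not find residue number {resnumber}")

    # Split into alternating runs of residues and gaps instead of
    # single characters.
    runs = RUN_PATTERN.findall(seq)
    if strand == -1:
        runs.reverse()
        step = -1
        column = len(seq) + 1
    else:
        step = 1
        column = 0

    residue = start - 1  # number of the last residue already passed
    for run in runs:
        if run[0] in GAP_CHARS:
            # Gap run: columns advance, residue numbers do not.
            column += step * len(run)
        elif residue + len(run) < resnumber:
            # Residue run that ends before the target: skip it whole.
            column += step * len(run)
            residue += len(run)
        else:
            # Target lies inside this run: step the remaining distance.
            return column + step * (resnumber - residue)

    raise ValueError(f"Could not find residue number {resnumber}")
```

With the gapped string "AC--GT" starting at residue 1, residue 3 sits in column 5 on the forward strand, and in column 2 when the strand is -1, because numbering then runs from the right-hand end.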
I say go for it-
MAJ

From tuco at pasteur.fr Wed Aug 26 14:59:24 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 26 Aug 2009 16:59:24 +0200
Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis
Message-ID: <4A954DCC.4050200@pasteur.fr>

Hi,

I am playing with Bio::Restriction::* objects and find it very useful.
Especially I am filtering output for blunt and cohesive enzymes.
However, there's an exception thrown when I use 'cutters' method
from B::R::Analysis :

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad end parameter (34).
End must be less than the total length of sequence (total=7)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::PrimarySeq::subseq /usr/local/share/perl/5.10.0/Bio/PrimarySeq.pm:388
STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:891
STACK: Bio::Restriction::Analysis::_cuts /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:788
STACK: Bio::Restriction::Analysis::cut /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:366
STACK: Bio::Restriction::Analysis::cutters /usr/local/share/perl/5.10.0/Bio/Restriction/Analysis.pm:681
STACK: Bio::Restriction::Analysis::blunt::_load_simple_digestion lib/Bio/Restriction/Analysis/blunt.pm:86
STACK: Bio::Restriction::Analysis::blunt::cut_in_frames lib/Bio/Restriction/Analysis/blunt.pm:65
STACK: ./check_phase.pl:213
-----------------------------------------------------------

The problem with this enzyme is that the cut site lies beyond the enzyme
recognition site (from Rebase withrefm.907):

<1>BceSI
<2>
<3>SSAAGCG(27/27)
<4>
<5>Bacillus cereus
<6>ATCC 10987
<7>
<8>Hegna, I.K., Bratland, H., Kolsto, A., (2001) FEMS Microbiol. Lett., vol. 202, pp. 189-193.
Xu, S.-Y., Unpublished observations.

For this enzyme, here are the values stored in the B::R::Enzyme object ($e):

$e->site     => SSAAGCGNNNNNNNNNNNNNNNNNNNNNNNNNNN
$e->cut      => 34
$e->string   => SSAAGCG
$e->seq->seq => SSAAGCG

So my question is: wouldn't it be fair to set B::PrimarySeq::seq to the
value of $e->site when such enzymes are seen in the source file?

NOTE from B::R::Analysis::_enzymes_sites (commented):

# The following should not be an exception, both Type I and Type III
# enzymes cut outside of their recognition sequences
#if ($site < 0 || $site > length($enz->string)) {
#    $self->throw("This is (probably) not your fault.\nGot a cut site of $site and a
# sequence of ".$enz->string);
# }

And this is exactly the problem I'm facing!
In _enzymes_sites the code is trying to subseq our sequence to get
before and after seq as :

$beforeseq=$enz->seq->subseq(1, $site);
$afterseq=$enz->seq->subseq($site+1, $enz->seq->length);

and this throws an error as the cutting site is far over (pos 34)
the enzyme's known recognition site SSAAGCG (length=7).

Has anybody a clue on how to fix/patch it?

Thanks for any reply

Regards

Emmanuel

--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------

From cjfields at illinois.edu Wed Aug 26 15:20:59 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 26 Aug 2009 10:20:59 -0500
Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis
In-Reply-To: <4A954DCC.4050200@pasteur.fr>
References: <4A954DCC.4050200@pasteur.fr>
Message-ID: <07222470-41ED-4E17-9383-65A7D02CE9E1@illinois.edu>

What version of Bioperl are you using? Mark Jensen did some
refactoring of this code after the 1.6.0 release that should appear in
1.6.1; I'll be working on the first alpha for that release starting
Friday.

chris

From robert.bradbury at gmail.com Wed Aug 26 15:38:44 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Wed, 26 Aug 2009 11:38:44 -0400
Subject: [Bioperl-l] Generalized reciprocal blast
Message-ID:

I would like to know whether or not anyone has attempted to create a
"generalized" reciprocal blast component for BioPerl?

One sees papers all the time where they discuss running reciprocal
blasts to compare a new species to an old "standard" species or a set
of species or running an all-to-all set of comparisons to match up all
of the "known" proteins from species and determine which are outliers
(and therefore "novel").
There are also accumulating merged sets in NCBI HomoloGene (which
seems to be some strict subset (perhaps a dozen) of "well sequenced"
genomes) and Ensembl (which seems to be working with a much larger set
of 40-50 genomes, some of which may be somewhat incomplete and are
certainly poorly "explored").

I have, I believe, seen code "fragments" from various authors, perhaps
some on the BioPerl list, which perform some major subset of a typical
"reciprocal blast".

Now what I am looking for is a relatively generalizable some-to-some
reciprocal blast utility. I want to be able to specify the genes (or
gene family), e.g. some of the ~150 known DNA repair genes. It would
be helpful to also specify how "tolerant" the blast "true reciprocal"
criteria are. There are some genes where there is a very strict 1-to-1
relationship across many genomes. But for genes which involve
relatively standard domains, e.g. "helicase" domains, the 1-to-1
relationship becomes cloudy -- in mammals for example it's more like
5-to-5 -- and it would be really nice to be able to specify the
strictness or quality level [1] for "matching" genes (and even which
genes are to be excluded because they are known to be false
homologues).

Then to top this off I want to be able to combine known public
databases (e.g. HomoloGene / UniGene / Ensembl) with perhaps local
private databases or database subsets (e.g. emerging or specialized
genomes).

The goal here is of course to determine the precise phylogenetic
relationships between all of the DNA repair genes and how there may be
gain / loss / evolution of function that can be related to species
characteristics (size, longevity, etc.).

Is there a generalized reciprocal blast component in BioPerl? Or is it
a "build-it-yourself" situation (that I have to believe has been built
probably a few dozen times by various researchers / organizations /
companies)?

Thanks,
Robert Bradbury

1.
This would be handled in BioPerl with a customizable user function
which could be tailored to handle specific cases -- for example a
function which when handed a set of 100 potential "matches" could go
through those 100 matches, identify common domains, and then "re-rate"
matches based on considerations such as the type and number of common
domains, domains being in the same order, etc. I.e. criteria which may
be difficult to completely generalize across entire genomes but are
fairly obvious if you are looking at a graphical replication of a gene
set in HomoloGene.

From jason at bioperl.org Wed Aug 26 15:55:04 2009
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 26 Aug 2009 08:55:04 -0700
Subject: [Bioperl-l] Generalized reciprocal blast
In-Reply-To:
References:
Message-ID:

Robert -

BioPerl has traditionally been a toolkit for building these types of
pipelines and not intended to necessarily be a place for larger
systems. That said, BRH is a pretty easy algorithm that could be
applied with the tools in place, the main issue is what kind of lookup
table you want to do for establishing the BRH. Hashes are okay, but I
think BDB or Sqlite end up being more scalable and allow for
persistence.

Really, I would use something like OrthoMCL rather than reciprocal
BLAST to identify families anyways. It uses Bioperl under the hood for
parsing - though it suffers from some pretty inefficient management of
the lookup table for the BRH part of the algorithm - it can be run on
your own customized datasets to integrate public and private data.

You might also find better luck in building good alignments for the
key members of your target gene family of interest and then using a
profile HMM (or even just the new HMMER3 jackhmmer or phmmer which
don't require a MSA) to identify the full set of homologs in all the
databases. If this is the only set of families you care about it is a
lot less computational work to go through and pull these out with an
HMM or HMMER search and build trees from these results rather than
dealing with the computational time of the all-vs-all DB searches that
you are proposing.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From maj at fortinbras.us Wed Aug 26 15:20:41 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 26 Aug 2009 11:20:41 -0400
Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis
In-Reply-To: <4A954DCC.4050200@pasteur.fr>
References: <4A954DCC.4050200@pasteur.fr>
Message-ID:

Hi Emmanuel--
This may be fixed in the latest version of Bio::Restriction, which is
not available in the standard 1.6 distribution. I suggest you try
replacing the Bio/Restriction directory in your distribution with the
current bioperl-live modules. You can get these by using Subversion:

$ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk/Bio/Restriction ./Restriction

If you're brave, better might be to obtain the latest trunk and
reinstall;

$ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
$ cd bioperl-live
$ perl Build.PL
$ ./Build
$ ./Build test
$ ./Build install

Please update the list with your progress-
cheers
Mark

From maj at fortinbras.us Wed Aug 26 16:03:59 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 26 Aug 2009 12:03:59 -0400
Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please?
In-Reply-To:
References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com>
Message-ID:

re:aside -- I can help with this; I promise not to break anything.
cheers MAJ

----- Original Message -----
From: "Jason Stajich"
To: "Dan Bolser"
Cc: "BioPerl List"
Sent: Tuesday, August 25, 2009 1:17 PM
Subject: Re: [Bioperl-l] $wgEnableMWSuggest on the wiki please?

> Can you send sysadmin request mail to the helpdesk - support at open-bio.org
> so mauricio or someone can have it in the queue.
> > [aside] > I've had to stop doing OBF sysadmin work so we are definitely looking > for someone to help with the ALL VOLUNTEER team of now just Mauricio > and Chris Dagdigian who do mediawiki and sysadmin support. > > We've reached a bit of crunch where there are lots of things to tweak > and customize for the various flavors of MW installs that the projects > want but we don't have enough dedicated admins to really support > this. Most of us have gotten into these projects to support our own > bioinformatics programming not sysadmin tasks so there is a bit of gap > here. Some of us (me) were not trained as sysadmin but jumped in and > figured out how to help and do it - and learned valuable life > skills... =) > > We're discussing plans to upgrade the machines in the future which > would improve performance and reliability we hope and also use this > opportunity to streamline the MW installs to be a more easily > maintained wikifarm. > > [/aside] > > -jason > On Aug 25, 2009, at 8:16 AM, Dan Bolser wrote: > >> Hi, >> >> Can some one set $wgEnableMWSuggest on the BioPerl wiki please? >> >> http://www.mediawiki.org/wiki/Manual:$wgEnableMWSuggest >> >> >> I generally find this a great feature to have on any MW install. Can >> we also create a page (usually "BioPerl:Configuration" (or >> '$wgSiteName:Configuration')) to report details of the specific MW >> configuration settings used on the wiki? This is also a good place for >> people to request configuration changes to tweak the way the wiki >> works. >> >> >> Cheers, >> Dan. 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From David.Messina at sbc.su.se Wed Aug 26 16:25:21 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 26 Aug 2009 18:25:21 +0200
Subject: [Bioperl-l] Generalized reciprocal blast
In-Reply-To:
References:
Message-ID: <628aabb70908260925q25039506nab6e1c661f704e2a@mail.gmail.com>

Hi Robert,

Just to add another comment on this:

The problem of identifying orthologs is quite a bit trickier than it
looks, in part due to the many-to-many relationships you noted.

There is a whole body of literature on this topic -- here's a recent
review that includes OrthoMCL that Jason mentioned and others:

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000262

(disclaimer: I work in a lab that offers one of the many attempts to
solve this problem)

So I would say that although it is possible to make a customizable
function as you describe, there are several existing approaches (read:
downloadable code you can run on your data) that would probably give
better results.

Dave

From hsa_rim at yahoo.co.in Wed Aug 26 19:56:38 2009
From: hsa_rim at yahoo.co.in (shafeeq rim)
Date: Thu, 27 Aug 2009 01:26:38 +0530 (IST)
Subject: [Bioperl-l] Latest Cytoband files
Message-ID: <484629.15190.qm@web94612.mail.in2.yahoo.com>

Hi,

Can anybody tell me how I can get the latest cytoband files with stain
information for Homo sapiens, Mus musculus and others? I am using
version 36.3 of RefSeq for human and version 36.1 of RefSeq for
Mus musculus.

Thanks

From cjfields at illinois.edu Wed Aug 26 20:36:31 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 26 Aug 2009 15:36:31 -0500
Subject: [Bioperl-l] Next-Gen and the next point release - updates
Message-ID:

All,

I just pushed one very key bit for nextgen sequence analysis to svn,
mainly parsing of all three FASTQ variants. These can be called by
using:

  # grabs the FASTQ parser, specifies the Illumina variant
  my $in = Bio::SeqIO->new(-format    => 'fastq-illumina',
                           -file      => 'mydata.fq');

  # same, explicitly specifies the Illumina variant
  my $in = Bio::SeqIO->new(-format    => 'fastq',
                           -variant   => 'illumina',
                           -file      => 'mydata.fq');

  # simple 'fastq' format defaults to 'sanger' variant
  my $out = Bio::SeqIO->new(-format    => 'fastq',
                            -file      => '>mydata.fq');

FASTQ works for both input and output. As mentioned before, the
next_dataset() method also exists for getting simple hashrefs, see the
module documentation for more.

This was one of the few remaining blockers for the 1.6.1 point
release. I'll run a clean checkout of main trunk to test, then work on
merging everything over from trunk starting Friday and push out
1.6.0_1 (first alpha) beginning of next week to get some CPAN Tester
information. If everything looks fine the final point release will
follow soon after.

Cheers!

chris

From rmb32 at cornell.edu Wed Aug 26 20:56:20 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 26 Aug 2009 13:56:20 -0700
Subject: [Bioperl-l] Next-Gen and the next point release - updates
In-Reply-To:
References:
Message-ID: <4A95A174.3070706@cornell.edu>

Hurray! You rock Chris!

R

From lsbrath at gmail.com Wed Aug 26 21:08:06 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 26 Aug 2009 17:08:06 -0400
Subject: [Bioperl-l] rendering graphics from genbank files.
Message-ID: <69367b8f0908261408g6750c1d2we3409a016fe186b7@mail.gmail.com>

Hi,

I am running into problems rendering the 5'UTR and 3'UTR features in
the graphic.
I get an error message saying that these are string literals. Better yet, how do I add the 5'UTR and 3'UTR regions to the CDS feature when the only features in my genbank file are mRNA, CDS, and gene? What I want is to display the gene structure. I am using the last template provided in bioperl howto graphics.

Mgavi

From biopython at maubp.freeserve.co.uk Wed Aug 26 21:16:08 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 26 Aug 2009 22:16:08 +0100
Subject: [Bioperl-l] Next-Gen and the next point release - updates
In-Reply-To:
References:
Message-ID: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com>

On Wed, Aug 26, 2009 at 9:36 PM, Chris Fields wrote:
> All,
>
> I just pushed one very key bit for nextgen sequence analysis to svn, mainly
> parsing of all three FASTQ variants.  These can be called by using:
>
>   # grabs the FASTQ parser, specifies the Illumina variant
>   my $in = Bio::SeqIO->new(-format    => 'fastq-illumina',
>                            -file      => 'mydata.fq');
>
>   # same, explicitly specifies the Illumina variant
>   my $in = Bio::SeqIO->new(-format    => 'fastq',
>                            -variant   => 'illumina',
>                            -file      => 'mydata.fq');
>
>   # simple 'fastq' format defaults to 'sanger' variant
>   my $out = Bio::SeqIO->new(-format    => 'fastq',
>                             -file      => '>mydata.fq');
>
> FASTQ works for both input and output.  As mentioned before, the
> next_dataset() method also exists for getting simple hashrefs, see the
> module documentation for more.
>
> This was one of the few remaining blockers for the 1.6.1 point release.
> ...  If everything looks fine the final point release will follow soon after.

It is looking much better than yesterday - nice work :)
However, there are a few rough edges still.

===========================
Evil wrapping
===========================
Chris - Did you get the zip file of FASTQ examples I sent off list?
One of these was the evil_wrapping.fastq file already in Biopython CVS/git (under a new name). This is intended as a real torture test, with line-wrapped quality strings where plenty of the lines start with "+" or "@" characters. Bioperl doesn't like this file at all - but I have not dug into why.

===========================
Sanger To Illumina 1.3+
===========================
When mapping a Sanger FASTQ file with very high scores to Illumina, these don't get the maximum value imposed (ASCII 126, tilde). For example:

$ ./biopython_sanger2illumina < sanger_93.fastq
/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:676: UserWarning: Data loss - max PHRED quality 62 in Illumina FASTQ
  warnings.warn("Data loss - max PHRED quality 62 in Illumina FASTQ")
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@

But, with bioperl-live SVN,

$ ./bioperl_sanger2illumina < sanger_93.fastq

--------------------- WARNING ---------------------
MSG: Quality values not found for illumina:63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93
---------------------------------------------------
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@

You are using "@" (ASCII 64), which in this context means a PHRED score of zero.

===========================
Sanger To Solexa
===========================
Likewise when mapping a Sanger FASTQ file with very high scores to Solexa FASTQ, these don't get the maximum value imposed (ASCII 126, tilde).
For example, $ ./biopython_sanger2solexa < sanger_93.fastq /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764: UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ") @Test PHRED qualities from 93 to 0 inclusive ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;; But, $ ./bioperl_sanger2solexa < sanger_93.fastq --------------------- WARNING --------------------- MSG: Quality values not found for solexa:0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93 --------------------------------------------------- @Test PHRED qualities from 93 to 0 inclusive ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN + <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><< i.e. You've mapped the high value scores to "<", ASCII 60, thus Solexa -4 (an odd thing to happen - getting the lowest score wouldn't surprise me so much). Furthermore, notice that PHRED scores 0 and 1 have both been mapped to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning Solexa -5. =========================== Still, things are looking up :) Peter From maj at fortinbras.us Wed Aug 26 21:03:13 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 26 Aug 2009 17:03:13 -0400 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: <4A95A174.3070706@cornell.edu> References: <4A95A174.3070706@cornell.edu> Message-ID: <1E03634D20424F659F417AE7F5D26039@NewLife> +1 ----- Original Message ----- From: "Robert Buels" To: "Chris Fields" Cc: "BioPerl List" Sent: Wednesday, August 26, 2009 4:56 PM Subject: Re: [Bioperl-l] Next-Gen and the next point release - updates > Hurray! You rock Chris! 
> > R > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From sac at bioperl.org Wed Aug 26 22:33:16 2009 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 26 Aug 2009 15:33:16 -0700 Subject: [Bioperl-l] MGED meeting in Phoenix, AZ, Oct 5-8 Message-ID: <8f200b4c0908261533y74c42b1aif662ef13a8fe6711@mail.gmail.com> The MGED Society's annual meeting is of potential interest to anyone working with functional genomics data sets, or interested in best practices for analyzing and annotating their functional genomics experiments. The meeting topic is "Next-Gen Sequencing and Translational Genomics" and as usual, they've got a great line-up of speakers (included below). It's in Phoenix, AZ Oct 5-8, early registration ends on 5 Sep. (Note that MGED has expanded its reach beyond just microarrays.) For more information on registration and abstract submission, go to * http://www.mgedmeeting.org* For hotel accommodations, go to * http://www.starwoodmeeting.com/StarGroupsWeb/res?id=0903232443&key=42DE2* Keynotes *Hank Greely* Deane F. and Kate Edelman Johnson Professor of Law Stanford Law School *Elaine Mardis* Associate Professor, Genetics, Molecular Microbiology Washington University in St. 
Louis School of Medicine *Daniel Von Hoff* Director, Clinical Translational Research Division Translational Genomics Research Institute (TGen) Plenary Speakers: *Steven Brenner* Associate Professor, Plant and Microbial Biology University of California, Berkeley *Lynda Chin* Associate Professor, Dermatology Dana Farber Cancer Institute, Harvard Medical School *David Craig* Associate Director, Neurogenomics Division Translational Genomics Research Institute (TGen) *Michael Eisen* Scientist, Lawrence Berkeley National Lab and Associate Professor Department of Molecular and Cellular Biology, University of California, Berkeley *Gad Getz* Head of Cancer Genome Analysis at the Broad Institute of MIT and Harvard *Mathieu Lupien* Assistant Professor, Genetics Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center *Joanna Mountain* Senior Director, Research 23andMe, Inc. *Dana Pe'er* Assistant Professor, Biology and Computer Science Columbia University Biological Sciences *John Quackenbush* Professor of Computational Biology & Bioinformatics, Biostatistics Dana Farber Cancer Institute, Harvard School of Public Health *Cole Trapnell* Ph. D. Student, Computer Science University of Maryland, College Park From cjfields at illinois.edu Thu Aug 27 02:52:13 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 26 Aug 2009 21:52:13 -0500 Subject: [Bioperl-l] Next-Gen and the next point release - updates In-Reply-To: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> References: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com> Message-ID: On Aug 26, 2009, at 4:16 PM, Peter wrote: > It is looking much better than yesterday - nice work :) > However, there are a few rough edges still. Not unexpected, actually. > =========================== > Evil wrapping > =========================== > Chris - Did you get the zip file of FASTQ examples I sent off list? 
> One of > these was the evil_wrapping.fastq file already in Biopython CVS/git > (under > a new name). This is intended as a real torture test, with line > wrapped > quality strings where plenty of the lines start with "+" or "@" > characters. > Bioperl doesn't like this file at all - but I have not dug into why. Now fixed; I've saved this as very_tricky.fastq, but it's the same file. > =========================== > Sanger To Illumina 1.3+ > =========================== > When mapping a Sanger FASTQ file with very high scores to Illumina, > these don't get the maximum value imposes (ASCII 126, tidle). e.g. ... Yes, I know where that one is going wrong. Fixed now for bounds for the above. Partly related to the below. > =========================== > Sanger To Solexa > =========================== > Likewise when mapping a Sanger FASTQ file with very high scores to > Solexa FASTQ, these don't get the maximum value imposes (ASCII 126, > tidle). For example, > > $ ./biopython_sanger2solexa < sanger_93.fastq > /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764: > UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ > warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ") > @Test PHRED qualities from 93 to 0 inclusive > ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN > + > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\ > [ZYXWVUTSRQPONMLKJHGFECB@>;; > > But, > > $ ./bioperl_sanger2solexa < sanger_93.fastq > > --------------------- WARNING --------------------- > MSG: Quality values not found for > solexa: > 0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93 > --------------------------------------------------- > @Test PHRED qualities from 93 to 0 inclusive > ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN > + > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\ > 
[ZYXWVUTSRQPONMLKJHGFEDB@><< > > i.e. You've mapped the high value scores to "<", ASCII 60, thus > Solexa -4 > (an odd thing to happen - getting the lowest score wouldn't surprise > me so > much). This one is fixed, it was the same bounding issue as above. > Furthermore, notice that PHRED scores 0 and 1 have both been mapped > to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning > Solexa -5. The two conversions to solexa are still failing. I'm not sure but I think it's something fairly simple, but I can't work on it until Friday (got too many other things on my plate ATM). If I get stumped I'll post a message. > =========================== > > Still, things are looking up :) > > Peter Yes they are, much more so that previously. I'll add these to the tests. chris From tuco at pasteur.fr Thu Aug 27 08:28:41 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Thu, 27 Aug 2009 10:28:41 +0200 Subject: [Bioperl-l] Exception thrown with Bio::Restriction::Analysis In-Reply-To: References: <4A954DCC.4050200@pasteur.fr> Message-ID: <4A9643B9.7000709@pasteur.fr> Mark A. Jensen wrote: > Hi Emmanuel-- > This may be fixed in the latest version of Bio::Restriction, which is not > available in the standard 1.6 distribution. I suggest you try replacing the > Bio/Restriction directory in your distribution with the current > bioperl-live > modules. You can get these by using Subversion: > > $ svn co > svn://code.open-bio.org/bioperl/bioperl-live/trunk/Bio/Restriction > ./Restriction Hi Mark, Thanks for pointing me to this svn repo. I've just updated the Bio::Restriction::* part just to test it. I don't get any error anymore. I just need to continue working on this with my ideas. I'll let you know if I encounter any other problem. 
Cheers Emmanuel > > If you're brave, better might be to obtain the latest trunk and reinstall; > > $ svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live > $ cd bioperl-live > $ perl Build.PL > $ ./Build > $ ./Build test > $ ./Build install > > Please update the list with your progress- > cheers > Mark >> -- >> ------------------------- >> Emmanuel Quevillon >> Biological Software and Databases Group >> Institut Pasteur >> +33 1 44 38 95 98 >> tuco at_ pasteur dot fr >> ------------------------- >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From dan.bolser at gmail.com Thu Aug 27 10:34:00 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 27 Aug 2009 11:34:00 +0100 Subject: [Bioperl-l] $wgEnableMWSuggest on the wiki please? In-Reply-To: References: <2c8757af0908250816g48ae9dc6mf6e64c2f122e602@mail.gmail.com> Message-ID: <2c8757af0908270334kcb3dfc4w17553e65f7e0e4b5@mail.gmail.com> 2009/8/25 Jason Stajich : > Can you send sysadmin request mail to the helpdesk - support at open-bio.org?so > mauricio or someone can have it in the queue. OK. > [aside] > I've had to stop doing OBF sysadmin work so we are definitely looking for > someone to help with the ALL VOLUNTEER team of now just Mauricio and Chris > Dagdigian who do mediawiki and sysadmin support. > > We've reached a bit of crunch where there are lots of things to tweak and > customize for the various flavors of MW installs that the projects want but > we don't have enough dedicated admins to really support this. 
> Most of us

I know how you feel!

> have gotten into these projects to support our own bioinformatics
> programming not sysadmin tasks so there is a bit of gap here. Some of us
> (me) were not trained as sysadmin but jumped in and figured out how to help
> and do it - and learned valuable life skills... =)
>
> We're discussing plans to upgrade the machines in the future which would
> improve performance and reliability we hope and also use this opportunity to
> streamline the MW installs to be a more easily maintained wikifarm.

Sounds like a good idea. There are also extensions that put more of the MW config on the website itself (restricted to admins of course).

Dan.

From hsa_rim at yahoo.co.in Thu Aug 27 11:14:03 2009
From: hsa_rim at yahoo.co.in (shafeeq rim)
Date: Thu, 27 Aug 2009 16:44:03 +0530 (IST)
Subject: [Bioperl-l] Mapping of genome with cytoband
Message-ID: <29549.68962.qm@web94610.mail.in2.yahoo.com>

Hi,

I need gene, mRNA, CDS, STS and exon files mapped to the cytobands, say for the version 37.1 NCBI data. I am checking the .gbs and .gbk files, but the genes and other features do not span the whole chromosome. Take chromosome 1, for example: when I use the gene coordinates from the .gbk / .gbs files, the gene locations cover only half of the ideogram graph.

Thanks

From biopython at maubp.freeserve.co.uk Thu Aug 27 11:55:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 27 Aug 2009 12:55:55 +0100
Subject: [Bioperl-l] Next-Gen and the next point release - updates
In-Reply-To:
References: <320fb6e00908261416p666b7ab7w8174eb5a48f38c61@mail.gmail.com>
Message-ID: <320fb6e00908270455y2a80907chfae8007df60e72e2@mail.gmail.com>

On Thu, Aug 27, 2009 at 3:52 AM, Chris Fields wrote:
>
> On Aug 26, 2009, at 4:16 PM, Peter wrote:
>
>> It is looking much better than yesterday - nice work :)
>> However, there are a few rough edges still.
>
> Not unexpected, actually.
>
>> ===========================
>> Evil wrapping
>> ===========================
>> Chris - Did you get the zip file of FASTQ examples I sent off list? One of
>> these was the evil_wrapping.fastq file already in Biopython CVS/git (under
>> a new name). This is intended as a real torture test, with line wrapped
>> quality strings where plenty of the lines start with "+" or "@"
>> characters.
>> Bioperl doesn't like this file at all - but I have not dug into why.
>
> Now fixed; I've saved this as very_tricky.fastq, but it's the same file.

Looks good.

>> ===========================
>> Sanger To Illumina 1.3+
>> ===========================
>> When mapping a Sanger FASTQ file with very high scores to Illumina,
>> these don't get the maximum value imposed (ASCII 126, tilde). e.g.
>
> ...
>
> Yes, I know where that one is going wrong.  Fixed now for bounds for the
> above.  Partly related to the below.

Looks good.

>> ===========================
>> Sanger To Solexa
>> ===========================
>> Likewise when mapping a Sanger FASTQ file with very high scores to
>> Solexa FASTQ, these don't get the maximum value imposed (ASCII 126,
>> tilde). For example,
>> ...
>> i.e. You've mapped the high value scores to "<", ASCII 60, thus Solexa -4
>> (an odd thing to happen - getting the lowest score wouldn't surprise me so
>> much).
>
> This one is fixed, it was the same bounding issue as above.

Yes, the high score truncation looks good.

>> Furthermore, notice that PHRED scores 0 and 1 have both been mapped
>> to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning Solexa -5.
>
> The two conversions to solexa are still failing.  I'm not sure but I think
> it's something fairly simple, but I can't work on it until Friday (got too
> many other things on my plate ATM).  If I get stumped I'll post a message.

Actually it's not just PHRED 0 and 1 that look wrong, all of the low scores are messed up. I could repeat this using the sanger_93.fastq file, but to avoid email line wrapping here I'm using a smaller example file with PHRED scores in the range 40 to 0 only:

$ cat sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!

Biopython:

$ python ./biopython_sanger2solexa.py < sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;;

BioPerl SVN (with Chris' latest fixes):

$ ./bioperl_sanger2solexa.pl < sanger_faked.fastq

--------------------- WARNING ---------------------
MSG: Data loss for solexa: following values exceed max 62
0
---------------------------------------------------
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFDCA?=~

The last ten characters are wrong (i.e. PHRED scores 0 to 9, which is precisely the range where the PHRED/Solexa mapping is non-trivial). Also note that the data loss warning is misleading (0 is less than 62).

Plus you get exactly the same problems with Illumina to Solexa. This should narrow it down - the bug is in mapping PHRED scores (from either Sanger or Illumina 1.3+ files) to the Solexa encoding.
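
For reference, the PHRED to Solexa mapping discussed above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code of either Biopython or BioPerl: the function and constant names are made up for the example, the formula assumed is the usual Q_solexa = 10 * log10(10^(Q_phred/10) - 1), PHRED 0 is assumed to map to the Solexa minimum of -5, and scores are capped at 62 (ASCII 126) as in the warnings shown earlier.

```python
import math

SOLEXA_OFFSET = 64   # Solexa FASTQ encodes score q as chr(q + 64)
SOLEXA_MIN = -5      # lowest Solexa score, ';' (ASCII 59)
SOLEXA_MAX = 62      # highest encodable score, '~' (ASCII 126)


def phred_to_solexa(phred):
    """Convert one integer PHRED score to a rounded, clamped Solexa score."""
    if phred <= 0:
        # log10(0) is undefined, so PHRED 0 is assumed to map to the minimum.
        return SOLEXA_MIN
    solexa = 10 * math.log10(10 ** (phred / 10.0) - 1)
    return max(SOLEXA_MIN, min(SOLEXA_MAX, int(round(solexa))))


def sanger_to_solexa_quality(sanger_quality):
    """Re-encode a Sanger (offset 33) quality string as Solexa (offset 64)."""
    return "".join(
        chr(phred_to_solexa(ord(ch) - 33) + SOLEXA_OFFSET)
        for ch in sanger_quality
    )


if __name__ == "__main__":
    # Quality line from sanger_faked.fastq (PHRED 40 down to 0).
    print(sanger_to_solexa_quality("IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#\"!"))
```

Fed the quality line from sanger_faked.fastq, this sketch should give the same string as the Biopython output above, ending in "JHGFECB@>;;". The non-trivial part is confined to PHRED 0 to 9; from PHRED 10 upwards the rounded Solexa score equals the PHRED score.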
Peter

From sanjaysingh765 at gmail.com Thu Aug 27 13:59:13 2009
From: sanjaysingh765 at gmail.com (sanjay singh)
Date: Thu, 27 Aug 2009 19:29:13 +0530
Subject: [Bioperl-l] query about libwww-perl collection
Message-ID:

Hello,

I want to use the libwww-perl collection to query BLINK with multiple queries. It works very well for a single query, but how can I use it for multiple queries? Please help me out.

Regards,
Sanjay

--
Happy moments , praise God.
Difficult moments, seek God.
Quiet moments, worship God.
Painful moments, trust God.
Every moment, thank God

Sanjay Kumar Singh
Bose Institute
93\1,A.P.C.Road
Kolkata-700 009
West Bengal
India

From bosborne11 at verizon.net Thu Aug 27 15:10:30 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 27 Aug 2009 11:10:30 -0400
Subject: [Bioperl-l] on BP documentation
In-Reply-To:
References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu>
Message-ID: <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net>

Mark,

Sorry, I'm a bit late here. I took a look at the Documentation Project page, it is well-reasoned. However, I didn't see any list of action items there. You do talk at the end about soliciting comments, and you've already done this, and a user survey. A survey is not necessary, the issues are well understood already. More to the point, you understand them and just as in coding, "the one doing the work wins the argument". Here's what my own list of action items would look like:

- Merge FAQ and Scrapbook
-- FAQ is unused or underused and contains code snippets
-- Too much information or too many sections is as bad as too little

- Write Align/AlignIO HOWTO
-- This is the "missing HOWTO"

- Use Deobfuscator links to reveal method documentation
-- Most notably in SeqIO HOWTO
-- Does Deobfuscator have a bug or two that need to be fixed? I use it, it seems to work but I've heard a rumor...

- Condense and streamline installation documents
-- Remove outdated material
-- Still too many pages and too much text
-- There are incorrectly labelled links taking you to the wrong place
-- Remove any text or page that duplicates information in an INSTALL file, link to this file instead

- Seriously prune the Main Page
-- Wikis encourage a proliferation of pages and links, the Main Page is a great example of far too much information
-- Remove many redundant or little used links
-- Try to prettify, in any way possible - we have created, sadly, the world's ugliest Main Page!

- Revise the SeqIO HOWTO
-- The first HOWTO, and it looks like it
-- Link this HOWTO to all the Format pages (Category:Formats)

- Feature-Annotation HOWTO
-- Write script that annotates every single SeqIO format, showing where each bit of text ends up
-- This script runs automatically when you open the HOWTO or click its link, always up-to-date
-- Probably trickier than I think!

- The "Random Page" exercise
-- Spend some time clicking this link, you will certainly find things to merge and delete. You will also find nice documentation that you didn't know existed and is probably never read!

The objective is to create documentation that has a single starting point for at least 50% of the questions asked on the mailing list. We've achieved this for certain topics, like SearchIO. In the old days you'd get a query a week about doing something with Blast and we'd repeat something written the previous week, week after week. Then we wrote some HOWTOs so the answer to just about any question on Features or SearchIO was "See the HOWTO". Again, one starting page for every single reasonably general question, like "See the Installation page". Not "Starting on the Main Page you could click on Getting Bioperl or Getting Started or Quick Start or Installing Bioperl or Installation or Downloads or ...." (you get the idea).

Brian O.

On Aug 15, 2009, at 3:53 PM, Mark A.
Jensen wrote: > > ----- Original Message ----- From: "Hilmar Lapp" > ... >> As for the FASTA example, I can understand - I've heard repeatedly >> from people that one of the things that they are missing is >> documentation for every SeqIO format we support (such as GenBank, >> UniProt, FASTA, etc) about where to find a particular piece of the >> format in the object model. > .... > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help create > our list of action items. > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Aug 27 17:38:45 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 27 Aug 2009 10:38:45 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations Message-ID: <4A96C4A5.9090406@cornell.edu> Hi all, Recently a user came into #bioperl looking to truncate an annotated sequence (leaving the region between e.g. 150 to 250 nt), and have the annotations from the original sequence be remapped onto the new truncated sequence. Poking through code, I came across an undocumented function trunc() that from the comments looks like it was written by Jason as part of a master plan to implement this very functionality. Just wondering, what's the status of that? 
Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From rmb32 at cornell.edu Thu Aug 27 17:40:41 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 27 Aug 2009 10:40:41 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <4A96C4A5.9090406@cornell.edu> References: <4A96C4A5.9090406@cornell.edu> Message-ID: <4A96C519.3020001@cornell.edu> Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 Rob Robert Buels wrote: > Hi all, > > Recently a user came into #bioperl looking to truncate an annotated > sequence (leaving the region between e.g. 150 to 250 nt), and have the > annotations from the original sequence be remapped onto the new > truncated sequence. > > Poking through code, I came across an undocumented function trunc() that > from the comments looks like it was written by Jason as part of a master > plan to implement this very functionality. > > Just wondering, what's the status of that? > > Rob > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Thu Aug 27 18:20:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 27 Aug 2009 13:20:42 -0500 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <4A96C519.3020001@cornell.edu> References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> Message-ID: <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> It's not implemented completely. As Jason mentioned in the bug report, it was meant to be part of an overall system to truncate sequences with remapped features, but the implementation in place is substandard. It's open for implementation if anyone wants to take it up. 
I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal with this in a more elegant and lightweight way, and is probably the direction I would take. YMMV. chris On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: > Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 > > Rob > > Robert Buels wrote: >> Hi all, >> Recently a user came into #bioperl looking to truncate an annotated >> sequence (leaving the region between e.g. 150 to 250 nt), and have >> the annotations from the original sequence be remapped onto the new >> truncated sequence. >> Poking through code, I came across an undocumented function trunc() >> that from the comments looks like it was written by Jason as part >> of a master plan to implement this very functionality. >> Just wondering, what's the status of that? >> Rob > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Aug 27 18:41:28 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 27 Aug 2009 11:41:28 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> Message-ID: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Yeah one thought that we batted around at a hackathon many moons ago had been to use Bio::DB::SeqFeature in a lightweight way under the hood to represent sequences in layers more rather than the arbitrary data model that is setup by focusing on handling GenBank records. 
A lot of the architecture development (that is like 10-15 years old now!) was initially just focused on round-tripping the sequence files. We more recently felt like a new model was more appropriate. With the fast SQLite implementation that Lincoln has put in for DB::SeqFeature we could in theory map every sequence into a SQLite DB and then have the power of the interface. Some more bells and whistles might be needed but the basic API is respected AFAIK and it prevents needing to store whole sequences in memory. The SeqIO->DB::SeqFeature loading would need some finessing so that as parsed the sequence object could be updated efficiently. Actually this might also help reduce the number of objects needed to be created by basically efficiently serializing sequences into the DB on parsing (and with some simple caching this could make for pretty fast system). Since disk is basically not a limitation now could be an interesting experiment? Maybe it is too out there, but if not it could be something major enough that it has to go in a bioperl-2/ bioperl-ng. It sort of assumes the data model of Bio::DB::SeqFeature is adequate for all the messiness of sequence data formats and one problem for some people has been the seq file format => GFF in order to load it into a SeqFeature DB for Gbrowse... So I don't know what are the boundary cases here. Certainly for FASTA it should be straightforward. -jason On Aug 27, 2009, at 11:20 AM, Chris Fields wrote: > It's not implemented completely. As Jason mentioned in the bug > report, it was meant to be part of an overall system to truncate > sequences with remapped features, but the implementation in place is > substandard. It's open for implementation if anyone wants to take > it up. > > I should point out, though, in my opinion Bio::DB::GFF/SeqFeature > deal with this in a more elegant and lightweight way, and is > probably the direction I would take. YMMV. 
> > chris > > On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: > >> Looks like bug 1572 is related to this: http://bugzilla.open-bio.org/show_bug.cgi?id=1572 >> >> Rob >> >> Robert Buels wrote: >>> Hi all, >>> Recently a user came into #bioperl looking to truncate an >>> annotated sequence (leaving the region between e.g. 150 to 250 >>> nt), and have the annotations from the original sequence be >>> remapped onto the new truncated sequence. >>> Poking through code, I came across an undocumented function >>> trunc() that from the comments looks like it was written by Jason >>> as part of a master plan to implement this very functionality. >>> Just wondering, what's the status of that? >>> Rob >> >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From lsbrath at gmail.com Thu Aug 27 19:04:36 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 27 Aug 2009 15:04:36 -0400 Subject: [Bioperl-l] rendering the 5' & 3' UTR in a graphic Message-ID: <69367b8f0908271204p7f153be1p6673faac931b646d@mail.gmail.com> Hello, I am able to render all of the features except the 5' & 3' UTR. 
This is how the features part of the Genbank file looks:

FEATURES             Location/Qualifiers
     source          1..185000
                     /note="locus_tag=Nbl1"
                     /organism="Mus musculus"
     gene            142646..153328
                     /note="locus_tag=Nbl1"
                     /gene="ENSMUSG00000041120"
                     /note="neuroblastoma, suppression of tumorigenicity 1 [Source:MGI;Acc:MGI:104591]"
     5'UTR           142646..150000
                     /note="Nbl1"
     mRNA            join(142646..142794,149973..150167,150269..150380,
                     152019..153328)
                     /gene="ENSMUSG00000041120"
                     /note="transcript_id=ENSMUST00000042844"
     CDS             join(150001..150167,150269..150380,152019..152276)
                     /db_xref="CCDS:CCDS18839.1"
                     /db_xref="MGI:Nbl1"
                     /db_xref="Vega_mouse_transcript:OTTMUST00000022949"
                     /protein_id="ENSMUSP00000045608"
                     /gene="ENSMUSG00000041120"
                     /note="transcript_id=ENSMUST00000042844"
     misc_feature    150001..152276
                     /note="deletion"
     3'UTR           152277..153328
                     /gene="Nbl1"
ORIGIN -
        1 GACCAGAGCC ACTCGCTAGG AGTCACACCG AGCCTGGGGG TCCGAAGGGA ACAGCATCAA

Here is the code:

# file: embl2picture.pl
# This is code example 6 in the Graphics-HOWTO
# Author: Lincoln Stein

use strict;
#use lib "$ENV{HOME}/projects/bioperl-live";
use Bio::Graphics;
use Bio::SeqIO;

use constant USAGE =><<END;
   Render a GenBank/EMBL entry into drawable form.
   Return as a GIF or PNG image on standard output.

   File must be in embl, genbank, or another SeqIO-
   recognized format.  Only the first entry will be
   rendered.

Example to try:
   embl2picture.pl factor7.embl | display -

END

my $file = shift or die USAGE;
my $io   = Bio::SeqIO->new(-file=>$file) or die USAGE;
my $seq  = $io->next_seq or die USAGE;
my $wholeseq = Bio::SeqFeature::Generic->new(
                   -start        => 1,
                   -end          => $seq->length,
                   -display_name => $seq->display_name
               );

# script reads the features from the sequence object by calling all_SeqFeatures()
my @features = $seq->all_SeqFeatures;

# sorts each feature by its primary tag into a hash
# of array references named %sorted_features
my %sorted_features;
my %want = map {$_ =>1} qw/source CDS gene utr5prime utr3prime mRNA misc_feature/;
for my $f (@features) {
    #get cds, primer_bind, and genes features only
    my $tag = $f->primary_tag;

    # create a hash of $f keys and $tag values
    #push @{$sorted_features{$tag}},$f if ($tag =~ /CDS|gene|mRNA|source|misc_feature|5'UTR|3'UTR/);
    push @{$sorted_features{$tag}},$f if ($want{$tag});
}

# we create the Bio::Graphics::Panel object.
# As in previous examples, we specify the width of the image,
# as well as some extra white space to pad out the left and right borders.
my $panel = Bio::Graphics::Panel->new(
                -length    => $seq->length,
                -key_style => 'between',
                -width     => 400,
                -pad_left  => 10,
                -pad_right => 10,
            );

# We now add two tracks, one for the scale
# and the other for the sequence as a whole.
$panel->add_track($wholeseq,
                  -glyph   => 'arrow',
                  -bump    => 0,
                  -double  => 1,
                  -tick    => 2,
                  -bgcolor => 'blue',
                  -label   => 1,
                 );

=cut
$panel->add_track($wholeseq,
                  -glyph   => 'generic',
                  -bgcolor => 'blue',
                  -label   => 1,
                 );
=cut

# Locate primary tag of "CDS" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the CDS feature type from the %sorted_features associative array.
if ($sorted_features{CDS}) {
    $panel->add_track($sorted_features{CDS},
                      -glyph       => 'transcript2',
                      -bgcolor     => 'orange',
                      -fgcolor     => 'black',
                      -font2color  => 'red',
                      -key         => 'CDS',
                      -bump        => +1,
                      -height      => 12,
                      -label       => \&gene_label,
                      -description => \&gene_description,
                     );
    delete $sorted_features{'CDS'};
}

# Locate primary tag of "mRNA" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the mRNA feature type from the %sorted_features associative array.
if ($sorted_features{mRNA}) {
    $panel->add_track($sorted_features{mRNA},
                      -glyph       => 'transcript2',
                      -bgcolor     => 'red',
                      -fgcolor     => 'black',
                      -font2color  => 'red',
                      -key         => 'mRNA',
                      -bump        => +1,
                      -height      => 12,
                      -label       => \&gene_label,
                      -description => \&gene_description,
                     );
    delete $sorted_features{'mRNA'};
}

#=cut
# Locate primary tag of "5'UTR" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the 5'UTR feature type from the %sorted_features associative array.
if ($sorted_features{utr5prime}) {
    $panel->add_track($sorted_features{utr5prime},
                      -glyph       => 'transcript2',
                      -bgcolor     => '',
                      -fgcolor     => 'black',
                      -font2color  => 'red',
                      -key         => 'utr5prime',
                      -bump        => +1,
                      -height      => 12,
                      -label       => \&gene_label,
                      -description => \&gene_description,
                     );
    delete $sorted_features{utr5prime};
}

=cut
# Locate primary tag of "3'UTR" and create a track using a glyph
# at creation time. After we handle this special case, we remove
# the 3'UTR feature type from the %sorted_features associative array.
if ($sorted_features{3\'UTR}) {
    $panel->add_track($sorted_features{'3\'UTR'},
                      -glyph       => 'transcript2',
                      -bgcolor     => '',
                      -fgcolor     => 'black',
                      -font2color  => 'red',
                      -key         => '3\'UTR',
                      -bump        => +1,
                      -height      => 12,
                      -label       => \&gene_label,
                      -description => \&gene_description,
                     );
    delete $sorted_features{'3\'UTR'};
}
=cut

# general case
# Create a track for each feature type.
# In order to distinguish the tracks by color,
# we initialize an array of 9 color names and simply cycle through them
my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
my $idx    = 0;
for my $tag (sort keys %sorted_features) {
    my $features = $sorted_features{$tag};
    $panel->add_track($features,
                      -glyph       => 'generic',
                      -bgcolor     => $colors[$idx++ % @colors],
                      -fgcolor     => 'black',
                      -font2color  => 'red',
                      -key         => "${tag}s",
                      -bump        => +1,
                      -height      => 8,
                      # -description option to point to a subroutine
                      # that will generate more informative description strings.
                      -description => \&generic_description,
                     );
}

binmode(STDOUT);
print $panel->png;
exit 0;

sub gene_label {
    my $feature = shift;
    my @notes;
    foreach (qw(product gene)) {
        @notes = eval {$feature->get_tag_values($_)};
        last;
    }
    $notes[0];
}

sub gene_description {
    my $feature = shift;
    my @notes;
    foreach (qw(note)) {
        # Notice that we place calls to get_tag_values() inside eval{} blocks
        # in order to avoid having an exception raised if the feature does not
        # have a tag with the desired value.
        @notes = eval{$feature->get_tag_values($_)};
        last;
    }
    return unless @notes;
    substr($notes[0],30) = '...' if length $notes[0] > 30;
    $notes[0];
}

sub generic_description {
    my $feature = shift;
    my $description;
    foreach ($feature->get_all_tags) {
        my @values = $feature->get_tag_values($_);
        $description .= $_ eq 'note' ? "@values" : "$_=@values; ";
    }
    $description =~ s/; $//; # get rid of last
    $description;
}

sub fp_utr{
    my $five_prime_utr = '5\'UTR';
    return $five_prime_utr;
}

This is how the image currently looks:

Any ideas why I am unable to render the 5' & 3' UTR features?

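
One possible cause, sketched here without BioPerl so it runs on a bare Perl install: the script filters features through %want, which lists utr5prime and utr3prime, while the feature table in the file uses the keys 5'UTR and 3'UTR. The sketch below assumes (it is an assumption, not something confirmed in this thread) that primary_tag returns the feature key exactly as written in the file; the @tags list and the "feature:..." strings are hypothetical stand-ins for real feature objects.

```perl
use strict;
use warnings;

# Stand-ins for the values of $f->primary_tag.
# ASSUMPTION: the parser hands back the feature keys exactly as they
# appear in the FEATURES table of the posted GenBank file.
my @tags = ('source', 'gene', "5'UTR", 'mRNA', 'CDS', 'misc_feature', "3'UTR");

# Filter as in the posted script: utr5prime/utr3prime never match
# the literal keys 5'UTR/3'UTR, so both UTR features are dropped.
my %want_posted = map { $_ => 1 }
    qw/source CDS gene utr5prime utr3prime mRNA misc_feature/;
my @kept_posted = grep { $want_posted{$_} } @tags;

# Filter keyed on the literal feature keys instead.
my %want_literal = map { $_ => 1 }
    ('source', 'CDS', 'gene', "5'UTR", "3'UTR", 'mRNA', 'misc_feature');
my @kept_literal = grep { $want_literal{$_} } @tags;

print "posted filter keeps:  @kept_posted\n";
print "literal filter keeps: @kept_literal\n";

# A hash key containing an apostrophe has to be a quoted string.
my %sorted_features;
push @{ $sorted_features{$_} }, "feature:$_" for @kept_literal;
print "5'UTR track data present\n" if $sorted_features{"5'UTR"};
print "3'UTR track data present\n" if $sorted_features{"3'UTR"};
```

If the assumption holds, the same change would carry over to the script: put "5'UTR" and "3'UTR" in %want and test $sorted_features{"5'UTR"} rather than $sorted_features{utr5prime}. Note also that the 3'UTR block sits between =cut lines, so Perl skips it as POD, and its bareword key 3\'UTR would likely not compile if the block were re-enabled as posted.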
From jorvis at gmail.com Thu Aug 27 19:23:05 2009 From: jorvis at gmail.com (Joshua Orvis) Date: Thu, 27 Aug 2009 15:23:05 -0400 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Message-ID: I should weigh in here since I am the above-mentioned 'user' who posed the question in #bioperl. To clarify, to train one particular gene finder I need to take a full genbank file with annotation for a whole genome and create separate gbk records, one for each gene. Each record will then contain the gene, exon coordinates for the CDS and sequence for the gene. I can iterate through the features of the full record and do the math myself for each spliced coordinate, making/writing individual records as I go, but thought I would see if BioPerl had any mechanism to extract a region of an annotated record and treat the starting base of that extraction as position 1, recoordinating all the other features that were present. Then I could just iterate through the features of the whole entry, extracting regions for each gene as I see them. Hopefully this makes sense. Joshua On Thu, Aug 27, 2009 at 2:41 PM, Jason Stajich wrote: > > Yeah one thought that we batted around at a hackathon many moons ago had > been to use Bio::DB::SeqFeature in a lightweight way under the hood to > represent sequences in layers more rather than the arbitrary data model that > is setup by focusing on handling GenBank records. A lot of the architecture > development (that is like 10-15 years old now!) was initially just focused > on round-tripping the sequence files. We more recently felt like a new model > was more appropriate. 
With the fast SQLite implementation that Lincoln has > put in for DB::SeqFeature we could in theory map every sequence into a > SQLite DB and then have the power of the interface. > > Some more bells and whistles might be needed but the basic API is respected > AFAIK and it prevents needing to store whole sequences in memory. The > SeqIO->DB::SeqFeature loading would need some finessing so that as parsed > the sequence object could be updated efficiently. > > Actually this might also help reduce the number of objects needed to be > created by basically efficiently serializing sequences into the DB on > parsing (and with some simple caching this could make for pretty fast > system). Since disk is basically not a limitation now could be an > interesting experiment? Maybe it is too out there, but if not it could be > something major enough that it has to go in a bioperl-2/bioperl-ng. It > sort of assumes the data model of Bio::DB::SeqFeature is adequate for all > the messiness of sequence data formats and one problem for some people has > been the seq file format => GFF in order to load it into a SeqFeature DB for > Gbrowse... So I don't know what are the boundary cases here. Certainly for > FASTA it should be straightforward. > > -jason > > On Aug 27, 2009, at 11:20 AM, Chris Fields wrote: > > It's not implemented completely. As Jason mentioned in the bug report, it >> was meant to be part of an overall system to truncate sequences with >> remapped features, but the implementation in place is substandard. It's >> open for implementation if anyone wants to take it up. >> >> I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal >> with this in a more elegant and lightweight way, and is probably the >> direction I would take. YMMV. 
>> >> chris >> >> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: >> >> Looks like bug 1572 is related to this: >>> http://bugzilla.open-bio.org/show_bug.cgi?id=1572 >>> >>> Rob >>> >>> Robert Buels wrote: >>> >>>> Hi all, >>>> Recently a user came into #bioperl looking to truncate an annotated >>>> sequence (leaving the region between e.g. 150 to 250 nt), and have the >>>> annotations from the original sequence be remapped onto the new truncated >>>> sequence. >>>> Poking through code, I came across an undocumented function trunc() that >>>> from the comments looks like it was written by Jason as part of a master >>>> plan to implement this very functionality. >>>> Just wondering, what's the status of that? >>>> Rob >>>> >>> >>> >>> -- >>> Robert Buels >>> Bioinformatics Analyst, Sol Genomics Network >>> Boyce Thompson Institute for Plant Research >>> Tower Rd >>> Ithaca, NY 14853 >>> Tel: 503-889-8539 >>> rmb32 at cornell.edu >>> http://www.sgn.cornell.edu >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Thu Aug 27 20:00:24 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 27 Aug 2009 13:00:24 -0700 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Message-ID: So when I did this for the 
retraining of AUGUSTUS I loaded all my gene models in Bio::DB::GFF as GFF3 and then just extracted each locus I needed +/- some surrounding sequence context and wrote it out as a GenBank file. There might have been one or two problems collapsing the features back into GenBank's concept of a CDS as a single feature rather than individual pieces, but I just made a split location and added the sub-pieces to it. It was only a few lines of code to do it right. The flatten/unflatten step is one of the most annoying parts, and maybe something we could work out how to streamline.

-jason

On Aug 27, 2009, at 12:23 PM, Joshua Orvis wrote:

> I should weigh in here since I am the above-mentioned 'user' who posed the
> question in #bioperl.
>
> To clarify, to train one particular gene finder I need to take a full
> genbank file with annotation for a whole genome and create separate gbk
> records, one for each gene. Each record will then contain the gene, exon
> coordinates for the CDS and sequence for the gene.
>
> I can iterate through the features of the full record and do the math myself
> for each spliced coordinate, making/writing individual records as I go, but
> thought I would see if BioPerl had any mechanism to extract a region of an
> annotated record and treat the starting base of that extraction as position
> 1, recoordinating all the other features that were present. Then I could
> just iterate through the features of the whole entry, extracting regions for
> each gene as I see them.
>
> Hopefully this makes sense.
>
> Joshua
>
> On Thu, Aug 27, 2009 at 2:41 PM, Jason Stajich wrote:
>
>>
>> Yeah one thought that we batted around at a hackathon many moons ago had
>> been to use Bio::DB::SeqFeature in a lightweight way under the hood to
>> represent sequences in layers more rather than the arbitrary data model that
>> is setup by focusing on handling GenBank records. A lot of the architecture
>> development (that is like 10-15 years old now!)
was initially just >> focused >> on round-tripping the sequence files. We more recently felt like a >> new model >> was more appropriate. With the fast SQLite implementation that >> Lincoln has >> put in for DB::SeqFeature we could in theory map every sequence >> into a >> SQLite DB and then have the power of the interface. >> >> Some more bells and whistles might be needed but the basic API is >> respected >> AFAIK and it prevents needing to store whole sequences in memory. >> The >> SeqIO->DB::SeqFeature loading would need some finessing so that as >> parsed >> the sequence object could be updated efficiently. >> >> Actually this might also help reduce the number of objects needed >> to be >> created by basically efficiently serializing sequences into the DB on >> parsing (and with some simple caching this could make for pretty fast >> system). Since disk is basically not a limitation now could be an >> interesting experiment? Maybe it is too out there, but if not it >> could be >> something major enough that it has to go in a bioperl-2/bioperl- >> ng. It >> sort of assumes the data model of Bio::DB::SeqFeature is adequate >> for all >> the messiness of sequence data formats and one problem for some >> people has >> been the seq file format => GFF in order to load it into a >> SeqFeature DB for >> Gbrowse... So I don't know what are the boundary cases here. >> Certainly for >> FASTA it should be straightforward. >> >> -jason >> >> On Aug 27, 2009, at 11:20 AM, Chris Fields wrote: >> >> It's not implemented completely. As Jason mentioned in the bug >> report, it >>> was meant to be part of an overall system to truncate sequences with >>> remapped features, but the implementation in place is >>> substandard. It's >>> open for implementation if anyone wants to take it up. >>> >>> I should point out, though, in my opinion Bio::DB::GFF/SeqFeature >>> deal >>> with this in a more elegant and lightweight way, and is probably the >>> direction I would take. YMMV. 
>>> >>> chris >>> >>> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote: >>> >>> Looks like bug 1572 is related to this: >>>> http://bugzilla.open-bio.org/show_bug.cgi?id=1572 >>>> >>>> Rob >>>> >>>> Robert Buels wrote: >>>> >>>>> Hi all, >>>>> Recently a user came into #bioperl looking to truncate an >>>>> annotated >>>>> sequence (leaving the region between e.g. 150 to 250 nt), and >>>>> have the >>>>> annotations from the original sequence be remapped onto the new >>>>> truncated >>>>> sequence. >>>>> Poking through code, I came across an undocumented function >>>>> trunc() that >>>>> from the comments looks like it was written by Jason as part of >>>>> a master >>>>> plan to implement this very functionality. >>>>> Just wondering, what's the status of that? >>>>> Rob >>>>> >>>> >>>> >>>> -- >>>> Robert Buels >>>> Bioinformatics Analyst, Sol Genomics Network >>>> Boyce Thompson Institute for Plant Research >>>> Tower Rd >>>> Ithaca, NY 14853 >>>> Tel: 503-889-8539 >>>> rmb32 at cornell.edu >>>> http://www.sgn.cornell.edu >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Thu Aug 27 20:19:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 27 Aug 2009 15:19:56 -0500 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> 
References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Message-ID: On Aug 27, 2009, at 1:41 PM, Jason Stajich wrote: > Yeah one thought that we batted around at a hackathon many moons ago > had been to use Bio::DB::SeqFeature in a lightweight way under the > hood to represent sequences in layers more rather than the arbitrary > data model that is setup by focusing on handling GenBank records. A > lot of the architecture development (that is like 10-15 years old > now!) was initially just focused on round-tripping the sequence > files. We more recently felt like a new model was more appropriate. > With the fast SQLite implementation that Lincoln has put in for > DB::SeqFeature we could in theory map every sequence into a SQLite > DB and then have the power of the interface. > > Some more bells and whistles might be needed but the basic API is > respected AFAIK and it prevents needing to store whole sequences in > memory. The SeqIO->DB::SeqFeature loading would need some finessing > so that as parsed the sequence object could be updated efficiently. Exactly my thought. Probably worth pushing the FeatureHolderI interface into something like a SeqFeature::Collection. What about annotation? Maybe add that to the 'source' feature? Also makes me think Seq needs to be RangeI (or potentially locatable to another sequence). Bio::DB::SF::Segment is. I'm thinking the old way of doing it (parsing a file) is still possible, but underneath would be an Bio::Index or similar, and the returned Bio::Seq would have a backend Bio::Index/ Bio::SeqFeature::Collection database (the latter maybe being lazily implemented). > Actually this might also help reduce the number of objects needed to > be created by basically efficiently serializing sequences into the > DB on parsing (and with some simple caching this could make for > pretty fast system). 
Since disk is basically not a limitation now > could be an interesting experiment? Yes. > Maybe it is too out there, but if not it could be something major > enough that it has to go in a bioperl-2/bioperl-ng. It sort of > assumes the data model of Bio::DB::SeqFeature is adequate for all > the messiness of sequence data formats and one problem for some > people has been the seq file format => GFF in order to load it into > a SeqFeature DB for Gbrowse... So I don't know what are the boundary > cases here. Certainly for FASTA it should be straightforward. > > -jason Well, one could possibly test something like this on a branch, or with their own Bio::Seq, or in Biome ;> Just sayin'.... chris From maj at fortinbras.us Fri Aug 28 00:58:34 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 27 Aug 2009 20:58:34 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net> Message-ID: <4C2E185C74CF449495BC8FDC26419702@NewLife> Thanks Brian; these are really valuable insights and suggestions. Of course, the "todo list" is not "mine", but the community's (otherwise, I would have used Post-its), and I have added your action items to it. My thinking about a survey is twofold. Intermittent users may, likely will, have different issues than the usual suspects here on the list, or they will put those issues in a different way--likely with more expression of affect, which I personally think is key. It seems to me that documentation is the public face of this project, and hearing visceral reactions from "the public" will help us (or me) prioritize. The other fold is, this kind of data is better acquired a) actively, rather than passively ("Please respond to this thread") and b) anonymously. 
Obviously, it can't be active in the sense of spamming, but we could reduce the energy barrier by providing something clickable with a few textboxes to the list.

cheers
MAJ

----- Original Message -----
From: Brian Osborne
To: Mark A. Jensen
Cc: BioPerl List ; Chris Fields
Sent: Thursday, August 27, 2009 11:10 AM
Subject: Re: [Bioperl-l] on BP documentation

Mark,

Sorry, I'm a bit late here. I took a look at the Documentation Project page, it is well-reasoned. However, I didn't see any list of action items there. You do talk at the end about soliciting comments, and you've already done this, and a user survey. A survey is not necessary, the issues are well understood already. More to the point, you understand them and just as in coding, "the one doing the work wins the argument". Here's what my own list of action items would look like:

- Merge FAQ and Scrapbook
-- FAQ is unused or underused and contains code snippets
-- Too much information or too many sections is as bad as too little

- Write Align/AlignIO HOWTO
-- This is the "missing HOWTO"

- Use Deobfuscator links to reveal method documentation
-- Most notably in SeqIO HOWTO
-- Does Deobfuscator have a bug or two that need to be fixed? I use it, it seems to work but I've heard a rumor...

- Condense and streamline installation documents
-- Remove outdated
-- Still too many pages and too much text
-- There are incorrectly labelled links taking you to the wrong place
-- Remove any text or page that duplicates information in an INSTALL file, link to this file instead

- Seriously prune the Main Page
-- Wikis encourage a proliferation of pages and links, the Main Page is a great example of far too much information
-- Remove many redundant or little used links
-- Try to prettify, in any way possible - we have created, sadly, the world's ugliest Main Page!

- Revise the SeqIO HOWTO
-- The first HOWTO, and it looks like it
-- Link this HOWTO to all the Format pages (Category:Formats)

- Feature-Annotation HOWTO
-- Write script that annotates every single SeqIO format, showing where each bit of text ends up
-- This script runs automatically when you open the HOWTO or click its link, always up-to-date
-- Probably trickier than I think!

- The "Random Page" exercise
-- Spend some time clicking this link, you will certainly find things to merge and delete. You will also find nice documentation that you didn't know existed and is probably never read!

The objective is to create documentation that has a single starting point for at least 50% of the questions asked on the mailing list. We've achieved this for certain topics, like SearchIO. In the old days you'd get a query a week about doing something with Blast and we'd repeat something written the previous week, week after week. Then we wrote some HOWTOs, so just about any question on Features or SearchIO was answered by "See the HOWTO". Again, one starting page for every single reasonably general question, like "See the Installation page". Not "Starting on the Main Page you could click on Getting Bioperl or Getting Started or Quick Start or Installing Bioperl or Installation or Downloads or ...." (you get the idea).

Brian O.

On Aug 15, 2009, at 3:53 PM, Mark A. Jensen wrote:

----- Original Message -----
From: "Hilmar Lapp"
...
As for the FASTA example, I can understand - I've heard repeatedly from people that one of the things that they are missing is documentation for every SeqIO format we support (such as GenBank, UniProt, FASTA, etc) about where to find a particular piece of the format in the object model.
....

This is the right thread for list lurkers to contribute their betes noires such as this one. I encourage ALL to post these issues and help create our list of action items.
MAJ _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Fri Aug 28 02:00:01 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 27 Aug 2009 22:00:01 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <4C2E185C74CF449495BC8FDC26419702@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <62D1EBDA-E69E-4655-A1F2-86D9DC1E86BD@verizon.net> <4C2E185C74CF449495BC8FDC26419702@NewLife> Message-ID: <047387CF-C3AD-4E2E-8FB8-091AB23D5FEE@verizon.net> Mark, As you wish. As I said, the one who does the work calls the shots, this is not a democracy. The fundamental problem is, and I speak with some experience here, that detailed examination of documentation is of so little interest that participation in the survey will be limited ("the usual suspects"), and the results will be skewed. You're not going to get reactions from "the public", the thousands of Bioperl users. But, if you feel comfortable with the notion that a survey will justify your actions, do it. But honestly, I know that you already know what to do. Brian O. On Aug 27, 2009, at 8:58 PM, Mark A. Jensen wrote: > My thinking about a survey is twofold. Intermittent users may, > likely will, have different issues than the usual suspects here on > the list, or they will put those issues in a different way--likely > with more expression of affect, which I personally think is key. It > seems to me that documentation is the public face of this project, > and hearing visceral reactions from "the public" will help us (or > me) prioritize. The other fold is, this kind of data is better > acquired a) actively, rather than passively ("Please respond to this > thread") and b) anonymously. 
Obviously, it can't be active in the > sense of spamming, but we could reduce the energy barrier by > providing something clickable with a few textboxes to the list. From David.Messina at sbc.su.se Fri Aug 28 08:40:47 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 28 Aug 2009 10:40:47 +0200 Subject: [Bioperl-l] on BP documentation Message-ID: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se> > - Use Dobfuscator links to reveal method documentation > -- Most notably in SeqIO HOWTO Do you mean to click on a method name in a HOWTO and open up the Deobfuscator view of that method's documentation? I like that. > -- Does Deobfuscator have a bug or two that need to be fixed? I use > it, it seems to work but I've heard a rumor... It's true -- sometimes the Deobfuscator claims that a method isn't documented when it is. Mark, I can commit to fixing this. It's long overdue, so I'm happy to use your doc push as an impetus. Dave From maj at fortinbras.us Fri Aug 28 11:31:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 28 Aug 2009 07:31:05 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se> References: <3AA817F4-20B4-4041-BFAD-E19B792D5D13@sbc.su.se> Message-ID: Dave-- thanks for stepping up- MAJ ----- Original Message ----- From: "Dave Messina" To: "Brian Osborne" Cc: "Mark A. Jensen" ; "BioPerl List" ; "Chris Fields" Sent: Friday, August 28, 2009 4:40 AM Subject: Re: [Bioperl-l] on BP documentation > >> - Use Dobfuscator links to reveal method documentation >> -- Most notably in SeqIO HOWTO > > Do you mean to click on a method name in a HOWTO and open up the Deobfuscator > view of that method's documentation? I like that. > > >> -- Does Deobfuscator have a bug or two that need to be fixed? I use it, it >> seems to work but I've heard a rumor... > > It's true -- sometimes the Deobfuscator claims that a method isn't documented > when it is. > > Mark, I can commit to fixing this. 
It's long overdue, so I'm happy to use
> your doc push as an impetus.
>
>
> Dave
>
>
>

From fgarret at ub.edu Fri Aug 28 16:37:54 2009
From: fgarret at ub.edu (Filipe Garrett)
Date: Fri, 28 Aug 2009 18:37:54 +0200
Subject: [Bioperl-l] splice alignment
Message-ID: <4A9807E2.4080608@ub.edu>

Hi all,

I need to analyse the 1st, 2nd and 3rd positions of an alignment separately. I've been through the BioPerl pages but couldn't find any direct way to do it. The closest I found was "slice" (AlignI), but it just extracts a contiguous subsequence.

Is there any subroutine that does the job? Or maybe a more generic one, so we can select the columns to be extracted; e.g.:

@aln_pos = qw/1 4 7 10 13 14 17 20/;
$aln_1 = $aln->get_pos(@aln_pos);

thanks in adv,
FG

--
Filipe G. Vieira
Departament de Genetica
Universitat de Barcelona
Av. Diagonal, 645
08028 Barcelona
SPAIN
Phone: +34 934 035 306
Fax: +34 934 034 420
fgarret at ub.edu
http://www.ub.edu/molevol/

From mmorley at mail.med.upenn.edu Fri Aug 28 21:18:28 2009
From: mmorley at mail.med.upenn.edu (Michael Morley)
Date: Fri, 28 Aug 2009 17:18:28 -0400
Subject: [Bioperl-l] How to plot coverage using Bio::DB::Sam and Bio::Graphics?
Message-ID: <4A9849A4.7060702@mail.med.upenn.edu>

I have a few questions, some perhaps too simple, whose answers I know I should have been able to find but which have eluded me.

Problem: What I want to do is visualize coverage (Illumina RNA-seq) across a gene for 40 or so samples. I thought about gbrowse, but what I was hoping to do was use Bio::Graphics and create a few PNGs of the genes I'm interested in, nothing too fancy.

My current attempt: I've used Bio::DB::Sam (thank you LDS!! great package) as follows, which works perfectly:

my $features = $sam->features(-type=>'coverage',-seq_id=>$chrom,-start=>$genomest,-end=>$genomest); Then I tried this: $panel->add_track($features, -glyph => 'xyplot', -graph_type=>'histogram', ); After poking at the return of '-type=converge', I don't think this is possible directly but any ideas how I can do it? The coverage is too deep in the region to plot every sequence in the alignment, I was able to do it just was not useful. One last question.. I also would like to plot the gene model as well. If I simply grab the genbank file for refseq NM###, the features only have exon,cds,etc and coordinates based off the mRNA seq. So how does one get the genomic info and then create the track for a gene/transcript as you would see in gbrowse? Any help I'd greatly appreciate it! -Michael From roy.chaudhuri at gmail.com Sat Aug 29 13:22:53 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Sat, 29 Aug 2009 23:22:53 +1000 Subject: [Bioperl-l] truncating a sequence and remapping annotations In-Reply-To: References: <4A96C4A5.9090406@cornell.edu> <4A96C519.3020001@cornell.edu> <8BF593F8-EF5A-4675-81BB-F7A22401A29C@illinois.edu> <433B7A51-9B93-43A0-AC8D-3D1C01F8995B@bioperl.org> Message-ID: <1372eece0908290622mc21f297w503225242d82ada9@mail.gmail.com> Hi Joshua, A couple of years ago I did implement (in a fairly hacky way) a trunc_with_features method that does exactly this. It was incorporated into Bio::SeqUtils and is still there as far as I know. Maybe it would be suitable for your purposes? Roy. 2009/8/28 Joshua Orvis : > I should weigh in here since I am the above-mentioned 'user' who posed the > question in #bioperl. > > To clarify, to train one particular gene finder I need to take a full > genbank file with annotation for a whole genome and create separate gbk > records, one for each gene. ?Each record will then contain the gene, exon > coordinates for the CDS and sequence for the gene. 
>
> I can iterate through the features of the full record and do the math myself
> for each spliced coordinate, making/writing individual records as I go, but
> thought I would see if BioPerl had any mechanism to extract a region of an
> annotated record and treat the starting base of that extraction as position
> 1, recoordinating all the other features that were present. Then I could
> just iterate through the features of the whole entry, extracting regions for
> each gene as I see them.
>
> Hopefully this makes sense.
>
> Joshua
>
> On Thu, Aug 27, 2009 at 2:41 PM, Jason Stajich wrote:
>
>>
>> Yeah one thought that we batted around at a hackathon many moons ago had
>> been to use Bio::DB::SeqFeature in a lightweight way under the hood to
>> represent sequences in layers more rather than the arbitrary data model that
>> is setup by focusing on handling GenBank records. A lot of the architecture
>> development (that is like 10-15 years old now!) was initially just focused
>> on round-tripping the sequence files. We more recently felt like a new model
>> was more appropriate. With the fast SQLite implementation that Lincoln has
>> put in for DB::SeqFeature we could in theory map every sequence into a
>> SQLite DB and then have the power of the interface.
>>
>> Some more bells and whistles might be needed but the basic API is respected
>> AFAIK and it prevents needing to store whole sequences in memory. The
>> SeqIO->DB::SeqFeature loading would need some finessing so that as parsed
>> the sequence object could be updated efficiently.
>>
>> Actually this might also help reduce the number of objects needed to be
>> created by basically efficiently serializing sequences into the DB on
>> parsing (and with some simple caching this could make for pretty fast
>> system). Since disk is basically not a limitation now could be an
>> interesting experiment?
>> Maybe it is too out there, but if not it could be
>> something major enough that it has to go in a bioperl-2/bioperl-ng. It
>> sort of assumes the data model of Bio::DB::SeqFeature is adequate for all
>> the messiness of sequence data formats and one problem for some people has
>> been the seq file format => GFF in order to load it into a SeqFeature DB for
>> Gbrowse... So I don't know what are the boundary cases here. Certainly for
>> FASTA it should be straightforward.
>>
>> -jason
>>
>> On Aug 27, 2009, at 11:20 AM, Chris Fields wrote:
>>
>>> It's not implemented completely. As Jason mentioned in the bug report, it
>>> was meant to be part of an overall system to truncate sequences with
>>> remapped features, but the implementation in place is substandard. It's
>>> open for implementation if anyone wants to take it up.
>>>
>>> I should point out, though, in my opinion Bio::DB::GFF/SeqFeature deal
>>> with this in a more elegant and lightweight way, and is probably the
>>> direction I would take. YMMV.
>>>
>>> chris
>>>
>>> On Aug 27, 2009, at 12:40 PM, Robert Buels wrote:
>>>
>>>> Looks like bug 1572 is related to this:
>>>> http://bugzilla.open-bio.org/show_bug.cgi?id=1572
>>>>
>>>> Rob
>>>>
>>>> Robert Buels wrote:
>>>>
>>>>> Hi all,
>>>>> Recently a user came into #bioperl looking to truncate an annotated
>>>>> sequence (leaving the region between e.g. 150 to 250 nt), and have the
>>>>> annotations from the original sequence be remapped onto the new truncated
>>>>> sequence.
>>>>> Poking through code, I came across an undocumented function trunc() that
>>>>> from the comments looks like it was written by Jason as part of a master
>>>>> plan to implement this very functionality.
>>>>> Just wondering, what's the status of that?
>>>>> Rob
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Buels
>>>> Bioinformatics Analyst, Sol Genomics Network
>>>> Boyce Thompson Institute for Plant Research
>>>> Tower Rd
>>>> Ithaca, NY 14853
>>>> Tel: 503-889-8539
>>>> rmb32 at cornell.edu
>>>> http://www.sgn.cornell.edu
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>

From adlai at refenestration.com Sun Aug 30 16:16:41 2009
From: adlai at refenestration.com (adlai burman)
Date: Sun, 30 Aug 2009 18:16:41 +0200
Subject: [Bioperl-l] Install on host server
Message-ID:

Hey there,

I have an embarrassingly silly question. I have BioPerl set up and
working on my computer. Does anyone here know if there is a standard
way to ask one's hosting server to install BioPerl so you can use it
within a web page? Barring that, is there a standard way to set it up
for your own domain on a hosting server that knows nothing about
BioPerl?

Thanks,
Adlai

From ymc at yahoo.com Mon Aug 31 06:10:10 2009
From: ymc at yahoo.com (Yee Man Chan)
Date: Sun, 30 Aug 2009 23:10:10 -0700 (PDT)
Subject: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN,
	was Re: Problems with Bioperl-ext package on WinVista?
Message-ID: <472878.20951.qm@web30402.mail.mud.yahoo.com>

Hi Chris

I added a check for LocatableSeq in dpAlign.pm.
It will now create a Bio::Seq object internally that copies the
sequence in the LocatableSeq with all the gaps taken out. This should
make it behave properly. I committed the updated Bio/Tools/dpAlign.pm
to SVN.

In dpAlign.pm, I also added a note saying what will happen if you
supply a LocatableSeq to the functions in this module.

With regard to that warning, I think the person who reported the bug
misused the instantiator of LocatableSeq. He/she can't use the length
of the sequence with gaps as the "end". The "end" should be the length
without gaps.

Let me know if you have any questions or concerns.

Have a great day!
Yee Man

--- On Wed, 8/19/09, Yee Man Chan wrote:

> From: Yee Man Chan
> Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
> To: "Chris Fields"
> Cc: "Robert Buels" , "BioPerl List"
> Date: Wednesday, August 19, 2009, 8:01 PM
> I noticed that the $qalseq is a LocatableSeq with gaps. I don't
> think my program was written to support LocatableSeq with gaps. If
> I removed the gaps, then I would have the scores agree with each
> other, which should be the desired outcome.
>
> --------------------- WARNING ---------------------
> MSG: In sequence ABC|9986984 residue count gives end value 104.
> Overriding value [101] with value 104 for Bio::LocatableSeq::end().
> TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTTCGGGTCCGGCCCGAA
> ---------------------------------------------------
> Getting score for ABC|9944760 -> ABC|9986984
> = 291
> Getting score for ABC|9986984 -> ABC|9944760
> = 291
>
> Do you think I should check for this LocatableSeq type and give an
> error, or should I remove the gaps if this is a LocatableSeq?
>
> Yee Man
>
>
> --- On Wed, 8/19/09, Chris Fields wrote:
>
> > From: Chris Fields
> > Subject: Re: [Bioperl-l] Packaging Bio::Ext::HMM for CPAN, was Re: Problems with Bioperl-ext package on WinVista?
> > To: "Yee Man Chan"
> > Cc: "Robert Buels" , "BioPerl List"
> > Date: Wednesday, August 19, 2009, 7:49 AM
> > I'll have a look. It's probably something that hasn't been updated
> > to deal with LocatableSeq's pathological end point checking.
> >
> > chris
> >
> > On Aug 19, 2009, at 4:01 AM, Yee Man Chan wrote:
> >
> > >
> > > I tried that sample script that reportedly caused the dpAlign
> > > "bug" but I can't reproduce it. All I get is a warning from
> > > LocatableSeq.
> > > -------------------------------------------
> > > [ymc at dev Align]$ PERL_DL_NONLAZY=1 /usr/bin/perl "-Iblib/lib" "-Iblib/arch" "-I/home/ymc/bioperl/bioperl-live/trunk" test.pl
> > >
> > > --------------------- WARNING ---------------------
> > > MSG: In sequence ABC|9944760 residue count gives end value 101.
> > > Overriding value [104] with value 101 for Bio::LocatableSeq::end().
> > > TTGCCATTCTTTCGAAGCGCATTCCCTCTCGTGGCGCTGGCTTCCAGGATCTTTTGGAAGCGCATTCGACGCAACACACCTGCCCGTTT-GGG-CCGGCCC-AA
> > > ---------------------------------------------------
> > > Getting score for ABC|9944760 -> ABC|9986984
> > > = 300
> > > Getting score for ABC|9986984 -> ABC|9944760
> > > = 303
> > > ------------------------------------------
> > >
> > > Does the test script crash on your machine?
> > >
> > > Yee Man
> > >
> > > --- On Tue, 8/18/09, Chris Fields wrote:
> > >
> > >> From: Chris Fields
> > >> Subject: Re: Packaging Bio::Ext::HMM for CPAN, was Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
> > >> To: "Robert Buels"
> > >> Cc: "Yee Man Chan" , "BioPerl List"
> > >> Date: Tuesday, August 18, 2009, 10:28 PM
> > >> On Aug 18, 2009, at 11:37 PM, Robert Buels wrote:
> > >>
> > >>> Yee Man Chan wrote:
> > >>>> Is it going to be an arrangement similar to bioconductor?
> > >>>> If so, I suppose then it makes sense.
> > >>>> But you might want to develop scripts to automatically
> > >>>> download and install new modules to make it user friendly.
> > >>> Yes, we are probably going to make a Task::BioPerl or
> > >>> something similar.
> > >>>
> > >>>> What do you mean by Bio-Ext is going away? I notice quite
> > >>>> many people using dpAlign. So if Bio-Ext is going away, then
> > >>>> at least dpAlign should become another spin off.
> > >>> By going away, I meant that everything in there is going to
> > >>> be spun off. Except modules that are no longer maintainable,
> > >>> if there are any in there.
> > >>>
> > >>> Rob
> > >>
> > >> dpAlign could become another spinoff, yes, if it's used (and
> > >> works fine). The problematic code dealt with pSW, alignment
> > >> statistics, and staden io_lib support (the latter of which is
> > >> fairly bit rotted now):
> > >>
> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2668
> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=1857
> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2069
> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2074
> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2329
> > >>
> > >> dpAlign has its own bug:
> > >>
> > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2384
> > >>
> > >> chris
> > >>
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>

From tuco at pasteur.fr Mon Aug 31 14:13:41 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Mon, 31 Aug 2009 16:13:41 +0200
Subject: [Bioperl-l] Can't add track to Panel Bio::Graphics
Message-ID: <4A9BDA95.2020109@pasteur.fr>

Hi,

I'm trying to create a PNG image using Bio::Graphics. I followed the
HOWTO available at bioperl.org. I'm stuck when trying to add a new
track to my panel.
So far, I can create the panel and add 2 tracks, but then, probably
through some mistake of mine, I can't add more tracks to my panel.
Here is the code.

my $panel = Bio::Graphics::Panel->new(
    -length     => $self->seq()->length(),
    -width      => 800,
    -pad_top    => 5,
    -pad_bottom => 5,
    -pad_left   => 5,
    -pad_right  => 5,
    #-key_style => 'between',
);

my $bsg = Bio::SeqFeature::Generic->new(
    -start        => 1,
    -seq          => $self->seq()->seq(),
    -end          => $self->seq()->length(),
    -display_name => $self->seq()->id() . " (" . $self->seq->length() . " na)",
);
$bsg->attach_seq($self->seq());

#Display the reference sequence
############
#### Those 2 tracks are well displayed on the final image
###########
$panel->add_track($bsg, -glyph => 'dna', -label => 1);
$panel->add_track($bsg, -glyph => 'arrow', -tick => 2, -fgcolor => 'black');

#Build, if present, the single cut
if (keys %$spositions) {
    #Create the special track for the single cut
    my $strack = $panel->add_track(
        -glyph     => 'crossbox',
        -label     => 1,
        -fgcolor   => 'red',
        -key       => 'Single cut',
        -connector => 'dashed',
    );
    foreach my $enz (sort { $a cmp $b } keys %{$spositions->{$strand}}) {
        my $bsfg = Bio::SeqFeature::Generic->new(
            -display_name => $enz,
            -start        => $spositions->{$strand}->{$enz}->{$enz}->start(),
            -end          => $spositions->{$strand}->{$enz}->{$enz}->start());
        my $bsfg2 = Bio::SeqFeature::Generic->new(
            -display_name => $enz,
            -start        => $spositions->{$strand}->{$enz}->{$enz}->end(),
            -end          => $spositions->{$strand}->{$enz}->{$enz}->end());
        $strack->add_feature($bsfg);
        $strack->add_feature($bsfg2);
    }
}

#Build, if present, the double cut
if (keys %$dpositions) {
    my $dtrack = $panel->add_track(
        -glyph     => 'crossbox',
        -label     => 1,
        -key       => 'Double cut',
        -connector => 'dashed',
    );
    foreach my $couple (sort { $a cmp $b } keys %{$dpositions->{$strand}}) {
        foreach my $cc_enz (sort { $a cmp $b } keys %{$dpositions->{$strand}->{$couple}}) {
            my $bsfg = Bio::SeqFeature::Generic->new(
                -display_name => $couple,
                -start        => $dpositions->{$strand}->{$couple}->{$cc_enz}->start(),
                -end          =>
                    $dpositions->{$strand}->{$couple}->{$cc_enz}->start());
            my $bsfg2 = Bio::SeqFeature::Generic->new(
                -display_name => $cc_enz,
                -start        => $dpositions->{$strand}->{$couple}->{$cc_enz}->end(),
                -end          => $dpositions->{$strand}->{$couple}->{$cc_enz}->end());
            $dtrack->add_feature($bsfg);
            $dtrack->add_feature($bsfg2);
        }
    }
}

print $panel->png();

Can somebody tell me what I'm missing or doing wrong?

Thanks for any help

Regards

Emmanuel

--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------

From marcelo011982 at gmail.com Mon Aug 31 18:12:58 2009
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Mon, 31 Aug 2009 15:12:58 -0300
Subject: [Bioperl-l] Genbank code from Blast results
In-Reply-To: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>
References: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>
Message-ID: <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com>

done:

#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => 'Rpp2Blast.txt');
...
while( my $result = $in->next_result ) {
  while( my $hit = $result->next_hit ) {
    while( my $hsp = $hit->next_hsp ) {
      #EXTRACT THE GENBANK CODE NUMBER FROM DESCRIPTION
      #----------------------------------------------
      my $accGB = $hit->description();
      $accGB =~ m/(gb=.*?\s)/;
      #----------------------------------------------


      print MYFILE
      ...

      $1,"\t" ,  # GenBank accession number
      ...
      $hsp->hit->end, "\t","\n";
      ...

    }
  }
}



On Tue, Aug 18, 2009 at 3:34 PM, Marcelo Iwata wrote:

> hi all..
> I was writing a script that takes some information from the results
> of blastn files.
> Everything was OK, but I have some difficulty picking out the
> GenBank code number (the 'gb' below).
> I tried
>
> $obj->each_accession_number
> $hit->name
>
> And some variation of this.
>
>
>
> ------------------------------
> >gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 gmrtDrNS01
> Glycine max cDNA 3', mRNA sequence /clone_end=3'
> /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853
> Length = 853
>
> Score = 1336 bits (674), Expect = 0.0
> Identities = 793/832 (95%), Gaps = 8/832 (0%)
> Strand = Plus / Minus
>
>
> Query: 294858 aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt 294917
>               |||||||||||| |||||| |||||||||||||||||  ||||||||||||||||||||
> Sbjct: 853    aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc 794
> ----------------------------------------
>
>
> But I still don't get it.
>
> thank you
> with regards
> Miwata
>

From jason at bioperl.org Mon Aug 31 19:49:08 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 31 Aug 2009 12:49:08 -0700
Subject: [Bioperl-l] Genbank code from Blast results
In-Reply-To: <1c9f28970908311112m60285494x239069e683235015@mail.gmail.com>
References: <1c9f28970908181134o7353d702sd919120a841f488b@mail.gmail.com>
	<1c9f28970908311112m60285494x239069e683235015@mail.gmail.com>
Message-ID: <4DBC8ED9-6D98-414A-A361-3FAB3EEE955C@bioperl.org>

If you run blastall with -I T (show GI's in defline) you will also be
able to get the GenBank identifier out with $hit->ncbi_gi, through
some automagic parsing of the ID line.

-jason

On Aug 31, 2009, at 11:12 AM, Marcelo Iwata wrote:

> done:
>
> #!/usr/bin/perl -w
> use strict;
> use Bio::SearchIO;
>
> my $in = new Bio::SearchIO(-format => 'blast',
>                            -file   => 'Rpp2Blast.txt');
> ...
> while( my $result = $in->next_result ) {
>   while( my $hit = $result->next_hit ) {
>     while( my $hsp = $hit->next_hsp ) {
>       #EXTRACT THE GENBANK CODE NUMBER FROM DESCRIPTION
>       #----------------------------------------------
>       my $accGB = $hit->description();
>       $accGB =~ m/(gb=.*?\s)/;
>       #----------------------------------------------
>
>
>       print MYFILE
>       ...
>
>       $1,"\t" ,  # GenBank accession number
>       ...
>       $hsp->hit->end, "\t","\n";
>       ...
>
>     }
>   }
> }
>
>
>
> On Tue, Aug 18, 2009 at 3:34 PM, Marcelo Iwata wrote:
>
>> hi all..
>> I was doing a script that take some information of the results of
>> blastn files.
>> Everythig was ok, but i have some dificult to pic the Genbank code
>> number (the 'gb' below).
>> I tried
>>
>> $obj->each_accession_number
>> $hit->name
>>
>> And some variation of this.
>>
>>
>>
>> ------------------------------
>> >gnl|UG|Gma#S23062791 gmrtDrNS01_07-B_M13R_E11_087.s1 Water stressed 5h segment 1 gmrtDrNS01
>> Glycine max cDNA 3', mRNA sequence /clone_end=3'
>> /gb=CX702616 /gi=58015874 /ug=Gma.18455 /len=853
>> Length = 853
>>
>> Score = 1336 bits (674), Expect = 0.0
>> Identities = 793/832 (95%), Gaps = 8/832 (0%)
>> Strand = Plus / Minus
>>
>>
>> Query: 294858 aaattaacaatgagactccagagtatgtgaggtcctttgaatttgatagcaaattgatgt 294917
>>               |||||||||||| |||||| |||||||||||||||||  ||||||||||||||||||||
>> Sbjct: 853    aaattaacaatgtgactcccgagtatgtgaggtccttgaaatttgatagcaaattgatgc 794
>> ----------------------------------------
>>
>>
>> But, i still don't get it.
>>
>> thank you
>> with regards
>> Miwata
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From Russell.Smithies at agresearch.co.nz Mon Aug 31 21:43:25 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 1 Sep 2009 09:43:25 +1200
Subject: [Bioperl-l] Mapping of genome with cytoband
In-Reply-To: <29549.68962.qm@web94610.mail.in2.yahoo.com>
References: <29549.68962.qm@web94610.mail.in2.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB81F183@exchsth.agresearch.co.nz>

Have you tried getting the data from UCSC (or the test site:
http://genome-test.cse.ucsc.edu )?
If you use Galaxy to get the data and then convert it to GFF, it may
save a bit of work.

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shafeeq rim
> Sent: Thursday, 27 August 2009 11:14 p.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Mapping of genome with cytoband
>
> Hi,
>
> I need gene, mRNA, CDS, STS and exon files mapped to cytobands, say
> for the NCBI build 37.1 data. I am checking the .gbs and .gbk files,
> but the genes and other features do not span the whole chromosome.
> For chromosome 1, for example, when I use the gene coordinates from
> the .gbk / .gbs files, the gene locations only reach halfway along
> the ideogram graph.
>
> Thanks
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
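
[Editor's sketch, not part of the original thread.] The suggestion
above (fetch the cytoband data from UCSC, then convert it to GFF) could
look roughly like the following in plain Perl. It assumes the
downloaded table is tab-separated with the columns chrom, chromStart,
chromEnd, name and gieStain, and that the start coordinate is 0-based
as in other UCSC tables. The subroutine name cytoband_to_gff3, the
'UCSC' source label and the sample rows are made up for illustration;
check them against the file you actually download.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert one row of a UCSC cytoBand-style table into a GFF3 line.
# Assumed columns (tab-separated): chrom, chromStart, chromEnd, name,
# gieStain. UCSC table coordinates are 0-based and half-open, GFF3 is
# 1-based and inclusive, so only the start is shifted by one.
sub cytoband_to_gff3 {
    my ($row) = @_;
    chomp $row;
    my ($chrom, $start0, $end, $name, $stain) = split /\t/, $row;
    return unless defined $stain;    # skip short or malformed rows
    return join "\t",
        $chrom, 'UCSC', 'chromosome_band',
        $start0 + 1, $end,
        '.', '.', '.',
        "Name=$name;gieStain=$stain";
}

# Illustrative rows in the assumed layout; real input would be read
# from the downloaded table instead.
my @rows = (
    "chr1\t0\t2300000\tp36.33\tgneg",
    "chr1\t2300000\t5400000\tp36.32\tgpos25",
);

print "##gff-version 3\n";
for my $row (@rows) {
    my $gff = cytoband_to_gff3($row);
    print "$gff\n" if defined $gff;
}
```

Keeping the conversion in one small subroutine means the same code can
be fed from a file handle (while (my $row = <$fh>) { ... }) once the
column layout has been confirmed; the resulting GFF3 should then be
loadable alongside the gene features so that both share the same
chromosome coordinates.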