From w.bryant at ucl.ac.uk Mon Jun 1 04:06:58 2009
From: w.bryant at ucl.ac.uk (Will Bryant)
Date: Mon, 01 Jun 2009 09:06:58 +0100
Subject: [Bioperl-l] Extract genomic data from GenBank
Message-ID: <4A238C22.9090604@ucl.ac.uk>

I'm trying to retrieve the complete GenBank format sequence file for a
specified bacterium using get_Seq_by_gi, but I keep getting 'gi does not
exist' errors, even when trying the example gi '405830'. The script was
running fine September last year, but when I came back to it this week it
wasn't working. Am I missing something obvious?

In case it's important, I'm using ActivePerl 5.10.0, bioperl 1.5.2_100

Code:

#!/usr/bin/perl -w
use strict;
use Bio::Perl;
use Bio::DB::GenBank;

my $gb = new Bio::DB::GenBank(-db => 'genome', -format => 'genbank');
my $straincomp = $gb->get_Seq_by_gi('405830');
my $seqout = 0;
#my $set_output_file = '$seqout = Bio::SeqIO->new( -format => \'genbank\', -file => \'>c:\\phd\\modelling\\working\\gi'.$ARGV[0].'_data.gb\');';
#print $set_output_file;
eval ($set_output_file);
$seqout -> write_seq($straincomp);

Error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: gi does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw c:/perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_gi c:/perl/site/lib/Bio/DB/WebDBSeqI.pm:209
STACK: c:\phd\modelling\perl_scripts\retrieve_genome_data.pl:12
-----------------------------------------------------------

Many thanks,
Will Bryant.

From David.Messina at sbc.su.se Mon Jun 1 05:04:40 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 1 Jun 2009 11:04:40 +0200
Subject: [Bioperl-l] Extract genomic data from GenBank
In-Reply-To: <4A238C22.9090604@ucl.ac.uk>
References: <4A238C22.9090604@ucl.ac.uk>
Message-ID: <628aabb70906010204y46139e1dy702fd53380adecf7@mail.gmail.com>

Hey Will,

I think there have been API changes in GenBank's remote query interface that have occurred after 1.5.2_100 of BioPerl was written.
Try upgrading to BioPerl 1.6 and see if that works for you. (Note that I've only glanced at your code -- I'm assuming that's not the problem since it worked fine for you before.) Dave From fontanez at fas.harvard.edu Mon Jun 1 08:41:06 2009 From: fontanez at fas.harvard.edu (Kristina Fontanez) Date: Mon, 1 Jun 2009 08:41:06 -0400 Subject: [Bioperl-l] problem with bioperl install In-Reply-To: References: <2023E087846042178215CF9EBDE12C75@NewLife> <4A205502.2030701@sendu.me.uk> <024B0302-7885-4005-851D-5D582122ED06@fas.harvard.edu> <4A205D46.4090105@sendu.me.uk> Message-ID: <855163D8-6B40-4DF4-84B6-C14611D1CA42@fas.harvard.edu> Hey everyone- Thanks for all the advice. I reinstalled Xcode tools, installed Fink and downloaded bioperl successfully. It's now working smoothly. Thanks again, Kristina --------------------------------------------------------------- Kristina Fontanez PhD candidate Department of Organismic and Evolutionary Biology Cavanaugh lab Harvard University 16 Divinity Ave. Cambridge, MA 02138 tel: 617-495-1138 fax: 617-496-6933 email: fontanez at fas.harvard.edu On May 29, 2009, at 10:40 PM, Chris Fields wrote: Kristina, You aren't running as superuser: > term dump: > > dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez $ cpan You'll need to run cpan using 'sudo cpan' if installing modules anywhere requiring superuser permissions. chris On May 29, 2009, at 5:10 PM, Sendu Bala wrote: > Kristina Fontanez wrote: >> Hello everyone- >> Sendu - I took your advice but doing Install Bundle::CPAN did not >> take care of the dependencies. It still failed. See attached txt >> file with my terminal output. Does anyone have any idea how this >> might be? > > From reading the output it seems like perhaps you don't have 'make' > or there is something wrong when using it. If you're on a mac you > may need to install the dev tools. Someone else want to jump in here > with advice? 
>
> Also, check your CPAN configuration to ensure it is trying to use
> the correct make commands. ('o conf' etc.)
>
>
>> If I wanted to wipe all perl from my computer and simply start
>> over, how might this be accomplished?
>
> Don't do that. At least not until you know you have a working make
> setup.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Mon Jun 1 10:55:50 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Jun 2009 10:55:50 -0400
Subject: [Bioperl-l] a HOWTO for Tiling
Message-ID: <13190185F84E43BDA99993CEB44394C4@NewLife>

Hi All

Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an exhibition of
B::S::Tiling, use cases, code snippets, design, implementation and algorithm
discussions. We're just about ready to port over to core from bioperl-dev;
please shout out if this is not a good idea.

cheers and thanks for all input--
Mark

From cjfields at illinois.edu Mon Jun 1 11:21:30 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 10:21:30 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To:
References: <2023E087846042178215CF9EBDE12C75@NewLife>
Message-ID: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>

An autogenerated passthrough Makefile.PL ships with the distribution:

http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.0/Makefile.PL

We may remove that in future releases, but it should work regardless (i.e. call Module::Build and Build.PL).

I'm pretty convinced that the issue was permissions-based at heart. Note that Kristina ran 'cpan' instead of 'sudo cpan' to invoke the shell, so the shell is using the current user's config instead of the superuser's for installation. You need to use 'sudo' to install anything to /Library/Perl on Mac (unless you are already 'root', but on recent OS X versions logging in as 'root' is turned off).
I just noticed nothing is mentioned along these lines in the installation docs, so we'll need to update those. chris On May 29, 2009, at 4:08 PM, Mark A. Jensen wrote: > Hi Kristina, > > [Don't forget to reply-all, so the list stays in the loop. Many many > more helpers > there.] > > Apparently cpan can't make the Makefile, but can download and expand > the > library directories, in your .cpan directory (see edited highlights > below). > > Let's appeal to the BioPerl brethren/sestren---answers? > > MAJ > > > term dump: > > dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan > Terminal does not support AddHistory. > > cpan shell -- CPAN exploration and modules installation (v1.7602) > ReadLine support available (try 'install Bundle::CPAN') > > cpan> install Test::Harness > CPAN: Storable loaded ok > Going to read /Users/kristinafontanez/.cpan/Metadata > Database was generated on Fri, 29 May 2009 11:27:00 GMT > Running install for module Test::Harness > Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz > CPAN: Digest::MD5 loaded ok > CPAN: Compress::Zlib loaded ok > Checksum for /Users/kristinafontanez/.cpan/sources/authors/id/A/AN/ > ANDYA/Test-Harness-3.17.tar.gz ok > Scanning cache /Users/kristinafontanez/.cpan/build for sizes > Test-Harness-3.17/ > Test-Harness-3.17/Build.PL > ... > Test-Harness-3.17/xt/perls/sample-tests/ > Test-Harness-3.17/xt/perls/sample-tests/perl_version > Removing previously used /Users/kristinafontanez/.cpan/build/Test- > Harness-3.17 > > CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz > > Checking if your kit is complete... > Looks good > Writing Makefile for Test::Harness > -- NOT OK > Running make test > Can't test without successful make > Running make install > make had returned bad status, install seems impossible > > cpan> install File::HomeDir > ...[more of same]... > > > ----- Original Message ----- From: "Kristina Fontanez" > > To: "Mark A. 
Jensen" > Sent: Friday, May 29, 2009 3:56 PM > Subject: Re: [Bioperl-l] problem with bioperl install > > >> Mr. Jensen- >> >> Thank you for your help but unfortunately the installation of >> Test::Harness etc didn't work. I copied my terminal output and >> attached the file. Any advice on what's still going wrong? >> >> Thanks, >> Kristina >> > > > -------------------------------------------------------------------------------- > > >> >> >> >> >> --------------------------------------------------------------- >> Kristina Fontanez >> PhD candidate >> Department of Organismic and Evolutionary Biology >> Cavanaugh lab >> Harvard University >> 16 Divinity Ave. >> Cambridge, MA 02138 >> >> tel: 617-495-1138 >> fax: 617-496-6933 >> email: fontanez at fas.harvard.edu >> >> >> >> On May 29, 2009, at 3:35 PM, Mark A. Jensen wrote: >> >> The message says you are first updating your CPAN.pm. >> That module needs modules you don't have, so >> >> use cpan to install the dependencies you don't have, viz. >>> Test::Harness >>> File::HomeDir >> >> $ cpan >>> install Test::Harness >> etc. >> Then install CPAN.pm again (or run the Bioperl install again). >> >> Lather, rinse, repeat the install of Bioperl until it completes >> without errors. >> >> ----- Original Message ----- From: "Kristina Fontanez" > > >> To: >> Sent: Friday, May 29, 2009 3:07 PM >> Subject: [Bioperl-l] problem with bioperl install >> >> >>> Hello- >>> >>> I am trying to install bioperl and I ran into some problems. See >>> list below. >>> >>> >>> CPAN.pm: Going to build A/AN/ANDK/CPAN-1.94.tar.gz >>> >>> Checking if your kit is complete... >>> Looks good >>> Warning: prerequisite File::HomeDir 0.69 not found. >>> Warning: prerequisite Test::Harness 2.62 not found. We have 2.56. >>> Writing Makefile for CPAN >>> ---- Unsatisfied dependencies detected during [A/AN/ANDK/ >>> CPAN-1.94.tar.gz] ----- >>> Test::Harness >>> File::HomeDir >>> >>> >>> How can I fix this? 
>>>
>>>
>>> Thanks,
>>> Kristina
>>> ---------------------------------------------------------------
>>> Kristina Fontanez
>>> PhD candidate
>>> Department of Organismic and Evolutionary Biology
>>> Cavanaugh lab
>>> Harvard University
>>> 16 Divinity Ave.
>>> Cambridge, MA 02138
>>>
>>> tel: 617-495-1138
>>> fax: 617-496-6933
>>> email: fontanez at fas.harvard.edu
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon Jun 1 12:14:07 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 11:14:07 -0500
Subject: [Bioperl-l] a HOWTO for Tiling
In-Reply-To: <13190185F84E43BDA99993CEB44394C4@NewLife>
References: <13190185F84E43BDA99993CEB44394C4@NewLife>
Message-ID: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>

As long as it doesn't significantly impact SearchIO performance (from reading the HOWTO I can't see how it will), I say commit away. In fact, I consider this a bug fix that should be in the next 1.6 point release. We should add deprecation warnings where needed for 1.7...

chris

On Jun 1, 2009, at 9:55 AM, Mark A. Jensen wrote:

> Hi All
> Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an
> exhibition of B::S::Tiling, use cases, code snippets, design,
> implementation and algorithm discussions. We're just about ready to
> port over to core from bioperl-dev; please shout out if this is not
> a good idea.
> cheers and thanks for all input--
> Mark
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From dan.bolser at gmail.com Mon Jun 1 12:27:30 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 1 Jun 2009 17:27:30 +0100
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
References: <2023E087846042178215CF9EBDE12C75@NewLife>
 <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
Message-ID: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>

2009/6/1 Chris Fields :

...
> for installation. You need to use 'sudo' to install anything /Library/Perl
> on Mac (unless you are already 'root', but on recent OS X version logging in
...

local::lib is supposed to take care of this. Is this broken on Mac?
Building stuff as root is generally considered to be bad.

> I just noticed nothing is mentioned along these lines in the installation
> docs, so we'll need to update those.

I tried to write down a clear 'recipe' for getting things installed
(this was actually on the GMod wiki). I really think the install docs
could be improved. Sometimes less verbose is better.

Dan

From cjfields at illinois.edu Mon Jun 1 13:15:42 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 12:15:42 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
References: <2023E087846042178215CF9EBDE12C75@NewLife>
 <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
 <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
Message-ID: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>

On Jun 1, 2009, at 11:27 AM, Dan Bolser wrote:

> 2009/6/1 Chris Fields :
>
> ...
>> for installation. You need to use 'sudo' to install anything /
>> Library/Perl
>> on Mac (unless you are already 'root', but on recent OS X version
>> logging in
> ...
>
> local::lib is supposed to take care of this. Is this broken on Mac?
> Building stuff as root is generally considered to be bad.

You can install to a local lib, yes, but cpan needs to be manually configured to do this; I don't think it is automatically configured to do so in OS X, eg. it defaults to /Library/Perl. Frankly, I sidestep the whole issue with my own custom perl installation, but that's me.

>> I just noticed nothing is mentioned along these lines in the
>> installation
>> docs, so we'll need to update those.
> > I tried to write down a clear 'recipe' for getting things installed > (this was actually on the GMod wiki). I really think the install docs > could be improved. Sometimes less verbose is better. > > Dan True, but I would much rather have reasonable instructions that outline most installation issues than ones that aren't detailed enough. My thought is to strip down the INSTALL doc that comes with BioPerl down to the essentials and point to the wiki for the more detailed ones (including problems encountered). It's too hard to maintain both and backport the wiki into plain text. chris From maj at fortinbras.us Mon Jun 1 15:03:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 1 Jun 2009 15:03:05 -0400 Subject: [Bioperl-l] a HOWTO for Tiling In-Reply-To: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu> References: <13190185F84E43BDA99993CEB44394C4@NewLife> <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu> Message-ID: Thanks, Chris-- Bio::Search::Tiling is now ported to core; the snapshot of the ported version is in bioperl-dev/tags/tiling-port-to-core-060109. Bunch o' tests performed by t/SearchIO/Tiling.t; bunch more if one sets BIOPERL_TILING_EXHAUSTIVE_TESTS . Cry 'Havoc!' and let slip the dogs of war... MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Sendu Bala" ; "Dave Messina" ; "BioPerl List" Sent: Monday, June 01, 2009 12:14 PM Subject: Re: [Bioperl-l] a HOWTO for Tiling >I think, as long is it doesn't significantly impact SearchIO performance wise >(from reading the HOWTO I can't see how it will), I say commit away. In fact, >I consider this a bug fix that should be in the next 1.6 point release. We >should add deprecation warnings where needed for 1.7... > > chris > > On Jun 1, 2009, at 9:55 AM, Mark A. Jensen wrote: > >> Hi All >> Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an exhibition of >> B::S::Tiling, use cases, code snippets, design, implementation and algorithm >> discussions. 
We're just about ready to port over to core from bioperl-dev; >> please shout out if this is not a good idea. >> cheers and thanks for all input-- >> Mark >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From koenvanderdrift at gmail.com Mon Jun 1 18:22:23 2009 From: koenvanderdrift at gmail.com (Koen van der Drift) Date: Mon, 1 Jun 2009 18:22:23 -0400 Subject: [Bioperl-l] problem with bioperl install In-Reply-To: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu> References: <2023E087846042178215CF9EBDE12C75@NewLife> <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu> <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com> <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu> Message-ID: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com> On Jun 1, 2009, at 1:15 PM, Chris Fields wrote: > My thought is to strip down the INSTALL doc that comes with BioPerl > down to the essentials and point to the wiki for the more detailed > ones (including problems encountered). It's too hard to maintain > both and backport the wiki into plain text. Good idea, please then also update the file PLATFORMS. It has a link to a very outdated website for the installation of bioperl on OS X. And maybe a line + link to the bioperl wiki can be added that recommends the use of fink as an alternative to cpan? cheers, - Koen. 
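
The permissions diagnosis in this thread can be checked before running cpan at all. The following is only a minimal sketch, assuming nothing beyond core Perl (the Config module): it prints the site install directories this perl was built with and whether the current user can write to them. The helper name install_dir_status is made up for illustration and is not part of BioPerl or CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;    # core module: records where this perl installs things

# Classify one install directory from the current user's point of view.
# (Hypothetical helper, for illustration only.)
sub install_dir_status {
    my ($dir) = @_;
    return 'unset'    if !defined $dir || $dir eq '';
    return 'missing'  if !-d $dir;
    return 'writable' if -w $dir;
    return 'not writable (needs sudo or a local lib)';
}

# Site directories are where 'cpan' installs modules by default.
for my $key (qw(installsitelib installsitearch installsitebin)) {
    my $dir = $Config{$key};
    printf "%-16s %-45s %s\n", $key,
        defined $dir ? $dir : '(unset)',
        install_dir_status($dir);
}
```

A 'not writable' result for installsitelib is consistent with the advice given earlier in the thread: either invoke the shell as 'sudo cpan' or configure cpan to install into a local lib. A 'missing' directory is not necessarily a problem, since an installer may be able to create it.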
From cjfields at illinois.edu Mon Jun 1 19:27:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 18:27:32 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
References: <2023E087846042178215CF9EBDE12C75@NewLife>
 <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
 <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
 <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
 <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
Message-ID: <98605D05-706B-4ACB-B444-4F0A9CEC879D@illinois.edu>

On Jun 1, 2009, at 5:22 PM, Koen van der Drift wrote:

>
> On Jun 1, 2009, at 1:15 PM, Chris Fields wrote:
>
>> My thought is to strip down the INSTALL doc that comes with BioPerl
>> down to the essentials and point to the wiki for the more detailed
>> ones (including problems encountered). It's too hard to maintain
>> both and backport the wiki into plain text.
>
>
> Good idea, please then also update the file PLATFORMS. It has a link
> to a very outdated website for the installation of bioperl on OS X.
> And maybe a line + link to the bioperl wiki can be added that
> recommends the use of fink as an alternative to cpan?
>
> cheers,
>
> - Koen.

Done. I've added a ticket on bugzilla for tracking this so it doesn't get lost:

http://bugzilla.open-bio.org/show_bug.cgi?id=2846

chris

From shalabh.sharma7 at gmail.com Tue Jun 2 10:44:25 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 2 Jun 2009 10:44:25 -0400
Subject: [Bioperl-l] Refseq Hits
Message-ID: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>

Hi All,

This is not really a BioPerl query, but I am really confused and need some help.
I blasted some sequences against the RefSeq database (locally). After parsing the BLAST result, I noticed that some description fields contain two hit names, like:

hit_name -> gi|71082715|ref|YP_265434.1|
Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1002]

So besides giving me the description for hit_name (HTCC1062), it's also giving me HTCC1002.
I would really appreciate it if someone could help me out.

Thanks
Shalabh
_________________________________________________
Shalabh Sharma
Scientific Computing Professional Associate
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

phone: 706-542-0341
email: ssharmai at uga.edu

From jonathancrabtree at gmail.com Tue Jun 2 11:04:33 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Tue, 2 Jun 2009 11:04:33 -0400
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
Message-ID: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>

Hi Shalabh-

I believe RefSeq is a non-redundant database, in which sequence entries with identical sequences are merged and their descriptions are concatenated in the FASTA defline. If you look up the two accession numbers/gi numbers from your search results I think you'll see that both are valid matches because their polypeptide sequences are identical:

http://www.ncbi.nlm.nih.gov/protein/71082715
http://www.ncbi.nlm.nih.gov/protein/91762865

You're just getting a single match with two descriptions instead of two matches with one description each, but the sequence is the same and so, therefore, are the BLAST alignments.

Jonathan

On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma wrote:

> Hi All,
> This is not really a bioperl query, but i am really confused and
> need some help.
> I blasted some sequences against refseq database (locally).
After parsing > the blast result what i noticed that some description fields contain two > hit > names like: > hit_name -> gi|71082715|ref|YP_265434.1| > Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique > HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein > [Candidatus Pelagibacter ubique HTCC1002] > > So besides giving me description for hit_name (HTCC 1062) its also giving > me > HTCC 1002. > I will really appreciate if someone can help me out. > > Thanks > Shalabh > _________________________________________________ > Shalabh Sharma > Scientific Computing Professional Associate > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > > phone: 706-542-0341 > email: ssharmai at uga.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shalabh.sharma7 at gmail.com Tue Jun 2 11:15:45 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 2 Jun 2009 11:15:45 -0400 Subject: [Bioperl-l] Refseq Hits In-Reply-To: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com> References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com> <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com> Message-ID: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com> Hi Jonathan, Your information is really helpful. Thanks a lot. -Shalabh On Tue, Jun 2, 2009 at 11:04 AM, Jonathan Crabtree < jonathancrabtree at gmail.com> wrote: > > Hi Shalabh- > > I believe RefSeq is a non-redundant database, in which sequence entries > with identical sequences are merged and their descriptions are concatenated > in the FASTA defline. 
If you look up the two accession numbers/gi numbers > from your search results I think you'll see that both are valid matches > because their polypeptide sequences are identical: > > http://www.ncbi.nlm.nih.gov/protein/71082715 > http://www.ncbi.nlm.nih.gov/protein/91762865 > > You're just getting a single match with two descriptions instead of two > matches with one description, but the sequence is the same and so, therefore > are the blast alignments. > > Jonathan > > On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma > wrote: > >> Hi All, >> This is not really a bioperl query, but i am really confused and >> need some help. >> I blasted some sequences against refseq database (locally). After parsing >> the blast result what i noticed that some description fields contain two >> hit >> names like: >> hit_name -> gi|71082715|ref|YP_265434.1| >> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique >> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding >> protein >> [Candidatus Pelagibacter ubique HTCC1002] >> >> So besides giving me description for hit_name (HTCC 1062) its also giving >> me >> HTCC 1002. >> I will really appreciate if someone can help me out. >> >> Thanks >> Shalabh >> _________________________________________________ >> Shalabh Sharma >> Scientific Computing Professional Associate >> Department of Marine Sciences >> University of Georgia >> Athens, GA 30602-3636 >> >> phone: 706-542-0341 >> email: ssharmai at uga.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From tristan.lefebure at gmail.com Tue Jun 2 12:24:21 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Tue, 2 Jun 2009 12:24:21 -0400 Subject: [Bioperl-l] Creating a fastq format file? 
In-Reply-To: References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com> Message-ID: <200906021224.21439.tristan.lefebure@gmail.com> On Monday 27 April 2009 05:38:40 Heikki Lehvaslaiho wrote: > I convinced at least myself to the degree that I wrote > the range_convert() method - with plenty of tests. I > mention this now so that no-one else need to start > thinking through all the edge values. > > :) > > I'll contribute it to the code base once there is a > consensus of best way forward. > Heikki, This thread has been quiet for a while, but I don't see anything new in Bio::Seq::Quality. Did we reach a consensus or are you waiting for some more discussion on the subject? (I'm pretty impatient to see bioperl handling both sanger and illumina ranges on the fly!) --Tristan > -Heikki > > 2009/4/27 Heikki Lehvaslaiho : > >> I have tried to summarise this in a central place: > >> http://en.wikipedia.org/wiki/FASTQ_format > > > > Torsten, > > > > Thanks for putting this together. Very helpful. > > > > Do you have a plan of action? Let me propose one for > > BioPerl. It based on following assumptions: > > > > 1. There is multitude of different ways of coding > > quality values out there. 2. Bio::Seq::Quality is > > agnostic of any quality value range rules 3. The > > emerging open standard is the Sanger fastq > > specification 4. Open source programs use the Sanger > > fastq specs > > > > > > From these it follows that: > > > > > > 1. BioPerl should support Sanger fastq standard > > > > 1.1. it already does and there are other SeqIO modules > > for dealing with other non-fastq formats. > > > > 2. BioPerl should offer simple ways of converting > > between quality range rules > > > > 2.1. 
Have a generic method accessible from > > Bio::Seq::Quality with preset versions of the method > > for converting between known variants (Sanger fastq and > > the two Illumina versions) > > > > For example: > > > > range_convert ($from_lower, $from_upper, $to_lower, > > $to_upper, $value) throw if $value < $from_lower or > > $value > $from_upper return $newvalue > > > > range_convert_illumina2fastq(), > > range_convert_fastq2illumina(), > > range_convert_fastq2phred(), > > range_convert_phred2fastq().... > > > > (assuming that illumina 1.3 eq phred) > > > > 2.2. Bio::SeqIO::Fastq::next_seq methods should convert > > Illumina qualities into Sanger fastq on the fly > > > > 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the > > incoming stream of quality value range either > > automatically or be given a keyword parameter > > indicating the range. > > > > 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an > > error if it detects a quality value out of range. > > > > 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an > > error if it detects a quality value out of range. > > > > 2.2.4. It would be useful but not absolutely necessary > > for Bio::SeqIO::Fastq::write_seq to be able to write > > out in Illumina ranges > > > > > > What do you think? > > > > -Heikki > > > > 2009/4/26 Torsten Seemann : > >>> > This might be a good place to ask the question: > >>> > having looked at the fastq.pm page, is the fastq > >>> > format defined (only) by a "@'" followed by > >>> > >>> a > >>> > >>> > sequence line and a "+" header followed by a > >>> > quality line and the two headers have to agree? Now > >>> > that Illumina is using phred scaling, are 'Sanger' > >>> > and 'Illumina' versions the same? > >>> > >>> No they aren't the same, Illumina still encodes the > >>> ascii as value + 64 and Sanger as value + 33. > >> > >> Illumina have now CHANGED how they calculate the > >> quality value however in the last month or so... 
Their > >> Q range used to be -5..40 mapped to ASCII 64+, but now > >> they produce Q >= 0 and it is unclear if they start at > >> 69 or 64 now... > >> > >> I have tried to summarise this in a central place: > >> > >> http://en.wikipedia.org/wiki/FASTQ_format > >> > >> Corrections welcome! > >> > >> > >> --Torsten Seemann > >> --Victorian Bioinformatics Consortium, Dept. > >> Microbiology, Monash University, AUSTRALIA > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > -Heikki > > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > > cell: +27 (0)714328090 > > Sent from Claremont, WC, South Africa From Russell.Smithies at agresearch.co.nz Tue Jun 2 16:56:26 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 3 Jun 2009 08:56:26 +1200 Subject: [Bioperl-l] Refseq Hits In-Reply-To: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com> References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com> <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com> <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493EB1D18@exchsth.agresearch.co.nz> The identifiers are separated by a Ctrl-A char ("\001") in the original non-redundant fasta header so you should be able to split them up again - assuming BioPerl didn't munge them. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of shalabh sharma > Sent: Wednesday, 3 June 2009 3:16 a.m. > To: Jonathan Crabtree > Cc: bioperl-l > Subject: Re: [Bioperl-l] Refseq Hits > > Hi Jonathan, Your information is really helpful. Thanks a > lot. 
> > -Shalabh > > > On Tue, Jun 2, 2009 at 11:04 AM, Jonathan Crabtree < > jonathancrabtree at gmail.com> wrote: > > > > > Hi Shalabh- > > > > I believe RefSeq is a non-redundant database, in which sequence entries > > with identical sequences are merged and their descriptions are concatenated > > in the FASTA defline. If you look up the two accession numbers/gi numbers > > from your search results I think you'll see that both are valid matches > > because their polypeptide sequences are identical: > > > > http://www.ncbi.nlm.nih.gov/protein/71082715 > > http://www.ncbi.nlm.nih.gov/protein/91762865 > > > > You're just getting a single match with two descriptions instead of two > > matches with one description, but the sequence is the same and so, therefore > > are the blast alignments. > > > > Jonathan > > > > On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma > > wrote: > > > >> Hi All, > >> This is not really a bioperl query, but i am really confused and > >> need some help. > >> I blasted some sequences against refseq database (locally). After parsing > >> the blast result what i noticed that some description fields contain two > >> hit > >> names like: > >> hit_name -> gi|71082715|ref|YP_265434.1| > >> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique > >> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding > >> protein > >> [Candidatus Pelagibacter ubique HTCC1002] > >> > >> So besides giving me description for hit_name (HTCC 1062) its also giving > >> me > >> HTCC 1002. > >> I will really appreciate if someone can help me out. 
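For reference, the Ctrl-A convention mentioned in this thread can be tried out in plain Perl. This is only a minimal sketch: `split_nr_defline` is a made-up helper (not a BioPerl method), and it assumes the Ctrl-A characters ("\001") are still present in the string you have. If a parser has already replaced them with spaces, it will simply return one entry.

```perl
use strict;
use warnings;

# Split a merged non-redundant FASTA defline into its component records.
# Assumption: records are still separated by Ctrl-A (\x01, i.e. "\001").
# split_nr_defline is a hypothetical helper, not part of BioPerl.
sub split_nr_defline {
    my ($defline) = @_;
    $defline =~ s/^>//;                      # drop a leading '>' if present
    my @entries;
    for my $part (split /\x01/, $defline) {
        # first whitespace-delimited token is the identifier, the rest is the description
        my ($id, $desc) = split /\s+/, $part, 2;
        push @entries, { id => $id, desc => defined $desc ? $desc : '' };
    }
    return @entries;
}

# Example built from the two records quoted in this thread.
my $defline = join "\x01",
    'gi|71082715|ref|YP_265434.1| ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1062]',
    'gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1002]';

for my $entry (split_nr_defline($defline)) {
    print "$entry->{id}\t$entry->{desc}\n";
}
```

Each returned entry keeps its own identifier and description, so the HTCC1062 and HTCC1002 records can be reported separately even though they share one sequence.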
> >> > >> Thanks > >> Shalabh > >> _________________________________________________ > >> Shalabh Sharma > >> Scientific Computing Professional Associate > >> Department of Marine Sciences > >> University of Georgia > >> Athens, GA 30602-3636 > >> > >> phone: 706-542-0341 > >> email: ssharmai at uga.edu > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From maj at fortinbras.us Tue Jun 2 17:05:03 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 2 Jun 2009 17:05:03 -0400 Subject: [Bioperl-l] Bio::Search::Tiling Message-ID: All- Bio::Search::Tiling is now in bioperl-live, passes all tests. Thanks, Mark From shalabh.sharma7 at gmail.com Wed Jun 3 13:27:59 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 3 Jun 2009 13:27:59 -0400 Subject: [Bioperl-l] gbf to gff Message-ID: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com> Hi all, I am working on Roseobacters. 
Many times I've converted gbk file from GenBank to gff format but now one genome
"Silicibacter lacuscaerulensis" does not have a gbk file instead it has two
gbf files:

https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain

So now how i can convert this genome to one gff file so i can use it in
gbrowse?
I would really appreciate if anyone can help me out.

Thanks

From scott at scottcain.net Wed Jun 3 14:11:54 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 3 Jun 2009 14:11:54 -0400
Subject: [Bioperl-l] gbf to gff
In-Reply-To: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
References: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
Message-ID: <536f21b00906031111l4b02a846o6f281c536b77460d@mail.gmail.com>

Hi Shalabh,

Do you want them combined onto a single reference sequence? I'm guessing
this is a circular microbial genome in two segments. Do you know how the
coordinates in one GenBank file relate to the other (or are you willing
to make something up)?

I imagine the way I would do it would be to convert both files to gff and
then write a quick script to convert the coordinates and reference sequence
name (column 1) of one file to be consistent with the other.

Scott

On Wed, Jun 3, 2009 at 1:27 PM, shalabh sharma wrote:
> Hi all, I am working on Roseobacters. Many times I've
> converted gbk file from GenBank to gff format but now one genome
> "Silicibacter lacuscaerulensis" does not have a gbk file instead it has two
> gbf files:
>
> https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain
>
> So now how i can convert this genome to one gff file so i can use it in
> gbrowse?
> I would really appreciate if anyone can help me out.
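A minimal sketch of the kind of quick script suggested in this thread, for GFF that has already been produced from each gbf file. The offset and the new reference name are exactly the part that has to be known (or made up): `shift_gff_line`, the name `SL1157_combined`, and the offset of 1000 are illustrative assumptions, not values taken from this genome.

```perl
use strict;
use warnings;

# Move one GFF feature line onto another reference sequence.
# Assumptions: tab-separated 9-column GFF; $new_ref and $offset are chosen by the user.
# shift_gff_line is a hypothetical helper, not part of BioPerl.
sub shift_gff_line {
    my ($line, $new_ref, $offset) = @_;
    return $line if $line =~ /^#/ || $line !~ /\t/;   # leave comments and blank lines alone
    chomp(my $copy = $line);
    my @col = split /\t/, $copy, -1;                  # -1 keeps trailing empty columns
    return $line unless @col >= 9;                    # not a feature line
    $col[0]  = $new_ref;                              # column 1: reference sequence name
    $col[3] += $offset;                               # column 4: start
    $col[4] += $offset;                               # column 5: end
    return join("\t", @col) . "\n";
}

# Example: place a feature from the second segment 1000 bp further along.
print shift_gff_line("seg2\tgenbank\tgene\t10\t50\t.\t+\t.\tID=g1\n",
                     'SL1157_combined', 1000);
```

Directive lines such as `##sequence-region` are passed through untouched here, so they would still need to be adjusted by hand before loading the merged file.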
> > Thanks > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From alperyilmaz at gmail.com Fri Jun 5 14:50:46 2009 From: alperyilmaz at gmail.com (Alper Yilmaz) Date: Fri, 5 Jun 2009 14:50:46 -0400 Subject: [Bioperl-l] GBroswe2 - feature details Message-ID: Dear all, I have a question about utilizing the tag/value pairs that were used in 9th of GFF. If my 9th column is like this: ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22 How can I use BS_Seq, BS_Color tags, say, in a balloon? If want to print name and sequence of a BindingSite, what do I need to replace question marks below? balloon hover = Motif name: $name, Sequence: ??????? The manual is mentioning that it's possible to use user defined tag/value pairs, but I couldn't figure out how. The manual is mentioning: [feature_type:details] tag1 = formatting rule tag2 = formatting rule tag3 = formatting rule can be used to adjust formatting of a tag, but I don't how this can be used to assign value to a tag? I tried ; [cis-elements:details] bs_seq = $value (I didn't use BS_Seq, since it was mentioned, tags are case-insensitive) OR $bs_seq = $value but, I cannot use $bs_seq in hover link option after doing this. What am I doing wrong? thanks, Alper Yilmaz Post-doctoral Researcher Plant Biotechnology Center The Ohio State University 1060 Carmack Rd Columbus, OH 43210 (614)688-4954 www.grassius.org From cjfields at illinois.edu Fri Jun 5 16:43:04 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Jun 2009 15:43:04 -0500 Subject: [Bioperl-l] [Bioperl-guts-l] Bug in genbank.pm? 
In-Reply-To: <52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu> References: <002b01c9e567$e09b0de0$a1d129a0$@edu> <52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu> Message-ID: (Just so this is going to the correct list) Marcos, I'll look into it. This may have been fixed in between the releases, though. There isn't a PPM available for 1.6 yet (several prereqs were missing at the time of the 1.6 release, such as Graphviz and so on). A bug report is in the queue for this, though, as a reminder. I think those are now available, though, so we should *theoretically* be capable of getting a PPM ready. I say 'theoretically' b/c I don't have easy access to a PC running Windows (I have moved to OS X). I'll see what I can do about that in the next few weeks. In the meantime, if you need it you can download 1.6 or the 'nightly build' version (nightly snapshots of svn code) and add it to PERL5LIB or "use lib 'PATH_TO_BIOPERL';" in your scripts; it should work. Nightly builds: http://bioperl.org/DIST/nightly_builds/ chris On Jun 4, 2009, at 10:17 PM, Barbeitos, Marcos wrote: > OK, I attached the first record for both files. These are GenBank > flat files that were emailed to us and transferred from Macs to PCs, > so I am not sure if the encoding/line terminations got messed up at > some point. I converted the line terminations to Unix and the > encoding to Western European Windows, still, it didn't work. May be > worth it mention that BioEdit did understand the format after I > fixed the encoding. > > The data was erased because my boss is kind of finicky about sharing > information. However, I tested the files attached to this email and > got the same results. > > I am still using Bio-Perl 1.5.2_100 in a PC, PPM has not flagged the > availability of an upgrade from CPAN, are you releasing the PPD as > well? > > Thanks! 
> > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thu 6/4/2009 8:05 PM > To: Barbeitos, Marcos > Cc: bioperl-guts-l at lists.open-bio.org > Subject: Re: [Bioperl-guts-l] Bug in genbank.pm? > > Marcos, > > We need the GenBank file (or the accession) you are attempting to > parse. Also, what version are you using? We have released v. 1.6 on > CPAN, and I intend on releasing 1.6.1 soon. > > chris > > On Jun 4, 2009, at 5:57 PM, Marcos S. Barbeitos wrote: > >> Hello. I am trying to parse the Info from GeneBank flat files using >> Bio::SeqIO. I got two file which are virtually identical and one of >> them >> gets parsed just fine. However, in the case of the other, the >> program >> croaks when trying to parse the features and gives me: >> >> >> >> -------------------- WARNING --------------------- >> >> MSG: Unexpected error in feature table for Skipping feature, >> attempting to >> recover >> >> --------------------------------------------------- >> >> >> >> I noticed that it does that after it reads the entry '/organism' in >> Features. The only difference I can see between the two files is the >> presence of the feature ' /organelle' and of the line BASE COUNT in >> one of >> them, but the error persists even after I remove these lines. Apart >> from >> that, there are the number of white spaces that precede the >> beginning of >> each line. Any ideas? >> >> >> >> Thanks! >> >> >> >> Marcos S. 
Barbeitos
>>
>> Post-Doc Fellow
>>
>> The University of Kansas
>> Department of Ecology and Evolutionary Biology
>> 2041 Haworth Hall
>> 1200 Sunnyside Avenue
>> Lawrence, Kansas 66045
>> p: 785.864.5887
>> f: 785.864.5860
>>
>>
>>
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>
>
>
>

From Russell.Smithies at agresearch.co.nz Sun Jun 7 16:32:27 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 8 Jun 2009 08:32:27 +1200
Subject: [Bioperl-l] GBroswe2 - feature details
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493F1CA41@exchsth.agresearch.co.nz>

For the first part of your question, you can use a sub to access values in your annotations:

balloon hover = sub { my $f = shift;
                      my %a = $f->attributes;
                      my $name = $f->name;
                      my $seq = $a{'BS_Seq'};
                      return "Motif name: $name, Sequence: $seq" if defined $seq;
                      return "Motif name: $name, No sequence defined";
                    }

For the second bit, here's the formatting rules I'm using to create hyperlinks:

[Dbxref:DETAILS]
URL = sub { my ($tag,$value)=@_;
            if ($value =~ /NCBI_gi:(.+)/){
              return "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=$1";
            }
            if ($value =~ /NCBI_Gene:(.+)/){
              return "http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=$1";
            }
            return;
          }

And this is what the gff looks like:

BTA10 refseq mRNA 10011147 10176454 0 - . ID=NM_001076052;Name=NM_001076052;Index=1;Alias=HOMER1;Note=homer homolog 1 (Drosophila);Dbxref=NCBI_gi:115496957;Dbxref=NCBI_Gene:535311;
BTA10 refseq mRNA 10241506 10301142 0 + . ID=NM_001046361;Name=NM_001046361;Index=1;Alias=PAPD4,MGC138008;Note=PAP associated domain containing 4;Dbxref=NCBI_gi:114052221;Dbxref=NCBI_Gene:533862;

Hopefully, this will get you going :-)

Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay
Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809  F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Alper Yilmaz
> Sent: Saturday, 6 June 2009 6:51 a.m.
> To: BioPerl List
> Subject: [Bioperl-l] GBroswe2 - feature details
>
> Dear all,
>
> I have a question about utilizing the tag/value pairs that were used
> in 9th of GFF. If my 9th column is like this:
>
> ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22
>
> How can I use BS_Seq, BS_Color tags, say, in a balloon? If want to
> print name and sequence of a BindingSite, what do I need to replace
> question marks below?
>
> balloon hover = Motif name: $name,
> Sequence: ???????
>
>
> The manual is mentioning that it's possible to use user defined
> tag/value pairs, but I couldn't figure out how. The manual is
> mentioning:
> [feature_type:details]
> tag1 = formatting rule
> tag2 = formatting rule
> tag3 = formatting rule
>
> can be used to adjust formatting of a tag, but I don't how this can be
> used to assign value to a tag? I tried ;
> [cis-elements:details]
> bs_seq = $value (I didn't use BS_Seq, since it was
> mentioned, tags are case-insensitive)
> OR
> $bs_seq = $value
>
> but, I cannot use $bs_seq in hover link option after doing this. What
> am I doing wrong?
> > thanks, > > Alper Yilmaz > Post-doctoral Researcher > Plant Biotechnology Center > The Ohio State University > 1060 Carmack Rd > Columbus, OH 43210 > (614)688-4954 > www.grassius.org > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From bernd.jagla at pasteur.fr Mon Jun 8 12:24:12 2009 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 8 Jun 2009 18:24:12 +0200 Subject: [Bioperl-l] Bio:Das 1.11 installation problem Message-ID: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> Hi, I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN -e 'install Bio::Das' This is perl, v5.8.9 built for darwin-2level (please let me know if you need anything else) I am trying to install Bio::Das 1.11 I get the following error: not ok 3 not ok 4 Can't call method "description" on an undefined value at t/01das.t line 62. 
When going into the sources for 01das.t and printing out $db I get: $VAR1 = \bless( { 'autotypes' => undef, 'default_dsn' => undef, 'autocategories' => undef, 'sockets' => {}, 'aggregators' => [ bless( { 'sub_parts' => [ 'coding_exon' ], 'require_whole_object' => undef, 'main_method' => 'CDS', 'method' => 'alignment' }, 'Bio::DB::GFF::Aggregator' ), bless( { 'sub_parts' => [ 'EST_match' ], 'require_whole_object' => undef, 'main_method' => 'alignment', 'method' => 'alignment' }, 'Bio::DB::GFF::Aggregator' ) ], 'timeout' => undef, 'oldstyle_api' => 1, 'default_server' => 'http://www.wormbase.org/db/seq/das' }, 'Bio::Das' ); @sources is empty And test(3, at sources) fails. Please advise. Thanks, Bernd From lincoln.stein at gmail.com Mon Jun 8 13:00:48 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 8 Jun 2009 13:00:48 -0400 Subject: [Bioperl-l] Bio:Das 1.11 installation problem In-Reply-To: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> Message-ID: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com> Hi, The regression tests require an active Internet connection, as well as the DAS test server being up and running. It may be there was a temporary failure of one of those two. I just tested on my end and the regression tests ran ok, so could you try it again? Lincoln On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla wrote: > Hi, > > > > I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN -e > 'install Bio::Das' > This is perl, v5.8.9 built for darwin-2level > (please let me know if you need anything else) > > > > I am trying to install Bio::Das 1.11 > > > > I get the following error: > > > > not ok 3 > > not ok 4 > > Can't call method "description" on an undefined value at t/01das.t line 62. 
> > > > When going into the sources for 01das.t and printing out $db I get: > > > > $VAR1 = \bless( { > > 'autotypes' => undef, > > 'default_dsn' => undef, > > 'autocategories' => undef, > > 'sockets' => {}, > > 'aggregators' => [ > > bless( { > > 'sub_parts' => [ > > > 'coding_exon' > > ], > > 'require_whole_object' => > undef, > > 'main_method' => 'CDS', > > 'method' => 'alignment' > > }, 'Bio::DB::GFF::Aggregator' > ), > > bless( { > > 'sub_parts' => [ > > 'EST_match' > > ], > > 'require_whole_object' => > undef, > > 'main_method' => 'alignment', > > 'method' => 'alignment' > > }, 'Bio::DB::GFF::Aggregator' ) > > ], > > 'timeout' => undef, > > 'oldstyle_api' => 1, > > 'default_server' => 'http://www.wormbase.org/db/seq/das' > > }, 'Bio::Das' ); > > > > > > @sources is empty > > And test(3, at sources) fails. > > > > Please advise. > > > > Thanks, > > > > Bernd > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From lsbrath at gmail.com Mon Jun 8 16:28:46 2009 From: lsbrath at gmail.com (lsbrath at gmail.com) Date: Mon, 08 Jun 2009 20:28:46 +0000 Subject: [Bioperl-l] fasta conversion Message-ID: <000e0cd6aa4cd53993046bdc1675@google.com> Hello! I am running into trouble while trying to convert a text file to fasta. It should be simple enough but I am getting a wierd error message. 
This is my script:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
use Bio::SeqIO;


my $maid_dir = "C:/Documents and Settings/mgavi.brathwaite/Desktop/msa";
my $maid = '13063';

opendir my $dh, "$maid_dir"; # directory to search
my @files = readdir $dh;
#find the _fasta file
for my $f (@files){
    my $fa = $maid_dir."/".$maid."_hu_1kb.fa";
    my $r = $maid_dir."/".$maid."_hu_1kb.txt";
    open (my $in,$r);
    if($f=~ m/^(\d+)_hu_1kb/){ # convert to fasta

        print Dumper($f);
        my $hu_1kb = $maid.'_hu_1kb'; #file to convert
        my $in = Bio::SeqIO->new(-file => $r,
                                 -format => 'raw');
        my $out = Bio::SeqIO->new(-file => ">$fa",
                                  -format => 'Fasta');
        while ( my $seq = $in->next_seq()) {
            $out->write_seq($seq);
        }
    }
}

I keep getting the following error message:

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 13063
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [13063HU] which does not look healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::PrimarySeq::seq C:/Perl/site/lib/Bio/PrimarySeq.pm:258
STACK: Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:210
STACK: Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484
STACK: Bio::Seq::SeqFactory::create C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116
STACK: C:/Perl/site/lib/Bio\SeqIO\raw.pm:119
-----------------------------------------------------------

Anyone out there that can help me solve this?

From kjaja27 at yahoo.com Fri Jun 5 19:42:13 2009
From: kjaja27 at yahoo.com (kayj)
Date: Fri, 5 Jun 2009 16:42:13 -0700 (PDT)
Subject: [Bioperl-l] finding SNPs in a given region
Message-ID: <23897107.post@talk.nabble.com>

Hi All,
Is there a way to find the SNPs in a given region, I have the start and the end base pair position, I am looking to download the SNPs in different regions, Is that possible ?
This is my first time using bioperl and any help will be greatly appreciated Thanks -- View this message in context: http://www.nabble.com/finding-SNPs-in-a-given-region-tp23897107p23897107.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From kjaja27 at yahoo.com Mon Jun 8 09:49:24 2009 From: kjaja27 at yahoo.com (kayj) Date: Mon, 8 Jun 2009 06:49:24 -0700 (PDT) Subject: [Bioperl-l] How to extract SNPs Message-ID: <23924432.post@talk.nabble.com> Hi All, I have several regions on the genome each is defined with the start and the end base pair position. I am looking into using HapMap http://hapmart.hapmap.org/BioMart/martview to extract the SNPs in these region given a population. I am new to bioperl and any help will be greatly appreciated. -- View this message in context: http://www.nabble.com/How-to-extract-SNPs-tp23924432p23924432.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bernd at pasteur.fr Mon Jun 8 16:31:57 2009 From: bernd at pasteur.fr (bernd at pasteur.fr) Date: Mon, 8 Jun 2009 22:31:57 +0200 (CEST) Subject: [Bioperl-l] Bio:Das 1.11 installation problem In-Reply-To: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com> References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com> Message-ID: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr> I tested the connection with wget and everything works fine. I suspect that our proxy might be the problem but all variables are set correctly (ftp_proxy, http_proxy and many more) I am not sure which environment variable are being used... I am not too familiar with all this and don't know where to look for the right configurations. Thanks, Bernd > Hi, > > The regression tests require an active Internet connection, as well as the > DAS test server being up and running. It may be there was a temporary > failure of one of those two. 
I just tested on my end and the regression > tests ran ok, so could you try it again? > > Lincoln > > On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla > wrote: > >> Hi, >> >> >> >> I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN >> -e >> 'install Bio::Das' >> This is perl, v5.8.9 built for darwin-2level >> (please let me know if you need anything else) >> >> >> >> I am trying to install Bio::Das 1.11 >> >> >> >> I get the following error: >> >> >> >> not ok 3 >> >> not ok 4 >> >> Can't call method "description" on an undefined value at t/01das.t line >> 62. >> >> >> >> When going into the sources for 01das.t and printing out $db I get: >> >> >> >> $VAR1 = \bless( { >> >> 'autotypes' => undef, >> >> 'default_dsn' => undef, >> >> 'autocategories' => undef, >> >> 'sockets' => {}, >> >> 'aggregators' => [ >> >> bless( { >> >> 'sub_parts' => [ >> >> >> 'coding_exon' >> >> ], >> >> 'require_whole_object' => >> undef, >> >> 'main_method' => 'CDS', >> >> 'method' => 'alignment' >> >> }, >> 'Bio::DB::GFF::Aggregator' >> ), >> >> bless( { >> >> 'sub_parts' => [ >> >> 'EST_match' >> >> ], >> >> 'require_whole_object' => >> undef, >> >> 'main_method' => >> 'alignment', >> >> 'method' => 'alignment' >> >> }, >> 'Bio::DB::GFF::Aggregator' ) >> >> ], >> >> 'timeout' => undef, >> >> 'oldstyle_api' => 1, >> >> 'default_server' => >> 'http://www.wormbase.org/db/seq/das' >> >> }, 'Bio::Das' ); >> >> >> >> >> >> @sources is empty >> >> And test(3, at sources) fails. >> >> >> >> Please advise. >> >> >> >> Thanks, >> >> >> >> Bernd >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Lincoln D. 
Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Mon Jun 8 17:12:03 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 8 Jun 2009 17:12:03 -0400 Subject: [Bioperl-l] fasta conversion In-Reply-To: <000e0cd6aa4cd53993046bdc1675@google.com> References: <000e0cd6aa4cd53993046bdc1675@google.com> Message-ID: <4737A1AB29FA47AF8FF4913448F5FAA3@NewLife> you're getting the sequence descriptor rather than the sequence in the return from $in->next_seq. Read up on what the 'raw' format actually entails in the Bio::SeqIO pod.. cheers MAJ ----- Original Message ----- From: To: Sent: Monday, June 08, 2009 4:28 PM Subject: [Bioperl-l] fasta conversion > Hello! > > I am running into trouble while trying to convert a text file to fasta. It > should be simple enough but I am getting a wierd error message. 
> > This is my script: > > #!/usr/bin/perl > use strict; > use warnings; > use Data::Dumper; > use File::Copy; > use Bio::SeqIO; > > > my $maid_dir = "C:/Documents and Settings/mgavi.brathwaite/Desktop/msa"; > my $maid = '13063'; > > opendir my $dh, "$maid_dir"; # directory to search > my @files = readdir $dh; > #find the _fasta file > for my $f (@files){ > my $fa = $maid_dir."/".$maid."_hu_1kb.fa"; > my $r = $maid_dir."/".$maid."_hu_1kb.txt"; > open (my $in,$r); > if($f=~ m/^(\d+)_hu_1kb/){ # convert to fasta > > print Dumper($f); > my $hu_1kb = $maid.'_hu_1kb'; #file to convert > my $in = Bio::SeqIO->new(-file => $r, > -format => 'raw'); > my $out = Bio::SeqIO->new(-file => ">$fa", > -format => 'Fasta'); > while ( my $seq = $in->next_seq()) { > $out->write_seq($seq); > } > } > } > > I keep getting the following error message: > > -------------------- WARNING --------------------- > MSG: seq doesn't validate, mismatch is 13063 > --------------------------------------------------- > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Attempting to set the sequence to [13063HU] which does not look healthy > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: Bio::PrimarySeq::seq C:/Perl/site/lib/Bio/PrimarySeq.pm:258 > STACK: Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:210 > STACK: Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484 > STACK: Bio::Seq::SeqFactory::create > C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116 > STACK: C:/Perl/site/lib/Bio\SeqIO\raw.pm:119 > ----------------------------------------------------------- > > Anyone out there that can help me solve this? 
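To illustrate the point about the 'raw' format: it treats every line of the input as a bare sequence, so a descriptor line such as the one starting with 13063HU is handed to the sequence validator and rejected. Below is a minimal sketch in plain Perl (no BioPerl calls) of one way around that. It assumes, without confirmation from the thread, that the first line of the .txt file is an identifier and every following line is sequence; `text_to_fasta` is a made-up helper name.

```perl
use strict;
use warnings;

# Build one FASTA record from the lines of a "descriptor + sequence" text file.
# Assumption (not confirmed by the thread): line 1 is an ID, the rest is sequence.
# text_to_fasta is a hypothetical helper, not part of BioPerl.
sub text_to_fasta {
    my ($width, @lines) = @_;
    chomp @lines;
    s/\r$// for @lines;                    # tolerate Windows line endings
    my $id  = shift @lines;                # first line becomes the FASTA header
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;                      # remove any stray whitespace
    my $fasta = ">$id\n";
    $fasta .= "$1\n" while $seq =~ /(.{1,$width})/g;   # wrap sequence at $width columns
    return $fasta;
}

# Example with made-up sequence lines.
print text_to_fasta(60, "13063HU\n", "ACGTACGT\n", "TTGA\n");
```

If the file really holds several records, or the identifier is laid out differently, the splitting step would need to change accordingly.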
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From stefan.kirov at bms.com Mon Jun 8 17:26:17 2009 From: stefan.kirov at bms.com (Stefan Kirov) Date: Mon, 08 Jun 2009 17:26:17 -0400 Subject: [Bioperl-l] Bio:Das 1.11 installation problem In-Reply-To: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr> References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina> <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com> <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr> Message-ID: <4A2D81F9.8060509@bms.com> bernd at pasteur.fr wrote: Try to add this line -proxy => 'http:', in t/01das.t where the Bio::Das object is created (I think line 41). Hope this works for you, it did for me. Stefan > I tested the connection with wget and everything works fine. > I suspect that our proxy might be the problem but all variables are set > correctly (ftp_proxy, http_proxy and many more) I am not sure which > environment variable are being used... > I am not too familiar with all this and don't know where to look for the > right configurations. > > Thanks, > > Bernd > > >> Hi, >> >> The regression tests require an active Internet connection, as well as the >> DAS test server being up and running. It may be there was a temporary >> failure of one of those two. I just tested on my end and the regression >> tests ran ok, so could you try it again? 
>> >> Lincoln >> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla >> wrote: >> >> >>> Hi, >>> >>> >>> >>> I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN >>> -e >>> 'install Bio::Das' >>> This is perl, v5.8.9 built for darwin-2level >>> (please let me know if you need anything else) >>> >>> >>> >>> I am trying to install Bio::Das 1.11 >>> >>> >>> >>> I get the following error: >>> >>> >>> >>> not ok 3 >>> >>> not ok 4 >>> >>> Can't call method "description" on an undefined value at t/01das.t line >>> 62. >>> >>> >>> >>> When going into the sources for 01das.t and printing out $db I get: >>> >>> >>> >>> $VAR1 = \bless( { >>> >>> 'autotypes' => undef, >>> >>> 'default_dsn' => undef, >>> >>> 'autocategories' => undef, >>> >>> 'sockets' => {}, >>> >>> 'aggregators' => [ >>> >>> bless( { >>> >>> 'sub_parts' => [ >>> >>> >>> 'coding_exon' >>> >>> ], >>> >>> 'require_whole_object' => >>> undef, >>> >>> 'main_method' => 'CDS', >>> >>> 'method' => 'alignment' >>> >>> }, >>> 'Bio::DB::GFF::Aggregator' >>> ), >>> >>> bless( { >>> >>> 'sub_parts' => [ >>> >>> 'EST_match' >>> >>> ], >>> >>> 'require_whole_object' => >>> undef, >>> >>> 'main_method' => >>> 'alignment', >>> >>> 'method' => 'alignment' >>> >>> }, >>> 'Bio::DB::GFF::Aggregator' ) >>> >>> ], >>> >>> 'timeout' => undef, >>> >>> 'oldstyle_api' => 1, >>> >>> 'default_server' => >>> 'http://www.wormbase.org/db/seq/das' >>> >>> }, 'Bio::Das' ); >>> >>> >>> >>> >>> >>> @sources is empty >>> >>> And test(3, at sources) fails. >>> >>> >>> >>> Please advise. >>> >>> >>> >>> Thanks, >>> >>> >>> >>> Bernd >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> -- >> Lincoln D. 
Stein >> Director, Informatics and Biocomputing Platform >> Ontario Institute for Cancer Research >> 101 College St., Suite 800 >> Toronto, ON, Canada M5G0A3 >> 416 673-8514 >> Assistant: Renata Musa >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >

From bernd.jagla at pasteur.fr Tue Jun 9 03:05:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 9 Jun 2009 09:05:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <4A2D81F9.8060509@bms.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr> <4A2D81F9.8060509@bms.com>
Message-ID: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>

Great, that works!!!
But since I am using Bio::Das within GBrowse I can't/don't want to change those sources. I tried setting some environment variable but that doesn't seem to work either...
So far I have the set the following:
FTP_PROXY=http://...
HTTP_PROXY=http://...
PROXYFTP=http://...
PROXYHTTP=http://...
ftp_proxy=http://...
http_proxy=http://...
PROXY=http://...

Any suggestions are welcome.

Thanks,

Bernd

From awitney at sgul.ac.uk Tue Jun 9 07:20:35 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 9 Jun 2009 12:20:35 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
Message-ID: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>

Hi,

I have been experimenting with the Bio::DB::EUtilities module, with help from the Cookbook. But I can't seem to figure out how to get the DNA sequence of a gene; all the examples seem to be fetching protein sequence.

How would i go about fetching a sequence using an Entrez GeneID?

thanks for any help

adam

From Kevin.M.Brown at asu.edu Tue Jun 9 11:25:45 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 9 Jun 2009 08:25:45 -0700
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com> <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
Message-ID: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY

From cjfields at illinois.edu Tue Jun 9 12:08:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 11:08:46 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
Message-ID: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>

All,

I've noticed a few methods in bioperl with names like 'no_Foo' that mean 'number of Foo' (such as SimpleAlign's no_sequences). The problem I foresee are possible ambiguities, particularly with negative boolean checks (eg 'no_Foo' could also mean 'this instance contains no Foo'), something that BioPerl also has with various settings.

I suggest we alias these as num_* to disambiguate that.
There's no easy way to change already in-place flag setting w/o going through a deprecation cycle, but we can promote using positive booleans where possible (eg 'is_foo' or 'has_foo' instead of 'no_foo'). We can leave the older 'no_*' methods as is for the time being and maybe deprecate them later.

If no one has objections I'll add these in as needed.

chris

From SMarkel at accelrys.com Tue Jun 9 12:26:08 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 9 Jun 2009 12:26:08 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>

Chris,

I just checked our code for the Sequence Analysis Collection in Pipeline Pilot. We've got a few places we'd need to make code changes, but we like your suggestion. So, no objections from us.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email: smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice: +1 858 799 5603
San Diego, CA 92121                 fax: +1 858 799 5222
USA                                 web: http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From cjfields at illinois.edu Tue Jun 9 13:03:16 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 12:03:16 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
Message-ID:

I don't think it would require code changes right away; for the time being no_* will just alias num_*. We can probably have deprecation warnings activate when we reach a particular version.

chris

From maj at fortinbras.us Tue Jun 9 12:32:51 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 9 Jun 2009 12:32:51 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <4BA7FB5466B34B59B7C455E1173C1FA7@NewLife>

+1, absolutely- MAJ

From hlapp at gmx.net Tue Jun 9 13:18:05 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 13:18:05 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID:

Great suggestions, I'm all for it.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From florent.angly at gmail.com Tue Jun 9 14:41:51 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 09 Jun 2009 11:41:51 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To:
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <4A2EACEF.3090809@gmail.com>

Agree! no_* is prone to misunderstandings.
Also, some BioPerl code uses nof_*, which I quite like.
Florent

From cjfields at illinois.edu Tue Jun 9 14:55:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 13:55:48 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EACEF.3090809@gmail.com>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com>
Message-ID:

We could probably alias nof_* with num_* just for consistency, but leave nof_* as is and not deprecate it (I don't think anyone would confuse nof* with no*).

chris

From mauricio at open-bio.org Tue Jun 9 15:33:18 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Tue, 09 Jun 2009 14:33:18 -0500
Subject: [Bioperl-l] Project Help
In-Reply-To: <146497.36250.qm@web8407.mail.in.yahoo.com>
References: <146497.36250.qm@web8407.mail.in.yahoo.com>
Message-ID: <4A2EB8FE.4080402@open-bio.org>

Hi Chirag,

The OBF applied for the GSoC 2009 but unfortunately we were not accepted. However, other organizations/projects made their way into it and have been kind enough to adopt some of the ideas originally proposed under the OBF's initiative. I'm Cc'ing this to the BioPerl mailing list so the people involved with those projects can give you more details.

Regards,
Mauricio.

chirag matkar wrote:
> Hello,
> THis is Chirag Matkar wanting to know whether there were any GSOC 2009 projects underway in open Bioinformatics Foundation.
> Also as i am myself a perl developer can i can some stipend or internship for building perl modules?.
>
> Thanking You,
> Regards Chirag.

From rmb32 at cornell.edu Tue Jun 9 15:12:54 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 12:12:54 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To:
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com>
Message-ID: <4A2EB436.8020506@cornell.edu>

Why not just add deprecation warnings now? Or you could add deprecation warnings now that only print if $Bio::Root::Version::VERSION >= something.
Best to do it while one is thinking about it, I always say. Cause I always forget to do it later. ;-)

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From cjfields at illinois.edu Tue Jun 9 16:19:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 15:19:03 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EB436.8020506@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> <4A2EB436.8020506@cornell.edu>
Message-ID:

On Jun 9, 2009, at 2:12 PM, Robert Buels wrote:

> Why not just add deprecation warnings now? Or you could add
> deprecation warnings now that only print if
> $Bio::Root::Version::VERSION >= something. Best to do it while one
> is thinking about it, I always say. Cause I always forget to do it
> later. ;-)
>
> Rob

Actually, that's one thing I want to implement within Root, namely the ability to do this:

$self->deprecated(-message => 'method Foo is deprecated',
                  -start_ver => $version1,
                  -throw_ver => $version2
                 );

So it's essentially a noop and invisible up to start_ver (upon where it warns), then throws after, well, throw_ver. I could probably finagle that in w/o destroying things...
chris > Chris Fields wrote: >> We could probably alias nof_* with num_* just for consistency, but >> leave nof_* as is and not deprecate it (I don't think anyone would >> confuse nof* with no*). >> chris >> On Jun 9, 2009, at 1:41 PM, Florent Angly wrote: >>> Agree! no_* is prone to misunderstandings. >>> Also, some BioPerl code uses nof_*, which I quite like. >>> Florent >>> >>> Hilmar Lapp wrote: >>>> Great suggestions, I'm all for it. >>>> >>>> -hilmar >>>> >>>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote: >>>> >>>>> All, >>>>> >>>>> I've noticed a few methods in bioperl with names like 'no_Foo' >>>>> that mean 'number of Foo' (such as SimpleAlign's no_sequences). >>>>> The problem I foresee are possible ambiguities, particularly >>>>> with negative boolean checks (eg 'no_Foo' could also mean 'this >>>>> instance contains no Foo'), something that BioPerl also has with >>>>> various settings. >>>>> >>>>> I suggest we alias these as num_* to disambiguate that. There's >>>>> no easy way to change already in-place flag setting w/o going >>>>> through a deprecation cycle, but we can promote using positive >>>>> booleans where possible (eg 'is_foo' or 'has_foo' instead of >>>>> 'no_foo'). We can leave the older 'no_*' methods as is for the >>>>> time being and maybe deprecate them later. >>>>> >>>>> If no one has objections I'll add these in as needed. 
>>>>> >>>>> chris >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu From cjfields at illinois.edu Tue Jun 9 16:45:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Jun 2009 15:45:37 -0500 Subject: [Bioperl-l] deprecated(), was Re: use of no_* to mean 'number_of', negative booleans In-Reply-To: References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> <4A2EB436.8020506@cornell.edu> Message-ID: On Jun 9, 2009, at 3:19 PM, Chris Fields wrote: > On Jun 9, 2009, at 2:12 PM, Robert Buels wrote: > >> Why not just add deprecation warnings now? Or you could add >> deprecation warnings now that only print if >> $Bio::Root::Version::VERSION >= something. Best to do it while one >> is thinking about it, I always say. Cause I always forget to do it >> later. ;-) >> >> Rob > > Actually, that's one thing I want to implement within Root, namely > the ability to do this: > > $self->deprecated(-message => 'method Foo is deprecated', > -start_ver => $version1, > -throw_ver => $version2 > ); > > So it's essentially a noop and invisible up to start_ver (upon where > it warns), then throws after, well, throw_ver. I could probably > finagle that in w/o destroying things... 
>
> chris

Just to note, this is mainly to allow us devs the opportunity to add
these to main trunk w/o having to worry about merges over to the 1.6
branch (where the version is different). We don't want the dep
warnings showing up there right away, but maybe in a point release or
minor version.

chris

From hlapp at gmx.net Tue Jun 9 19:09:26 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 19:09:26 -0400
Subject: [Bioperl-l] Project Help
In-Reply-To: <4A2EB8FE.4080402@open-bio.org>
References: <146497.36250.qm@web8407.mail.in.yahoo.com> <4A2EB8FE.4080402@open-bio.org>
Message-ID: <74C0D011-A5A4-4DF1-93D8-13401A18E29A@gmx.net>

Hi Chirag,

check out the Bio{Perl,Python,Ruby}-related projects (go to 'Accepted
Projects') at
http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

-hilmar

On Jun 9, 2009, at 3:33 PM, Mauricio Herrera Cuadra wrote:

> Hi Chirag,
>
> The OBF applied for the GSoC 2009 but unfortunately we were not
> accepted. However, other organizations/projects made their way into
> it and have been kind enough to adopt some of the ideas originally
> proposed under the OBF's initiative. I'm Cc'ing this to the BioPerl
> mailing list so the people involved with those projects can give you
> more details.
>
> Regards,
> Mauricio.
>
>
> chirag matkar wrote:
>> Hello,
>> This is Chirag Matkar wanting to know whether there were any GSOC
>> 2009 projects underway in Open Bioinformatics Foundation.
>> Also as I am myself a perl developer can I get some stipend or
>> internship for building perl modules?
>> Thanking You,
>> Regards Chirag.

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From rmb32 at cornell.edu Tue Jun 9 21:13:36 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 18:13:36 -0700
Subject: [Bioperl-l] deprecated(), was Re: use of no_* to mean 'number_of', negative booleans
In-Reply-To:
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> <4A2EB436.8020506@cornell.edu>
Message-ID: <4A2F08C0.3010609@cornell.edu>

Chris Fields wrote:
>> Actually, that's one thing I want to implement within Root, namely the
>> ability to do this:
>>
>> $self->deprecated(-message => 'method Foo is deprecated',
>>                   -start_ver => $version1,
>>                   -throw_ver => $version2
>>                   );

Here's a patch with tests against the svn trunk head. Is this what you
had in mind?

--
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deprecated.patch
Type: text/x-diff
Size: 5601 bytes
Desc: not available
URL:

From cjfields at illinois.edu Tue Jun 9 22:54:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 21:54:47 -0500
Subject: [Bioperl-l] deprecated(), was Re: use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2F08C0.3010609@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu> <4A2EACEF.3090809@gmail.com> <4A2EB436.8020506@cornell.edu> <4A2F08C0.3010609@cornell.edu>
Message-ID: <20652B6B-1BF3-477C-9619-4149748E5B9B@illinois.edu>

On Jun 9, 2009, at 8:13 PM, Robert Buels wrote:

> Chris Fields wrote:
>>> Actually, that's one thing I want to implement within Root, namely
>>> the ability to do this:
>>>
>>> $self->deprecated(-message => 'method Foo is deprecated',
>>>                   -start_ver => $version1,
>>>                   -throw_ver => $version2
>>>                   );
>
> Here's a patch with tests against the svn trunk head. Is this what
> you had in mind?
>
> --
> Rob

Funny, I had written up almost exactly the same code, just a little
rearranged. I've modified mine to follow your use of -warn_version (I
also had -throw_version as a synonym of -version, JIC). Also, for the
tests I created a temp class in the tests and ran tests off that.

Thanks for the patch!

chris

From maj at fortinbras.us Wed Jun 10 00:10:12 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:10:12 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
Message-ID: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>

Hi All,

I've built a public Amazon machine image, loaded with many many
goodies, including the most recent (r15747) trunks of
- bioperl-live
- bioperl-run
- bioperl-db/biosql
The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
emboss, and more are all there (and most even pass bioperl-run tests),
and perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
(r1071) and others.
This is *not* a lean mean fighting machine.

Please give it a try if you're so inclined. Fuller details (including
image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.

Ping me if it doesn't work.

Cheers,
Mark

From cjfields at illinois.edu Wed Jun 10 00:36:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 23:36:40 -0500
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>

I'll be trying that out, particularly re: bioperl-run. For bioperl-db
do you have mysql or pg?

Heh, I see Moose is installed. Just need svn'd parrot and git updated
rakudo and we could do some damage...

chris

From maj at fortinbras.us Wed Jun 10 00:39:36 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:39:36 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife> <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <6A7D85B8037848F090C35A639C84D870@NewLife>

----- Original Message -----
From: "Chris Fields"
To: "Mark A. Jensen"
Cc: "BioPerl List"
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI

> I'll be trying that out, particularly re: bioperl-run. For bioperl-db
> do you have mysql or pg?
-both (I'm all about options...)
>
> Heh, I see Moose is installed. Just need svn'd parrot and git updated
> rakudo and we could do some damage...
>
bioperl-max-0.1.1, here we come...
> chris
>
cheers MAJ

From bernd.jagla at pasteur.fr Wed Jun 10 03:43:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 09:43:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina> <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <7F2215CBC16B48BE8C548BB69E131890@zillumina>

I wrote a small test program to test the environment variables and I
have them:

'SSH_CLIENT' => '157.
'FTP_PROXY' => 'http://
'HTTP_PROXY' => 'http://cache.past
'SSH_TTY' => '/dev/ttys002',
'ftp_proxy' => 'http://
'http_proxy' => 'http://

Using the "-proxy" works, without it doesn't. (and yes, I export the
variables..)

Thanks for any suggestions.

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
Sent: Tuesday, June 09, 2009 5:26 PM
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
>
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't want to change
> those sources. I tried setting some environment variable but that doesn't
> seem to work either...
> So far I have the set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
>
> Any suggestions are welcome.
>
> Thanks,
>
> Bernd
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Stefan Kirov
> Sent: Monday, June 08, 2009 11:26 PM
> To: bernd at pasteur.fr
> Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
>
> bernd at pasteur.fr wrote:
> Try to add this line
> -proxy => 'http:',
> in t/01das.t where the Bio::Das object is created (I think line 41).
> Hope this works for you, it did for me.
> Stefan
> > I tested the connection with wget and everything works fine.
> > I suspect that our proxy might be the problem but all variables are set
> > correctly (ftp_proxy, http_proxy and many more) I am not sure which
> > environment variable are being used...
> > I am not too familiar with all this and don't know where to look for the
> > right configurations.
> >
> > Thanks,
> >
> > Bernd
> >
> >
> >> Hi,
> >>
> >> The regression tests require an active Internet connection, as well as the
> >> DAS test server being up and running. It may be there was a temporary
> >> failure of one of those two. I just tested on my end and the regression
> >> tests ran ok, so could you try it again?
> >>
> >> Lincoln
> >>
> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla wrote:
> >>
> >>> Hi,
> >>>
> >>> I am working on a MAC 10.5.7; try to install Bio::Das using
> >>> perl -MCPAN -e 'install Bio::Das'
> >>> This is perl, v5.8.9 built for darwin-2level
> >>> (please let me know if you need anything else)
> >>>
> >>> I am trying to install Bio::Das 1.11
> >>>
> >>> I get the following error:
> >>>
> >>> not ok 3
> >>> not ok 4
> >>> Can't call method "description" on an undefined value at t/01das.t line 62.
> >>>
> >>> When going into the sources for 01das.t and printing out $db I get:
> >>>
> >>> $VAR1 = \bless( {
> >>>     'autotypes' => undef,
> >>>     'default_dsn' => undef,
> >>>     'autocategories' => undef,
> >>>     'sockets' => {},
> >>>     'aggregators' => [
> >>>         bless( {
> >>>             'sub_parts' => [
> >>>                 'coding_exon'
> >>>             ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'CDS',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' ),
> >>>         bless( {
> >>>             'sub_parts' => [
> >>>                 'EST_match'
> >>>             ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'alignment',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' )
> >>>     ],
> >>>     'timeout' => undef,
> >>>     'oldstyle_api' => 1,
> >>>     'default_server' => 'http://www.wormbase.org/db/seq/das'
> >>> }, 'Bio::Das' );
> >>>
> >>> @sources is empty
> >>>
> >>> And test(3,@sources) fails.
> >>>
> >>> Please advise.
> >>>
> >>> Thanks,
> >>>
> >>> Bernd
> >>>
> >>
> >> --
> >> Lincoln D. Stein
> >> Director, Informatics and Biocomputing Platform
> >> Ontario Institute for Cancer Research
> >> 101 College St., Suite 800
> >> Toronto, ON, Canada M5G0A3
> >> 416 673-8514
> >> Assistant: Renata Musa

From bernd.jagla at pasteur.fr Wed Jun 10 04:16:08 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 10:16:08 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina> <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID:

To whom it
may concern:

I added

    $self->proxy($ENV{'HTTP_PROXY'}) if $ENV{'HTTP_PROXY'};

around line 72 in Das.pm, before:

    $self->proxy($proxy) if $proxy;

This did the trick.

For completeness I also edited Fetch.pm, around line 134:

    $proxy = $ENV{'HTTP_PROXY'} if $ENV{'HTTP_PROXY'};

before:

    my $dest = $proxy || $request->url;

Best,

Bernd

From ron at ron.dk Wed Jun 10 03:35:09 2009
From: ron at ron.dk (Rasmus Ory Nielsen)
Date: Wed, 10 Jun 2009 09:35:09 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebase file.
Message-ID: <4A2F622D.5060500@ron.dk>

Hi,

This is my first time using bioperl for restriction analysis, so
please bear with me if this is a FAQ.

I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and
created the script shown at the bottom of the mail. My bioperl version
is bioperl-live nightly from 09-Jun-2009.

The script throws an exception - see below. But if I comment out the
'-enzymes' argument, so it uses the built-in collection of enzymes, it
works.

My problem is that I need to use some of the enzymes that are only
available in rebase. So how do I get this working?

Thanks for your attention.

Best regards,
Rasmus Ory Nielsen

############################################################
Output from the script:
############################################################
[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------

------------- EXCEPTION -------------
MSG: Bad end parameter (11). End must be less than the total length of sequence (total=7)
STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
STACK toplevel ./restriction_test.pl:30
-------------------------------------

[roni at ksdhcp ~]$

############################################################
Output from the script with the '-enzymes' argument commented out
############################################################
[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------
$VAR1 = [
          {
            'seq' => 'CTCGACCGTTAGCAA',
            'end' => 15,
            'start' => '1'
          },
          {
            'seq' => 'AGCTTTCTACCGTTATCGT',
            'end' => 34,
            'start' => '16'
          }
        ];
[roni at ksdhcp ~]$

############################################################
#!/usr/bin/perl

use strict;
use warnings;

use Bio::PrimarySeq;
use Bio::Restriction::IO;
use Bio::Restriction::Analysis;
use Data::Dumper;

# create seq obj
my $seqobj = new Bio::PrimarySeq(
    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
    -primary_id => 'test',
    -molecule   => 'dna'
);

# read rebase file
my $rebase_io = Bio::Restriction::IO->new(
    -file   => 'withrefm.906',
    -format => 'withrefm',
);
my $rebase_collection = $rebase_io->read;

# start restriction analysis
my $restriction_analysis = Bio::Restriction::Analysis->new(
    -seq     => $seqobj,
    -enzymes => $rebase_collection,    # it works with this line commented out
);

# retrieve fragment maps
my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
print Dumper \@fragment_maps;

From awitney at sgul.ac.uk Wed Jun 10 07:19:55 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 12:19:55 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
Message-ID: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>

Hi,

I am going through the EUtilities Cookbook, but the last example (in
section 2.3.1) fails with:

Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.

This is with BioPerl 1.6.0, perl v5.8.8

thanks for any help

adam

From hlapp at gmx.net Wed Jun 10 08:08:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 10 Jun 2009 08:08:54 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <4B3BCEA2-DA96-46B5-9BA2-F4EDDACC3A96@gmx.net>

Very cool!

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at illinois.edu Wed Jun 10 08:28:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 07:28:44 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
Message-ID: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>

I can reproduce that; I'll look into it.

chris

On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:

> Hi,
>
> I am going through the EUtilities Cookbook, but the last example (in
> section 2.3.1) fails with:
>
> Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>
> This is with BioPerl 1.6.0, perl v5.8.8
>
> thanks for any help
>
> adam

From cjfields at illinois.edu Wed Jun 10 09:20:43 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:20:43 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
Message-ID: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>

EntrezGene doesn't contain the sequence information; I believe it just
links to the sequence in a specified nuc record with given
coordinates. You can get to it, but it takes a little trickery; in
essence you need to use the UID to get the gene summary information,
extract that, then grab the sequence record using seqstart, seqend,
and seqstrand.

A dump of esummary info for UID 18131, for instance, (using
$eutil->print_all) gives this info (abbreviated somewhat):

UID :18131
Name :Notch3
Description :Notch gene homolog 3 (Drosophila)
Orgname :Mus musculus
...
GenomicInfo
  GenomicInfoType
    ChrLoc :17
    ChrAccVer :NC_000083.5
    ChrStart :32303796
    ChrStop :32257837
GeneWeight :23049

The genomic info section gives the accession.version, start, end, and
(implicitly) the strand (ChrStop is less than ChrStart).

I have added an example to the cookbook:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F

chris

On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:

> Hi,
>
> I have been experimenting with the Bio::DB::EUtilities module, with
> help from the Cookbook. But I can't seem to figure out how to get
> the DNA sequence of a gene; all the examples seem to be fetching
> protein sequence.
>
> How would i go about fetching a sequence using an Entrez GeneID?
>
> thanks for any help
>
> adam

From cjfields at illinois.edu Wed Jun 10 09:33:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:33:51 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk> <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu> <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
Message-ID: <10B8484F-AE84-4E0A-964F-0DC964F5156C@illinois.edu>

Adam,

Okay, fixed that and the previous issue with 'use an undefined value
as an ARRAY reference'. The previous issue appears to be due to a
change in the XML output from NCBI (it used to give the IDs at one
point). Also made the wiki changes for this; didn't take long to find
everything.

Thanks for pointing that out! If you find any more issues feel free to
make the necessary changes on the wiki or point them out if they're in
code.

chris

From awitney at sgul.ac.uk Wed Jun 10 09:12:05 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 14:12:05 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk> <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
Message-ID: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>

Hi Chris,

not sure if I should start a new thread for this, but it is related to
the EUtilities Cookbook and LinkSet.pm.
There are several references in the Cookbook to the method
"get_linkname", however this seems to have changed in the recent
version of LinkSet.pm to "get_link_name". But one reference to the old
method name still exists in LinkSet.pm, as shown by this patch:

--- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/LinkSet.pm 2009-02-20 12:36:37.000000000 +0000
+++ /Users/adam/Desktop/LinkSet.pm 2009-06-10 13:58:49.000000000 +0100
@@ -220,7 +220,7 @@
 =cut

 sub get_link_name {
-    return ($_[0]->get_linknames)[0];
+    return ($_[0]->get_link_names)[0];
 }

 =head2 get_submitted_ids

If i haven't got this all wrong entirely, I could go through and fix
the Cookbook entries if that was useful?

adam

On 10 Jun 2009, at 13:28, Chris Fields wrote:

> I can reproduce that; I'll look into it.
>
> chris
>
> On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:
>
>> Hi,
>>
>> I am going through the EUtilities Cookbook, but the last example
>> (in section 2.3.1) fails with:
>>
>> Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>>
>> This is with BioPerl 1.6.0, perl v5.8.8
>>
>> thanks for any help
>>
>> adam
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From awitney at sgul.ac.uk Wed Jun 10 10:10:21 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 15:10:21 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk> <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
Message-ID:

Thanks for the pointers Chris.

The new example on the Cookbook doesn't quite work for me as ChrStart
seems to appear in the DocSum twice, thus
get_contents_by_name('ChrStart') returns a list of two values (which
writes the second ChrStart into $end). Also the $start and $end seem to
be out by 1, so I needed to change it to this:

my ($acc)   = ($docsum->get_contents_by_name('ChrAccVer'));
my ($start) = ($docsum->get_contents_by_name('ChrStart'));
my ($end)   = ($docsum->get_contents_by_name('ChrStop'));

$start += 1;
$end   += 1;

Ah, looking at this further there appears to be something going on in
the response from Entrez. Compare these two gene records:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131
(your example below)
http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733
(my gene)

In both cases you can see that ChrStart appears twice, once as part of
the GenomicInfo list and once on its own at the bottom. In my example
above the two ChrStart values match, but in the Notch3 example you
posted the 2nd ChrStart seems to be the same as the ChrStop in the
GenomicInfo list. Do you know if the second ChrStart has a separate
meaning?

I guess in the Cookbook example we would need to make sure that the
get_contents_by_name('ChrStart') picks up the value from the
GenomicInfo list, is this possible?

thanks again

adam

On 10 Jun 2009, at 14:20, Chris Fields wrote:

> EntrezGene doesn't contain the sequence information; I believe it
> just links to the sequence in a specified nuc record with given
> coordinates. You can get to it, but it takes a little trickery; in
> essence you need to use the UID to get the gene summary information,
> extract that, then grab the sequence record using seqstart, seqend,
> and seqstrand.
>
> A dump of esummary info for UID 18131, for instance, (using
> $eutil->print_all) gives this info (abbreviated somewhat):
>
> UID :18131
> Name :Notch3
> Description :Notch gene homolog 3 (Drosophila)
> Orgname :Mus musculus
> ...
> GenomicInfo
>   GenomicInfoType
>     ChrLoc :17
>     ChrAccVer :NC_000083.5
>     ChrStart :32303796
>     ChrStop :32257837
> GeneWeight :23049
>
> The genomic info section gives the accession.version, start, end,
> and (implicitly) the strand (ChrStop is less that ChrStart). I have
> added an example to the cookbook:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
>
> chris
>
> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:
>
>> Hi,
>>
>> I have been experimenting with the Bio::DB::EUtilities module, with
>> help from the Cookbook. But I can't seem to figure out how to get
>> the DNA sequence of a gene; all the examples seem to be fetching
>> protein sequence.
>>
>> How would i go about fetching a sequence using an Entrez GeneID?
>>
>> thanks for any help
>>
>> adam
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Jun 10 13:56:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 12:56:46 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To:
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk> <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
Message-ID:

Adam,

That's really odd that they do that (both the duplication of ChrStart
and the coordinates being off-by-one, which means they appear to be
0-based). It's possible that the second ChrStart is meant to represent
the actual first base for the gene irrespective of start/end. My
example is on the opposite strand, so the second ChrStart == end.

The fact that they use the same element name is slightly annoying (and
seemingly redundant), but there is a workaround.
We grab only the layered information specifically; in this case we want
everything below 'GenomicInfoType':

GenomicInfo
  GenomicInfoType
    ChrLoc :17
    ChrAccVer :NC_000083.5
    ChrStart :32303796
    ChrStop :32257837

So, we can do this in the DocSum loop (that appears to work for your
example):

############################

for my $docsum ($eutil->next_DocSum) {
    # to ensure we grab the right ChrStart information, we grab the Item above
    # it in the Item hierarchy (visible via print_all from the eutil instance)
    my ($item) = $docsum->get_Items_by_name('GenomicInfoType');

    my %item_data = map {$_ => 0} qw(ChrAccVer ChrStart ChrStop);

    while (my $sub_item = $item->next_subItem) {
        if (exists $item_data{$sub_item->get_name}) {
            $item_data{$sub_item->get_name} = $sub_item->get_content;
        }
    }

    # check to make sure everything is set
    for my $check (qw(ChrAccVer ChrStart ChrStop)) {
        die "$check not set" unless $item_data{$check};
    }

    my $strand = $item_data{ChrStart} > $item_data{ChrStop} ? 2 : 1;
    $fetcher->set_parameters(-id        => $item_data{ChrAccVer},
                             -seq_start => $item_data{ChrStart} + 1,
                             -seq_stop  => $item_data{ChrStop} + 1,
                             -strand    => $strand);
    print $fetcher->get_Response->content;
}

############################

That's to retain compatibility with 1.6; I'll update the wiki. I can
add some common Item container methods to grab information for any
Items contained in the current instance (be it a DocSum or another
Item). I'll add that in bioperl-live.

chris

On Jun 10, 2009, at 9:10 AM, Adam Witney wrote:

> Thanks for the pointers Chris.
>
> The new example on the Cookbook doesn't quite work for me as
> ChrStart seems to appear in the DocSum twice, thus
> get_contents_by_name('ChrStart') returns a list of two values (which
> writes the second ChrStart into $end).
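
The coordinate arithmetic in Chris's loop above (ESummary values that appear to be 0-based, and strand inferred from ChrStart being greater than ChrStop) can be isolated in a small pure-Perl sketch. The helper name esummary_to_efetch_coords is hypothetical and not part of BioPerl; it only mirrors the "+ 1" and strand logic shown in the thread, and it assumes the 0-based reading of the coordinates is right.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): turn the ChrStart/ChrStop
# values from an Entrez Gene ESummary, assumed 0-based as discussed in
# the thread, into the 1-based values used for -seq_start/-seq_stop,
# plus the strand code (1 = plus, 2 = minus).
sub esummary_to_efetch_coords {
    my ($chr_start, $chr_stop) = @_;

    # Use defined/length rather than truthiness, so a legitimate
    # 0-based coordinate of 0 is not mistaken for "not set".
    die "ChrStart not set" unless defined $chr_start && length $chr_start;
    die "ChrStop not set"  unless defined $chr_stop  && length $chr_stop;

    # ChrStart greater than ChrStop is taken to mean the minus strand.
    my $strand = $chr_start > $chr_stop ? 2 : 1;

    return (
        seq_start => $chr_start + 1,
        seq_stop  => $chr_stop + 1,
        strand    => $strand,
    );
}

# The Notch3 values quoted in the thread.
my %notch3 = esummary_to_efetch_coords(32303796, 32257837);
printf "%d..%d strand %d\n", @notch3{qw(seq_start seq_stop strand)};
# prints "32303797..32257838 strand 2"
```

One difference from the loop above is deliberate: the "unless $item_data{$check}" test would also reject a coordinate of 0, which is a valid position if the values really are 0-based, so this sketch tests for definedness instead.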
Also the $start and $end seem > to be out by 1, so I needed to change it to this: > > my ($acc) = ($docsum->get_contents_by_name('ChrAccVer')); > my ($start) = ($docsum->get_contents_by_name('ChrStart')); > my ($end) = ($docsum->get_contents_by_name('ChrStop')); > > $start += 1; > $end += 1; > > Ah, looking at this further there appears to be something going on > in the response from Entrez. Compare these two gene records: > > http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131 > (your example below) > http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733 > (my gene) > > In both cases you can see that ChrStart appears twice, once as part > of the GenomicInfo list and once on its own at the bottom. In my > example above the two ChrStart values match, but in the Notch3 > example you posted the 2nd ChrStart seems to be the same as the > ChrStop in the GenomicInfo list. Do you know if the second ChrStart > has a separate meaning? > > I guess in the Cookbook example we would need to make sure that the > get_contents_by_name('ChrStart') picks up the value from the > GenomicInfo list, is this possible? > > thanks again > > adam > > > On 10 Jun 2009, at 14:20, Chris Fields wrote: > >> EntrezGene doesn't contain the sequence information; I believe it >> just links to the sequence in a specified nuc record with given >> coordinates. You can get to it, but it takes a little trickery; in >> essence you need to use the UID to get the gene summary >> information, extract that, then grab the sequence record using >> seqstart, seqend, and seqstrand. >> >> A dump of esummary info for UID 18131, for instance, (using $eutil- >> >print_all) gives this info (abbreviated somewhat): >> >> UID :18131 >> Name :Notch3 >> Description :Notch gene homolog 3 (Drosophila) >> Orgname :Mus musculus >> ... 
>> GenomicInfo >> GenomicInfoType >> ChrLoc :17 >> ChrAccVer :NC_000083.5 >> ChrStart :32303796 >> ChrStop :32257837 >> GeneWeight :23049 >> >> The genomic info section gives the accession.version, start, end, >> and (implicitly) the strand (ChrStop is less that ChrStart). I have >> added an example to the cookbook: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F >> >> chris >> >> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote: >> >>> Hi, >>> >>> I have been experimenting with the Bio::DB::EUtilities module, >>> with help from the Cookbook. But I can't seem to figure out how to >>> get the DNA sequence of a gene; all the examples seem to be >>> fetching protein sequence. >>> >>> How would i go about fetching a sequence using an Entrez GeneID? >>> >>> thanks for any help >>> >>> adam >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Jun 11 07:36:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 11 Jun 2009 07:36:40 -0400 Subject: [Bioperl-l] 1.6 on doc.bioperl.org? Message-ID: <17AD00895AFD43E1A1436D1065092BAC@NewLife> Hi Chris and list- Will documentation for release 1.6 be available in pdoc on doc.bioperl.org? I notice also that autogenerated documentation for bioperl-live doesn't contain new modules (or HIVQuery & Tiling, anyway ;) )-- cheers, Mark From maj at fortinbras.us Thu Jun 11 09:17:25 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 11 Jun 2009 09:17:25 -0400 Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile. In-Reply-To: <4A2F622D.5060500@ron.dk> References: <4A2F622D.5060500@ron.dk> Message-ID: <2F52B1CED1374763822BF3AD1D283B3B@NewLife> Rasmus et al- This looks like a bug. A quick debug shows it's barfing on 'AarI' (as it cycles through all enzymes apparently creating a global cut map). AarI has a recognition sequence of CACCTGC (in $enz->seq->seq) but a cut site of CACCTGCNNNN^ (in $enz->seq->site) The bad parm '11' refers to the end of the cut site sequence, but the routine B:R:Analysis::_cuts is attempting to split the 7-symbol recognition sequence, and so throws. This surprises me. Core, let me know if you want me to take this on, or if the module author can fix it quicker. cheers, Mark ----- Original Message ----- From: "Rasmus Ory Nielsen" To: Sent: Wednesday, June 10, 2009 3:35 AM Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile. > Hi, > > This is my first time using bioperl for restriction analysis, so please bear > with me, if this is a FAQ. > > I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the > script shown at the bottom of the mail. > My bioperl version is bioperl-live nightly from 09-Jun-2009. > > The scripts throws an exception - see below. But, if I comment out the > '-enzymes' argument, so it uses the built-in collection of enzymes, it works. > > My problem is, that I need to use some of the enzymes that are only available > in rebase. So how do I get this working? > > Thanks for your attention. 
> > Best regards, > Rasmus Ory Nielsen > > > ############################################################ > Output from the script: > ############################################################ > > [roni at ksdhcp ~]$ ./restriction_test.pl > > --------------------- WARNING --------------------- > MSG: The enzyme name CviKI-1 was changed to CviKI-I > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: Bad end parameter (11). End must be less than the total length of > sequence (total=7) > STACK Bio::PrimarySeq::subseq > /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401 > STACK Bio::Restriction::Analysis::_enzyme_sites > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 > STACK Bio::Restriction::Analysis::_cuts > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 > STACK Bio::Restriction::Analysis::cut > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 > STACK Bio::Restriction::Analysis::fragment_maps > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 > STACK toplevel ./restriction_test.pl:30 > ------------------------------------- > > [roni at ksdhcp ~]$ > > > ############################################################ > Output from the script with the '-enzymes' argument commented out > ############################################################ > > [roni at ksdhcp ~]$ ./restriction_test.pl > > --------------------- WARNING --------------------- > MSG: The enzyme name CviKI-1 was changed to CviKI-I > --------------------------------------------------- > $VAR1 = [ > { > 'seq' => 'CTCGACCGTTAGCAA', > 'end' => 15, > 'start' => '1' > }, > { > 'seq' => 'AGCTTTCTACCGTTATCGT', > 'end' => 34, > 'start' => '16' > } > ]; > [roni at ksdhcp ~]$ > > ############################################################ > > #!/usr/bin/perl > use strict; > use warnings; > use Bio::PrimarySeq; > use Bio::Restriction::IO; > use 
Bio::Restriction::Analysis; > use Data::Dumper; > > # create seq obj > my $seqobj = new Bio::PrimarySeq( > -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', > -primary_id => 'test', > -molecule => 'dna' > ); > > # read rebase file > my $rebase_io = Bio::Restriction::IO->new( > -file => 'withrefm.906', > -format => 'withrefm', > ); > my $rebase_collection = $rebase_io->read; > > # start restriction analysis > my $restriction_analysis = Bio::Restriction::Analysis->new( > -seq => $seqobj, > -enzymes => $rebase_collection, # it works with this line commented out > ); > > # retrieve fragment maps > my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); > print Dumper \@fragment_maps; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Thu Jun 11 10:19:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Jun 2009 09:19:51 -0500 Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile. In-Reply-To: <2F52B1CED1374763822BF3AD1D283B3B@NewLife> References: <4A2F622D.5060500@ron.dk> <2F52B1CED1374763822BF3AD1D283B3B@NewLife> Message-ID: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> Mark, Feel free to take it up. It's probably a good idea to start a bug report for tracking if it proves to be thornier to fix than expected. chris On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote: > Rasmus et al- > > This looks like a bug. A quick debug shows it's barfing on > 'AarI' (as it cycles through > all enzymes apparently creating a global cut map). AarI has a > recognition sequence of > > CACCTGC (in $enz->seq->seq) > > but a cut site of > > CACCTGCNNNN^ (in $enz->seq->site) > > The bad parm '11' refers to the end of the cut site sequence, but > the routine > B:R:Analysis::_cuts is attempting to split the 7-symbol recognition > sequence, > and so throws. > > This surprises me. 
Core, let me know if you want me to take this on, > or > if the module author can fix it quicker. > > cheers, > Mark > > ----- Original Message ----- From: "Rasmus Ory Nielsen" > To: > Sent: Wednesday, June 10, 2009 3:35 AM > Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when > using rebasefile. > > >> Hi, >> >> This is my first time using bioperl for restriction analysis, so >> please bear with me, if this is a FAQ. >> >> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and >> created the script shown at the bottom of the mail. >> My bioperl version is bioperl-live nightly from 09-Jun-2009. >> >> The scripts throws an exception - see below. But, if I comment out >> the '-enzymes' argument, so it uses the built-in collection of >> enzymes, it works. >> >> My problem is, that I need to use some of the enzymes that are only >> available in rebase. So how do I get this working? >> >> Thanks for your attention. >> >> Best regards, >> Rasmus Ory Nielsen >> >> >> ############################################################ >> Output from the script: >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: Bad end parameter (11). 
End must be less than the total length >> of sequence (total=7) >> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ >> Bio/PrimarySeq.pm:401 >> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ >> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 >> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ >> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 >> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ >> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 >> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ >> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 >> STACK toplevel ./restriction_test.pl:30 >> ------------------------------------- >> >> [roni at ksdhcp ~]$ >> >> >> ############################################################ >> Output from the script with the '-enzymes' argument commented out >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> $VAR1 = [ >> { >> 'seq' => 'CTCGACCGTTAGCAA', >> 'end' => 15, >> 'start' => '1' >> }, >> { >> 'seq' => 'AGCTTTCTACCGTTATCGT', >> 'end' => 34, >> 'start' => '16' >> } >> ]; >> [roni at ksdhcp ~]$ >> >> ############################################################ >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use Bio::PrimarySeq; >> use Bio::Restriction::IO; >> use Bio::Restriction::Analysis; >> use Data::Dumper; >> >> # create seq obj >> my $seqobj = new Bio::PrimarySeq( >> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', >> -primary_id => 'test', >> -molecule => 'dna' >> ); >> >> # read rebase file >> my $rebase_io = Bio::Restriction::IO->new( >> -file => 'withrefm.906', >> -format => 'withrefm', >> ); >> my $rebase_collection = $rebase_io->read; >> >> # start restriction analysis >> my 
$restriction_analysis = Bio::Restriction::Analysis->new( >> -seq => $seqobj, >> -enzymes => $rebase_collection, # it works with this line >> commented out >> ); >> >> # retrieve fragment maps >> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); >> print Dumper \@fragment_maps; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Jun 11 10:26:19 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 11 Jun 2009 10:26:19 -0400 Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile. In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> References: <4A2F622D.5060500@ron.dk> <2F52B1CED1374763822BF3AD1D283B3B@NewLife> <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> Message-ID: All-righty-- thanks MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Rasmus Ory Nielsen" ; Sent: Thursday, June 11, 2009 10:19 AM Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile. > Mark, > > Feel free to take it up. It's probably a good idea to start a bug report for > tracking if it proves to be thornier to fix than expected. > > chris > > On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote: > >> Rasmus et al- >> >> This looks like a bug. A quick debug shows it's barfing on 'AarI' (as it >> cycles through >> all enzymes apparently creating a global cut map). 
AarI has a recognition >> sequence of >> >> CACCTGC (in $enz->seq->seq) >> >> but a cut site of >> >> CACCTGCNNNN^ (in $enz->seq->site) >> >> The bad parm '11' refers to the end of the cut site sequence, but the >> routine >> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition >> sequence, >> and so throws. >> >> This surprises me. Core, let me know if you want me to take this on, or >> if the module author can fix it quicker. >> >> cheers, >> Mark >> >> ----- Original Message ----- From: "Rasmus Ory Nielsen" >> To: >> Sent: Wednesday, June 10, 2009 3:35 AM >> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using >> rebasefile. >> >> >>> Hi, >>> >>> This is my first time using bioperl for restriction analysis, so please >>> bear with me, if this is a FAQ. >>> >>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created >>> the script shown at the bottom of the mail. >>> My bioperl version is bioperl-live nightly from 09-Jun-2009. >>> >>> The scripts throws an exception - see below. But, if I comment out the >>> '-enzymes' argument, so it uses the built-in collection of enzymes, it >>> works. >>> >>> My problem is, that I need to use some of the enzymes that are only >>> available in rebase. So how do I get this working? >>> >>> Thanks for your attention. >>> >>> Best regards, >>> Rasmus Ory Nielsen >>> >>> >>> ############################################################ >>> Output from the script: >>> ############################################################ >>> >>> [roni at ksdhcp ~]$ ./restriction_test.pl >>> >>> --------------------- WARNING --------------------- >>> MSG: The enzyme name CviKI-1 was changed to CviKI-I >>> --------------------------------------------------- >>> >>> ------------- EXCEPTION ------------- >>> MSG: Bad end parameter (11). 
End must be less than the total length of >>> sequence (total=7) >>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ >>> Bio/PrimarySeq.pm:401 >>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ >>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 >>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ >>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 >>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ >>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 >>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ >>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 >>> STACK toplevel ./restriction_test.pl:30 >>> ------------------------------------- >>> >>> [roni at ksdhcp ~]$ >>> >>> >>> ############################################################ >>> Output from the script with the '-enzymes' argument commented out >>> ############################################################ >>> >>> [roni at ksdhcp ~]$ ./restriction_test.pl >>> >>> --------------------- WARNING --------------------- >>> MSG: The enzyme name CviKI-1 was changed to CviKI-I >>> --------------------------------------------------- >>> $VAR1 = [ >>> { >>> 'seq' => 'CTCGACCGTTAGCAA', >>> 'end' => 15, >>> 'start' => '1' >>> }, >>> { >>> 'seq' => 'AGCTTTCTACCGTTATCGT', >>> 'end' => 34, >>> 'start' => '16' >>> } >>> ]; >>> [roni at ksdhcp ~]$ >>> >>> ############################################################ >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> use Bio::PrimarySeq; >>> use Bio::Restriction::IO; >>> use Bio::Restriction::Analysis; >>> use Data::Dumper; >>> >>> # create seq obj >>> my $seqobj = new Bio::PrimarySeq( >>> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', >>> -primary_id => 'test', >>> -molecule => 'dna' >>> ); >>> >>> # read rebase file >>> my $rebase_io = Bio::Restriction::IO->new( >>> -file => 'withrefm.906', >>> -format => 'withrefm', >>> ); >>> my $rebase_collection = 
$rebase_io->read; >>> >>> # start restriction analysis >>> my $restriction_analysis = Bio::Restriction::Analysis->new( >>> -seq => $seqobj, >>> -enzymes => $rebase_collection, # it works with this line commented >>> out >>> ); >>> >>> # retrieve fragment maps >>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); >>> print Dumper \@fragment_maps; >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From mauricio at open-bio.org Thu Jun 11 12:46:35 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Thu, 11 Jun 2009 11:46:35 -0500 Subject: [Bioperl-l] 1.6 on doc.bioperl.org? In-Reply-To: <17AD00895AFD43E1A1436D1065092BAC@NewLife> References: <17AD00895AFD43E1A1436D1065092BAC@NewLife> Message-ID: <4A3134EB.4080702@open-bio.org> Hi Mark, I'll take a look into this sometime between today and tomorrow. Will keep you posted. Thanks for the heads up :) Mauricio. Mark A. Jensen wrote: > Hi Chris and list- > Will documentation for release 1.6 be available in pdoc on doc.bioperl.org? > I notice also that autogenerated documentation for bioperl-live doesn't contain > new modules (or HIVQuery & Tiling, anyway ;) )-- > cheers, Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Thu Jun 11 14:41:26 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 11 Jun 2009 14:41:26 -0400 Subject: [Bioperl-l] 1.6 on doc.bioperl.org? In-Reply-To: <4A3134EB.4080702@open-bio.org> References: <17AD00895AFD43E1A1436D1065092BAC@NewLife> <4A3134EB.4080702@open-bio.org> Message-ID: cheers Mauricio! 
MAJ ----- Original Message ----- From: "Mauricio Herrera Cuadra" To: "Mark A. Jensen" Cc: "Chris Fields" ; "BioPerl List" Sent: Thursday, June 11, 2009 12:46 PM Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org? > Hi Mark, > > I'll take a look into this sometime between today and tomorrow. Will keep you > posted. Thanks for the heads up :) > > Mauricio. > > > Mark A. Jensen wrote: >> Hi Chris and list- >> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org? >> I notice also that autogenerated documentation for bioperl-live doesn't >> contain >> new modules (or HIVQuery & Tiling, anyway ;) )-- >> cheers, Mark >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From Xianjun.Dong at bccs.uib.no Fri Jun 12 16:38:50 2009 From: Xianjun.Dong at bccs.uib.no (Xianjun Dong) Date: Fri, 12 Jun 2009 22:38:50 +0200 Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph Message-ID: <4A32BCDA.4080605@ii.uib.no> HI, I am not sure this is the right place I can get help. I've suffered by a problem for several days: I want to highlight parts of regions in my track, using a different background color. To do that, I defined a glyph named "background", based on the 'Bio::Graphics::Glyph::generic' module. I override the draw_component() method, by adding code like below: $gd->filledRectangle($left,0,$right,$gd->height, $self->factory->translate_color($color)); # the script is pasted at the end This will draw a rectangle with top=0, bottom=$gd->height. I made the highlight regions into a list of features, and add_track with -glyph=>'background'. (see the following script, test.pl) This really works as I expect, which will add a colored block at background of all tracks in a panel (including the ruler arrow). 
You can see the output image in the attached file "test.bioperl1.2.3.png".

Now the problem: when I switch to BioPerl 1.5 (or 1.6), it does not work as before. The highlight is still drawn, but it shrinks to a low height instead of covering all tracks in the panel. I have also attached that output; see the file "test.bioperl1.6.png".

I tried to work out the reason. The 'background' module is based on the generic module, so what can cause the difference? Is it because $gd->height is different, or because the tracks that follow the 'background' track cannot be drawn from the first position?

I could stick with BioPerl 1.2.3 to avoid the problem. ("A smart person solves problems, a wise person avoids them"...) But then another problem arises: Bio::Graphics in BioPerl 1.2.3 does not support the $panel->create_web_map() function, which means I have to use a later version if I want to create a web map for my graphics, but then I have to give up the highlighted background.

OK, this is long enough for my first submission here. I hope someone can give me a clue.

Thanks in advance!!

Xianjun

==================== test.pl =======================
#!/usr/bin/perl

use strict;
use lib "$ENV{HOME}/lib";
use Bio::Graphics;
use Bio::Graphics::Feature;

my $ftr= 'Bio::Graphics::Feature';

# processed_transcript
my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# hightlight
my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', -source=>'a');
my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', -source=>'b');

my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                     -length=>1050,
                                     -start =>0,
                                     -pad_left=>12,
                                     -pad_right=>12);

# the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
$panel->add_track([$trans41,$trans31],
                  -glyph => 'background',
                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
                  -glyph => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', #EnsEMBL
                  );

print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but not work in Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();

1;

==================== background.pm =======================
package Bio::Graphics::Glyph::background;

use strict;
use base 'Bio::Graphics::Glyph::generic';

sub pad_top{
    return 0;
}

sub draw_component {
    my $self = shift;
    #$self->SUPER::draw_component(@_);
    my ($gd,$dx,$dy) = @_;
    my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);

    # draw an arrow to indicate the direction of transcript
    my $color = $self->option('block_bgcolor') || '#cccccc';
    $gd->filledRectangle($left,0,$right,$gd->height,
                         $self->factory->translate_color($color));
}

1;

--
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL:

From scott at scottcain.net Fri Jun 12 21:29:09 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 12 Jun 2009 21:29:09 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4A32BCDA.4080605@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
Message-ID: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>

Hello Xianjun,

I don't think that approach will work. What you almost certainly need
to do is a postgrid callback that does the drawing of the highlighted
region. For example code of how to do this, take a look at the
make_postgrid_callback subroutine in GBrowse 1.69. The option -postgrid
is a method of Bio::Graphics::Panel.

Scott

On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong wrote:
> HI,
>
> I am not sure this is the right place I can get help.
>
> I've suffered by a problem for several days: I want to highlight parts of
> regions in my track, using a different background color. To do that, I
-- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From scott at scottcain.net Sat Jun 13 09:27:39 2009
From: scott at scottcain.net (Scott Cain)
Date: Sat, 13 Jun 2009 09:27:39 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4A339621.2060702@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no>
Message-ID: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>

Hi Xianjun,

I understand what you want to do, as the current version of gbrowse, which uses bioperl 1.6, does this. Without digging through the code, I can't tell you exactly how this works, and you didn't send your code that uses this callback, so I can't try it either.

One thing that is different between your code and gbrowse is that in gbrowse each of the tracks is actually a separate panel (to allow track dragging), so it is possible that this sort of callback doesn't work for Bio::Graphics any more.

Scott

On Saturday, June 13, 2009, Xianjun Dong wrote:
> Hi, Scott
>
> Thanks for your reply first.
>
> I still have a question. I dug out the code from GBrowse (pasted below). The
> method make_postgrid_callback collects all highlight regions and then uses
> the hilite_regions_closure function to draw them, with the following GD call:
>
> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                      $panel->translate_color($h_color));
>
> where $bottom=$panel->bottom. This is the only difference from my code,
> where I use $gd->height. I guess they are almost the same (except for
> pad_bottom); we can see this in the code at
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>
> OK. Anyway, I changed my highlight regions to use $panel->bottom instead of
> $gd->height. The output is the same when using the library of Bioperl 1.6
> (or 1.5). You can see the attached image ("test.bioperl1.6.png")
>
> OK. I might not have explained my question clearly. My question is: if
> using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I get
> the image I want (see the attached file "test.bioperl1.2.3.png"), where the
> highlight range goes from the roof to the floor. In bioperl 1.5 (or 1.6),
> I can only see the highlight region in its own track, not across the whole
> panel. OK, did I explain clearly now? You can see the difference between
> the two images.
>
> [I am not sure the mailing list allows image attachments, so I also put
> them at the following links:
> test.bioperl1.6.png:   http://translog.genereg.net/test.bioperl1.6.png
> test.bioperl1.2.3.png: http://translog.genereg.net/test.bioperl1.2.3.png ]
>
> Could you test it and see the difference, if you have both 1.2.3 and 1.6 on
> your computer?
>
> I really want to know how this works in bioperl 1.2.3 (even though this
> might be a bug in that version, or whatever)
>
> Thanks
>
> Xianjun
> =============================================
>
> # this generates the callback for highlighting a region
> sub make_postgrid_callback {
>   my $settings = shift;
>   return unless ref $settings->{h_region};
>
>   my @h_regions = map {
>     my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>     defined($h_ref) && $h_ref eq $settings->{ref}
>       ? [$h_start,$h_end,$h_color||'lightgrey']
>       : ()
>   }
>     @{$settings->{h_region}};
>
>   return unless @h_regions;
>   return hilite_regions_closure(@h_regions);
> }
>
> # this subroutine generates a Bio::Graphics::Panel callback closure
> # suitable for hilighting a region of a panel.
> # The args are a list of [start,end,color]
> sub hilite_regions_closure {
>   my @h_regions = @_;
>
>   return sub {
>     my $gd     = shift;
>     my $panel  = shift;
>     my $left   = $panel->pad_left;
>     my $top    = $panel->top;
>     my $bottom = $panel->bottom;
>     for my $r (@h_regions) {
>       my ($h_start,$h_end,$h_color) = @$r;
>       my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>       if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>       # assuming top is 0 so as to ignore top padding
>       $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                            $panel->translate_color($h_color));
>     }
>   };
> }

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From Xianjun.Dong at bccs.uib.no Sat Jun 13 12:48:16 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 18:48:16 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no> <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
Message-ID: <4A33D850.1020203@ii.uib.no>

Hi, Scott

Before I give up my own solution entirely and switch to GBrowse, I would like to bother you once more. As you suggested, I added the -postgrid option when creating the panel, so that it calls a function to draw the background. The code below is almost copied from the online POD of Bio::Graphics::Panel (see http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html ), but it still does not work. Could you have a look? I paste it below.

(BTW, on the above POD page the option is given as -postgrid=>\&draw_gap, while the gap-drawing function is named gap_it, not draw_gap. I guess it's a typo, or not?)
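One detail of the test script pasted further down may be worth ruling out first: in its Bio::Graphics::Panel->new call there is no comma between -pad_right=>12 and -postgrid=>\&gap_it. The sketch below is plain Perl with no BioPerl dependency. collect_options is a hypothetical stand-in for a constructor that copies its named arguments into a hash (an assumption about how Panel->new reads them, not its actual code), and the sketch only shows how Perl parses the argument list with and without the comma.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a constructor that copies named arguments
# into a hash. This is NOT Bio::Graphics code.
sub collect_options {
    no warnings 'misc';    # silence "Odd number of elements in hash assignment"
    my %options = @_;
    return \%options;
}

# Dummy callback, named after the one in the posted script.
sub gap_it { return 'called' }

# As posted: no comma after 12, so Perl parses 12 - 'postgrid' (which is 12)
# and the code reference is left over as a lone extra argument.
my $as_posted = do {
    no warnings 'numeric';    # silence "Argument "postgrid" isn't numeric"
    collect_options(-pad_left => 12, -pad_right => 12 -postgrid => \&gap_it);
};

# With the comma restored, -postgrid arrives as a proper key/value pair.
my $with_comma =
    collect_options(-pad_left => 12, -pad_right => 12, -postgrid => \&gap_it);

printf "as posted:  -postgrid %s\n",
    exists $as_posted->{-postgrid} ? 'set' : 'missing';
printf "with comma: -postgrid %s\n",
    exists $with_comma->{-postgrid} ? 'set' : 'missing';
```

Under these assumptions the first call never receives a -postgrid key, so a callback passed that way would not be registered. Whether this alone explains the missing highlight with BioPerl 1.6 is not confirmed anywhere in this thread.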
my $panel = Bio::Graphics::Panel->new(-segment=>$segment,
                                      -grid=>1,
                                      -width=>600,
                                      -postgrid=> \&draw_gap);
sub gap_it {
  my $gd = shift;
  my $panel = shift;
  my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
  my $top = $panel->top;
  my $bottom = $panel->bottom;
  my $gray = $panel->translate_color('gray');
  $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}

Thanks

Xianjun

-----------------------------------------------

#!/usr/bin/perl

use strict;
use lib "$ENV{HOME}/lib";

use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr = 'Bio::Graphics::Feature';

# processed_transcript
my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# highlight
my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', -source=>'a');
my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', -source=>'b');

my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                     -length=>1050,
                                     -start =>0,
                                     -pad_left=>12,
                                     -pad_right=>12
                                     -postgrid=>\&gap_it);

sub gap_it {
  my $gd = shift;
  my $panel = shift;
  my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
  my $top = $panel->top;
  my $bottom = $gd->height, #panel->bottom;
  my $gray = $panel->translate_color('red');
  $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}

# the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
#$panel->add_track([$trans41,$trans31],
#                  -glyph => 'background',
#                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
#                  );
$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
                  -glyph => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', #EnsEMBL
                  );
print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but not work in Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();

--
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

From maj at fortinbras.us Sun
Jun 14 00:35:18 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 14 Jun 2009 00:35:18 -0400 Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when usingrebasefile. In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife> <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> Message-ID: All- I'm finding this is requiring a pretty substantial refactor and rationalization. I have opened a branch at REPOS/bioperl-live/branches/restriction-refactor and am making commits at will there (won't Rob be pleased!). When it appears to be passing tests, I'll let Chris know (on list), and he can decide on its mergability, and brave users could try it out by downloading Bio/Restriction (deeply) via subversion. My running commentary is at Bug #2855. MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: ; "Rasmus Ory Nielsen" Sent: Thursday, June 11, 2009 10:19 AM Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when usingrebasefile. > Mark, > > Feel free to take it up. It's probably a good idea to start a bug report for > tracking if it proves to be thornier to fix than expected. > > chris > > On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote: > >> Rasmus et al- >> >> This looks like a bug. A quick debug shows it's barfing on 'AarI' (as it >> cycles through >> all enzymes apparently creating a global cut map). AarI has a recognition >> sequence of >> >> CACCTGC (in $enz->seq->seq) >> >> but a cut site of >> >> CACCTGCNNNN^ (in $enz->seq->site) >> >> The bad parm '11' refers to the end of the cut site sequence, but the >> routine >> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition >> sequence, >> and so throws. >> >> This surprises me. Core, let me know if you want me to take this on, or >> if the module author can fix it quicker. 
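The length mismatch described in the report above can be reproduced with plain string arithmetic. This is only an illustrative sketch using the two AarI strings quoted there; cut_offset and cut_within_recognition are hypothetical helpers, not part of Bio::Restriction, and the sketch does not show how the module itself computes cut positions.

```perl
use strict;
use warnings;

# Values quoted in the report for AarI.
my $recognition = 'CACCTGC';         # reported as $enz->seq->seq
my $site        = 'CACCTGCNNNN^';    # reported as $enz->seq->site; '^' marks the cut

# Hypothetical helper: offset of the cut marker within the site string.
sub cut_offset {
    my ($site_string) = @_;
    my $offset = index($site_string, '^');
    die "no cut marker in '$site_string'\n" if $offset < 0;
    return $offset;
}

# Hypothetical check: does the cut fall inside the recognition sequence?
sub cut_within_recognition {
    my ($recognition_seq, $site_string) = @_;
    return cut_offset($site_string) <= length($recognition_seq) ? 1 : 0;
}

my $offset = cut_offset($site);
my $fits   = cut_within_recognition($recognition, $site);

printf "cut offset %d, recognition length %d, fits: %s\n",
    $offset, length($recognition), ($fits ? 'yes' : 'no');
```

For AarI the cut offset (11) lies beyond the 7-symbol recognition sequence, which matches the "Bad end parameter (11)" and "total=7" figures in the exception. An enzyme that cuts inside its recognition sequence, such as one written G^AATTC, passes the same check.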
>> >> cheers, >> Mark >> >> ----- Original Message ----- From: "Rasmus Ory Nielsen" >> To: >> Sent: Wednesday, June 10, 2009 3:35 AM >> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using >> rebasefile. >> >> >>> Hi, >>> >>> This is my first time using bioperl for restriction analysis, so please >>> bear with me, if this is a FAQ. >>> >>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created >>> the script shown at the bottom of the mail. >>> My bioperl version is bioperl-live nightly from 09-Jun-2009. >>> >>> The scripts throws an exception - see below. But, if I comment out the >>> '-enzymes' argument, so it uses the built-in collection of enzymes, it >>> works. >>> >>> My problem is, that I need to use some of the enzymes that are only >>> available in rebase. So how do I get this working? >>> >>> Thanks for your attention. >>> >>> Best regards, >>> Rasmus Ory Nielsen >>> >>> >>> ############################################################ >>> Output from the script: >>> ############################################################ >>> >>> [roni at ksdhcp ~]$ ./restriction_test.pl >>> >>> --------------------- WARNING --------------------- >>> MSG: The enzyme name CviKI-1 was changed to CviKI-I >>> --------------------------------------------------- >>> >>> ------------- EXCEPTION ------------- >>> MSG: Bad end parameter (11). 
End must be less than the total length of >>> sequence (total=7) >>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ >>> Bio/PrimarySeq.pm:401 >>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ >>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 >>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ >>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 >>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ >>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 >>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ >>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 >>> STACK toplevel ./restriction_test.pl:30 >>> ------------------------------------- >>> >>> [roni at ksdhcp ~]$ >>> >>> >>> ############################################################ >>> Output from the script with the '-enzymes' argument commented out >>> ############################################################ >>> >>> [roni at ksdhcp ~]$ ./restriction_test.pl >>> >>> --------------------- WARNING --------------------- >>> MSG: The enzyme name CviKI-1 was changed to CviKI-I >>> --------------------------------------------------- >>> $VAR1 = [ >>> { >>> 'seq' => 'CTCGACCGTTAGCAA', >>> 'end' => 15, >>> 'start' => '1' >>> }, >>> { >>> 'seq' => 'AGCTTTCTACCGTTATCGT', >>> 'end' => 34, >>> 'start' => '16' >>> } >>> ]; >>> [roni at ksdhcp ~]$ >>> >>> ############################################################ >>> >>> #!/usr/bin/perl >>> use strict; >>> use warnings; >>> use Bio::PrimarySeq; >>> use Bio::Restriction::IO; >>> use Bio::Restriction::Analysis; >>> use Data::Dumper; >>> >>> # create seq obj >>> my $seqobj = new Bio::PrimarySeq( >>> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', >>> -primary_id => 'test', >>> -molecule => 'dna' >>> ); >>> >>> # read rebase file >>> my $rebase_io = Bio::Restriction::IO->new( >>> -file => 'withrefm.906', >>> -format => 'withrefm', >>> ); >>> my $rebase_collection = 
$rebase_io->read; >>> >>> # start restriction analysis >>> my $restriction_analysis = Bio::Restriction::Analysis->new( >>> -seq => $seqobj, >>> -enzymes => $rebase_collection, # it works with this line commented >>> out >>> ); >>> >>> # retrieve fragment maps >>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); >>> print Dumper \@fragment_maps; >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Sun Jun 14 21:57:45 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sun, 14 Jun 2009 18:57:45 -0700 Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when usingrebasefile. In-Reply-To: References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife> <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> Message-ID: <4A35AA99.2080305@cornell.edu> Mark A. Jensen wrote: > I'm finding this is requiring a pretty substantial refactor and > rationalization. I have opened a branch at > REPOS/bioperl-live/branches/restriction-refactor > and am making commits at will there (won't Rob be pleased!). Oh Mark, you are so agile! > When it appears to be passing tests, I'll let Chris know (on list), > and he can decide on its mergability, and brave users could try > it out by downloading Bio/Restriction (deeply) via subversion. If it's passing tests but still has bugs, make sure you add tests for the additional bugs you find! Rob From maj at fortinbras.us Sun Jun 14 22:02:37 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sun, 14 Jun 2009 22:02:37 -0400 Subject: [Bioperl-l] Bio::Restriction::Analysis. Exceptionwhen usingrebasefile. In-Reply-To: <4A35AA99.2080305@cornell.edu> References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife> <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu> <4A35AA99.2080305@cornell.edu> Message-ID: ----- Original Message ----- From: "Robert Buels" To: "BioPerl List" Sent: Sunday, June 14, 2009 9:57 PM Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exceptionwhen usingrebasefile. > Mark A. Jensen wrote: >> I'm finding this is requiring a pretty substantial refactor and >> rationalization. I have opened a branch at >> REPOS/bioperl-live/branches/restriction-refactor >> and am making commits at will there (won't Rob be pleased!). > Oh Mark, you are so agile! ha! > >> When it appears to be passing tests, I'll let Chris know (on list), >> and he can decide on its mergability, and brave users could try >> it out by downloading Bio/Restriction (deeply) via subversion. > If it's passing tests but still has bugs, make sure you add tests for the > additional bugs you find! mais, bien sur; plenty new tests coming-- thanks Rob- MAJ > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From shalabh.sharma7 at gmail.com Mon Jun 15 16:06:31 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 15 Jun 2009 16:06:31 -0400 Subject: [Bioperl-l] sub sampling Message-ID: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com> Hi All, I was just wondering that is there any module is bioperl that do subsampling? I have a file like this: 369859 0477 93 163417 1348 92 228122 0176 88 232792 0050 93 239636 1850 95 300069 0048 96 244108 0046 91 199087 0055 93 206209 0048 96 - - - - - - which contain around 100,000 lines and i want to take out a sample of 25% from this file. 
Is there any way i can do this in Bioperl? Thanks Shalabh From maj at fortinbras.us Mon Jun 15 19:49:58 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 15 Jun 2009 19:49:58 -0400 Subject: [Bioperl-l] Bio::Restriction refactor [Was: Bio::Restriction::Analysis. Exception when using rebasefile.] In-Reply-To: <4A2F622D.5060500@ron.dk> References: <4A2F622D.5060500@ron.dk> Message-ID: Dear All, The revamped Bio::Restriction::* in branch REPOS/bioperl-live/branches/restriction-refactor passes all existing tests, including those in t/Restriction. New tests will be added within the next day or so. The original bug occurred because only a subset of the possible rebase withrefm-formatted enzymes were handled; it choked on freshly-downloaded rebase files because of this. The refactored version now handles *all* rebase types, including those of rebase forms XXX^X [ intrasite cutters, the main types built in to base.pm] XXXX(m/n) [ right-end extrasite cutters ] (s/t)XXXX [ left-end ditto ] (s/t)XXXX(m/n) [ double-end ditto], palindromic and non-palindromic, as well as multisite enzymes that string together combinations of these forms. Much rationalization (well, seems rational to me anyway) and cruft removal in the affected code has also occurred. itype2.pm has been updated as well, to conform to the refactoring. If you're dying to try this now, get a working copy of the branch like so $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr $ cd bioperl-rr $ perl Build.PL $ ./Build test $ ./Build install This will only hammer your current installation in the $SITE_LIB/Bio/Restriction path; I worked only on a sparse checkout of the necessary files. To revert to your old install, do $ cd $MY_OLD_BIOPERL_WORKINGDIR $ ./Build install [In the possible event that these instructions are in error, there will be a response on this list in a matter of milliseconds, so stand by.] 
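The site forms listed above can be read mechanically. What follows is a minimal sketch, not the code on the restriction-refactor branch: `parse_rebase_site` is a hypothetical helper, and the example strings are assumptions chosen to match the forms in this message. In particular, `CACCTGC(4/8)` is assumed to correspond to the AarI case discussed earlier in the thread, where the cut lies 4 bases past a 7-base recognition sequence.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Restriction): split a REBASE-style
# site string into its recognition sequence and its cut information.
sub parse_rebase_site {
    my ($site) = @_;
    my %info;

    # (s/t)XXXX : leading pair, cuts to the left of the recognition sequence
    if ( $site =~ s{^\((-?\d+)/(-?\d+)\)}{} ) {
        $info{left} = [ $1, $2 ];
    }

    # XXXX(m/n) : trailing pair, cuts to the right of the recognition sequence
    if ( $site =~ s{\((-?\d+)/(-?\d+)\)$}{} ) {
        $info{right} = [ $1, $2 ];
    }

    # XXX^X : caret marks a cut inside the recognition sequence
    my $caret = index( $site, '^' );
    if ( $caret >= 0 ) {
        $info{cut} = $caret;    # number of bases before the cut
        $site =~ tr/^//d;       # drop the caret, keep the bases
    }

    $info{recog} = $site;
    return \%info;
}

my $intra = parse_rebase_site('G^AATTC');
my $extra = parse_rebase_site('CACCTGC(4/8)');
my $both  = parse_rebase_site('(10/12)CGANNNNNNTGC(12/10)');

# Top-strand cut position for the right-end extrasite form, counted from the
# start of the recognition sequence. It can exceed the recognition length,
# so it must not be used as an index into the recognition string itself.
my $cut_pos = length( $extra->{recog} ) + $extra->{right}[0];
print "recognition=$extra->{recog} cut after base $cut_pos\n";
```

Keeping the offsets separate from the recognition sequence means a position such as 7 + 4 = 11 is only ever applied to the target sequence, never to the 7-symbol recognition string, which is the mismatch that produced the "Bad end parameter (11)" exception.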
Happy coding- Mark ----- Original Message ----- From: "Rasmus Ory Nielsen" To: Sent: Wednesday, June 10, 2009 3:35 AM Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile. > Hi, > > This is my first time using bioperl for restriction analysis, so please bear > with me, if this is a FAQ. > > I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the > script shown at the bottom of the mail. > My bioperl version is bioperl-live nightly from 09-Jun-2009. > > The scripts throws an exception - see below. But, if I comment out the > '-enzymes' argument, so it uses the built-in collection of enzymes, it works. > > My problem is, that I need to use some of the enzymes that are only available > in rebase. So how do I get this working? > > Thanks for your attention. > > Best regards, > Rasmus Ory Nielsen > > > ############################################################ > Output from the script: > ############################################################ > > [roni at ksdhcp ~]$ ./restriction_test.pl > > --------------------- WARNING --------------------- > MSG: The enzyme name CviKI-1 was changed to CviKI-I > --------------------------------------------------- > > ------------- EXCEPTION ------------- > MSG: Bad end parameter (11). 
End must be less than the total length of > sequence (total=7) > STACK Bio::PrimarySeq::subseq > /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401 > STACK Bio::Restriction::Analysis::_enzyme_sites > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 > STACK Bio::Restriction::Analysis::_cuts > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 > STACK Bio::Restriction::Analysis::cut > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 > STACK Bio::Restriction::Analysis::fragment_maps > /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 > STACK toplevel ./restriction_test.pl:30 > ------------------------------------- > > [roni at ksdhcp ~]$ > > > ############################################################ > Output from the script with the '-enzymes' argument commented out > ############################################################ > > [roni at ksdhcp ~]$ ./restriction_test.pl > > --------------------- WARNING --------------------- > MSG: The enzyme name CviKI-1 was changed to CviKI-I > --------------------------------------------------- > $VAR1 = [ > { > 'seq' => 'CTCGACCGTTAGCAA', > 'end' => 15, > 'start' => '1' > }, > { > 'seq' => 'AGCTTTCTACCGTTATCGT', > 'end' => 34, > 'start' => '16' > } > ]; > [roni at ksdhcp ~]$ > > ############################################################ > > #!/usr/bin/perl > use strict; > use warnings; > use Bio::PrimarySeq; > use Bio::Restriction::IO; > use Bio::Restriction::Analysis; > use Data::Dumper; > > # create seq obj > my $seqobj = new Bio::PrimarySeq( > -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', > -primary_id => 'test', > -molecule => 'dna' > ); > > # read rebase file > my $rebase_io = Bio::Restriction::IO->new( > -file => 'withrefm.906', > -format => 'withrefm', > ); > my $rebase_collection = $rebase_io->read; > > # start restriction analysis > my $restriction_analysis = Bio::Restriction::Analysis->new( > -seq => $seqobj, > -enzymes => 
$rebase_collection, # it works with this line commented out > ); > > # retrieve fragment maps > my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); > print Dumper \@fragment_maps; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >

From maj at fortinbras.us Mon Jun 15 20:07:21 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 20:07:21 -0400
Subject: [Bioperl-l] sub sampling
In-Reply-To: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
References: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
Message-ID:

Shalabh

If you want to do sampling with replacement this is not bad (if you trust rand() ):

# open your file into $my_infile, then
my @lines = <$my_infile>;
my $num_samps = 10;
my $sample_size_pc = 0.25;
my @samples;
for (1..$num_samps) {
    # each sample is a list of random line indices, drawn with replacement
    push @samples, [ map { int( @lines * rand ) } ( 1..int($sample_size_pc * @lines) ) ];
}
# now, do something, fr'instance
my @sample_pc;
foreach (@samples) {
    my $pct = 0;
    foreach my $line (@lines[ @$_ ]) {
        my @a = split(/\s+/, $line);
        $pct += $a[2];    # third column of each sampled line
    }
    $pct /= @$_;
    push @sample_pc, $pct;
}

R's just better for some things, ain't it?
MAJ

----- Original Message -----
From: "shalabh sharma"
To: "bioperl-l"
Sent: Monday, June 15, 2009 4:06 PM
Subject: [Bioperl-l] sub sampling

> Hi All, I was just wondering that is there any module is bioperl
> that do subsampling?
> I have a file like this:
>
> 369859 0477 93
> 163417 1348 92
> 228122 0176 88
> 232792 0050 93
> 239636 1850 95
> 300069 0048 96
> 244108 0046 91
> 199087 0055 93
> 206209 0048 96
> - - -
> - - -
>
> which contain around 100,000 lines and i want to take out a sample of 25%
> from this file. Is there any way i can do this in Bioperl?
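The reply above samples with replacement, so the same line can be drawn more than once. If the goal is a single 25% subset with each line used at most once, a sketch along the following lines may be closer to what was asked. It uses only core Perl (`List::Util`), not a BioPerl module; `subsample` is a hypothetical helper name, and the in-memory rows stand in for the real file.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: return a random subset holding roughly $fraction of
# the lines, drawn WITHOUT replacement, in their original file order.
sub subsample {
    my ( $lines, $fraction ) = @_;
    my $n        = int( $fraction * @$lines );      # subset size, rounded down
    my @shuffled = shuffle( 0 .. $#$lines );        # random permutation of indices
    my @idx      = sort { $a <=> $b } @shuffled[ 0 .. $n - 1 ];
    return [ @{$lines}[@idx] ];
}

# Stand-in data: the rows quoted in the original question.
my @lines = (
    "369859 0477 93\n", "163417 1348 92\n", "228122 0176 88\n",
    "232792 0050 93\n", "239636 1850 95\n", "300069 0048 96\n",
    "244108 0046 91\n", "199087 0055 93\n", "206209 0048 96\n",
);

my $subset = subsample( \@lines, 0.25 );
print @$subset;
```

For a real run, `@lines` would be read from the input file handle instead. Shuffling indices rather than lines keeps the original order recoverable, which is convenient when the subset is written back out as a file.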
> > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Xianjun.Dong at bccs.uib.no Sat Jun 13 08:05:53 2009 From: Xianjun.Dong at bccs.uib.no (Xianjun Dong) Date: Sat, 13 Jun 2009 14:05:53 +0200 Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph In-Reply-To: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> Message-ID: <4A339621.2060702@ii.uib.no> Hi, Scott Thanks for your reply first. I still have question: I dig out the code from GBrowse (which I paste below). Method make_postgrid_callback gets all highlight region and then use hilite_regions_closure function to draw them out, using the following GD function: $gd->filledRectangle($left+$start,0,$left+$end,$bottom, $panel->translate_color($h_color)); where the $bottom=$panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost same (except the pad_bottom), we can see this in the code of http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22 OK. Anyway, I change to use $panel->bottom, instead of $gd->height, for my highlight regions. The output is same, when using the library of Bioperl 1.6 (or 1.5). You can see the attached image ("test.bioperl1.6.png") OK. I might have not explained my question explicitly. My question is: if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I can get the right image I want (see the attached file "test.bioperl1.2.3.png"), where the highlight range will go from the roof to the floor. While in bioperl 1.5 (or 1.6), I only can see the highlight region in its own track, not the whole panel. OK, did I explain clearly now? you can see the difference of the two images. 
[I am not sure the mailist allow to attach image, otherwise, I put them in the following links: test.bioperl1.6.png: http://translog.genereg.net/test.bioperl1.6.png test.bioperl1.2.3.png: http://translog.genereg.net/test.bioperl1.2.3.png ] You can test it and see the difference if you have both 1.2.3 and 1.6 on your computer? Really want to know how this works in bioperl 1.2.3 (Even though this might be a bug at that version, or whatever) Thanks Xianjun ============================================= # this generates the callback for highlighting a region sub make_postgrid_callback { my $settings = shift; return unless ref $settings->{h_region}; my @h_regions = map { my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/; defined($h_ref) && $h_ref eq $settings->{ref} ? [$h_start,$h_end,$h_color||'lightgrey'] : () } @{$settings->{h_region}}; return unless @h_regions; return hilite_regions_closure(@h_regions); } # this subroutine generates a Bio::Graphics::Panel callback closure # suitable for hilighting a region of a panel. # The args are a list of [start,end,color] sub hilite_regions_closure { my @h_regions = @_; return sub { my $gd = shift; my $panel = shift; my $left = $panel->pad_left; my $top = $panel->top; my $bottom = $panel->bottom; for my $r (@h_regions) { my ($h_start,$h_end,$h_color) = @$r; my ($start,$end) = $panel->location2pixel($h_start,$h_end); if ($end-$start <= 1) { $end++; $start-- } # so that we always see something # assuming top is 0 so as to ignore top padding $gd->filledRectangle($left+$start,0,$left+$end,$bottom, $panel->translate_color($h_color)); } }; } Scott Cain wrote: > Hello Xianjun, > > I don't think that approach will work. What you almost certainly need > to do is a postgrid callback that does the drawing of the highlighted > region. For example code of how to do this, take a look at the > make_postgrid_callback subroutine in GBrowse 1.69. The option > -postgrid is a method of Bio::Graphics::Panel. 
> > Scott > > > > > On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong wrote: > >> HI, >> >> I am not sure this is the right place I can get help. >> >> I've suffered by a problem for several days: I want to highlight parts of >> regions in my track, using a different background color. To do that, I >> defined a glyph named "background", based on the >> 'Bio::Graphics::Glyph::generic' module. I override the draw_component() >> method, by adding code like below: >> >> $gd->filledRectangle($left,0,$right,$gd->height, >> $self->factory->translate_color($color)); >> >> # the script is pasted at the end >> >> This will draw a rectangle with top=0, bottom=$gd->height. I made the >> highlight regions into a list of features, and add_track with >> -glyph=>'background'. (see the following script, test.pl) This really works >> as I expect, which will add a colored block at background of all tracks in a >> panel (including the ruler arrow). You can see the output image in attached >> file "test.bioperl1.2.3.png" >> >> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not >> work. Well, it works, but the highlight part only shrink to a low height, >> instead of covering all tracks in the panel. I also attached the output >> here, see the file "test.bioperl1.6.png". >> >> I tried to think about the reason, the 'background' module is based on the >> generic module. What can cause the difference? Is it because $gd->height is >> different, or the tracks followed with 'background' track can not draw from >> the first position? >> >> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person >> solve problem, wise person avoid problem"...) But another problem is coming: >> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map() >> function, which means I have to use some higher version if I want to create >> web map for my graphics, but then I have to give up using highlight >> background. >> >> OK. 
It's long enough for my first-time submission here. Hope someone can >> throw me some clue. >> >> Thanks ahead!! >> >> Xianjun >> >> >> ==================== test.pl ======================= >> #!/usr/bin/perl >> >> use strict; >> use lib "$ENV{HOME}/lib"; >> >> use Bio::Graphics; >> use Bio::Graphics::Feature; >> my $ftr= 'Bio::Graphics::Feature'; >> >> # processed_transcript >> my $trans1 = >> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR"); >> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS'); >> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', >> -source=>'a'); >> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', >> -source=>'a'); >> my $trans5 = >> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR"); >> my $trans = >> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]); >> >> # hightlight >> my $trans31 = >> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', >> -source=>'a'); >> my $trans41 = >> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', >> -source=>'b'); >> >> my $panel= Bio::Graphics::Panel->new(-width=>1200, >> -length=>1050, >> -start =>0, >> -pad_left=>12, >> -pad_right=>12); >> >> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 >> and 1.6 >> $panel->add_track([$trans41,$trans31], >> -glyph => 'background', >> -block_bgcolor => sub{return (shift->source eq >> 'a')?'#cccccc':'#fffc22'}, >> ); >> >> $panel->add_track($ftr->new(-start=>100,-end=>1000), >> -glyph=>'arrow', >> -double=>1, >> -tick=>2); >> >> $panel->add_track($trans, >> -glyph => 'transcript2', # 'transcript2', #process_5utr', >> -fgcolor => 'darkred', >> -bgcolor => 'darkred', >> -title => '$source', >> -link => >> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', #EnsEMBL >> ); >> print $panel->png; >> >> # the following part works in bioperl 1.5 and 1.6, but not work in 
Bioperl >> 1.2.3 >> my $map = $panel->create_web_map("image"); >> $panel->finished(); >> >> 1; >> >> ==================== background.pm ======================= >> package Bio::Graphics::Glyph::background; >> >> use strict; >> use base 'Bio::Graphics::Glyph::generic'; >> sub pad_top{ >> return 0; >> } >> >> sub draw_component { >> my $self = shift; >> #$self->SUPER::draw_component(@_); >> my ($gd,$dx,$dy) = @_; >> my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy); >> >> # draw an arrow to indicate the direction of transcript >> my $color = $self->option('block_bgcolor') || '#cccccc'; >> $gd->filledRectangle($left,0,$right,$gd->height, >> $self->factory->translate_color($color)); >> } >> >> 1; >> >> -- >> ========================================== >> Xianjun Dong >> PhD student, Lenhard group >> Computational Biology Unit >> Bergen Center for Computational Science >> University of Bergen >> Hoyteknologisenteret, Thormohlensgate 55 >> N-5008 Bergen, Norway >> E-mail: xianjun.dong at bccs.uib.no >> Tel.: +47 555 84022 >> Fax : +47 555 84295 >> ========================================== >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > -- ========================================== Xianjun Dong PhD student, Lenhard group Computational Biology Unit Bergen Center for Computational Science University of Bergen Hoyteknologisenteret, Thormohlensgate 55 N-5008 Bergen, Norway E-mail: xianjun.dong at bccs.uib.no Tel.: +47 555 84022 Fax : +47 555 84295 ========================================== -------------- next part -------------- A non-text attachment was scrubbed... Name: test.bioperl1.2.3.png Type: image/png Size: 2789 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test.bioperl1.6.png Type: image/png Size: 2365 bytes Desc: not available URL: From malcolm.cook at gmail.com Tue Jun 16 04:06:36 2009 From: malcolm.cook at gmail.com (Malcolm Cook) Date: Tue, 16 Jun 2009 03:06:36 -0500 Subject: [Bioperl-l] Alignment->slice() issue? Message-ID: Kevin, I'm getting struck by this old issue you once coded around. http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html Any chance you could share your implementation with a fellow traveller... ?? Thanks, Malcolm Cook Stowers Institute for Medical Research From remi.planel at free.fr Tue Jun 16 10:57:27 2009 From: remi.planel at free.fr (Remi Planel) Date: Tue, 16 Jun 2009 16:57:27 +0200 Subject: [Bioperl-l] Hits Object Message-ID: <4A37B2D7.70807@free.fr> Hi all, I couldn't find a way, starting from a Bio::Search::Result::ResultI object (obtained after parsing a blast report), to filter some of the associated hsps. By filter I mean eliminate, for each hit, some hsps I'm not interested in. Can I modify the Result object directly? Thanks, From lsbrath at gmail.com Tue Jun 16 11:42:37 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Tue, 16 Jun 2009 11:42:37 -0400 Subject: [Bioperl-l] error message: can't call method "next_hit" on and undefined value Message-ID: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com> Hello, My method produces an error message stating that it can't call a "next_hit" method on an undefined value.
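One likely cause, judging from the method that follows in this message, is the leading ">" in the `-file` argument. In BioPerl's IO layer a filename that starts with ">" is opened for writing, so the existing report would be emptied before it is parsed, and `next_result` would then return an undefined value. The sketch below does not use BioPerl at all; it only demonstrates, with a temporary stand-in file, what write-mode opening does to an existing file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a stand-in "report" file with some content in it.
my ( $fh, $path ) = tempfile( UNLINK => 1 );
print {$fh} "stand-in for a bl2seq report\n";
close $fh or die "close failed: $!";
my $size_before = ( stat $path )[7];

# Opening an existing file in write mode ('>') truncates it immediately,
# even if nothing is ever written to the handle.
open( my $out, '>', $path ) or die "cannot open $path: $!";
close $out or die "close failed: $!";
my $size_after = ( stat $path )[7];

print "size before: $size_before bytes, size after: $size_after bytes\n";
```

If that is what is happening here, dropping the ">" from the `-file` value should let the report be read, and checking the return value of `next_result` before calling `next_hit` on it would turn the failure into a clearer message.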
sub hu_bl2seq_parser{ my ($maid, $maid_dir) = @_; # Get the report my $in = new Bio::SearchIO(-format => 'blast', -file => ">".$maid_dir."\\".$maid."aln_hu.aln", -report_type => 'blastn'); #open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out"); #my $out = Bio::AlignIO->newFh(-format => 'clustalw' ); my $result=$in->next_result; my($hu_aln,$hu_mismatches); # Get info about the first hit my $hit = $result->next_hit; my $name = $hit->name; # get info about the first hsp of the first hit my $hsp = $hit->next_hsp; # get the alignment object my $aln = $hsp->get_aln; #my $percent_id = $hsp->percent_identity; #my $aln_length = $hsp->length('total'); my @mismatches = $hsp->seq_inds('query','nomatch'); my $aln_str=""; # access the alignment string my $strIO=IO::String->new($aln_str); # write the string alignio in clustalw format my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO); # now the actual alignment string is accessable for printing or in this case moving to a db table $alnio->write_aln($aln); $hu_aln=$aln_str; $hu_mismatches = scalar @mismatches; return($hu_aln, $hu_mismatches); } The problem is at "my $hit = $result->next_hit;" Any help will be appreciated. LomSpace From cjfields at illinois.edu Tue Jun 16 14:14:18 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 16 Jun 2009 13:14:18 -0500 Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.] In-Reply-To: References: <4A2F622D.5060500@ron.dk> Message-ID: <9A7FE5B3-29A2-4FAE-AE5A-945064DD8DB6@illinois.edu> I'll check out the branch sometime today and run tests on it. Thanks for the hard work Mark! chris On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote: > Dear All, > > There are tests for the new functionality of Bio::Restriction > now in t/Restriction on the branch, along with the withrefm.906 > in t/data that revealed the bug in RON's post. 
All tests pass without > warnings on my machine (which is bioperl live, perl 5.10.10, > under Vista/cygwin - yes, I still don't have a real computer). > We're ready for a merge on my end. > > Thanks all for your silent assent to these machinations. > cheers > Mark > > ----- Original Message ----- From: "Mark A. Jensen" > > To: "Rasmus Ory Nielsen" ; > Sent: Monday, June 15, 2009 7:49 PM > Subject: [Bioperl-l] Bio::Restriction refactor > [Was:Bio::Restriction::Analysis. Exception when using rebasefile.] > > >> Dear All, >> >> The revamped Bio::Restriction::* in branch >> >> REPOS/bioperl-live/branches/restriction-refactor >> >> passes all existing tests, including those in t/Restriction. >> New tests will be added within the next day or so. >> The original bug occurred because only a subset of >> the possible rebase withrefm-formatted enzymes were >> handled; it choked on freshly-downloaded rebase >> files because of this. >> >> The refactored version now handles *all* rebase types, >> including those of rebase forms >> >> XXX^X [ intrasite cutters, the main types >> built in to base.pm] >> XXXX(m/n) [ right-end extrasite cutters ] >> (s/t)XXXX [ left-end ditto ] >> (s/t)XXXX(m/n) [ double-end ditto], >> >> palindromic and non-palindromic, as well as multisite >> enzymes that string together combinations of these >> forms. Much rationalization (well, seems rational to me >> anyway) and cruft removal in the affected code has also >> occurred. itype2.pm has been updated as well, to >> conform to the refactoring. >> >> If you're dying to try this now, get a working copy >> of the branch like so >> >> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/ >> restriction-refactor bioperl-rr >> $ cd bioperl-rr >> $ perl Build.PL >> $ ./Build test >> $ ./Build install >> >> This will only hammer your current installation in the >> $SITE_LIB/Bio/Restriction path; I worked only on >> a sparse checkout of the necessary files. 
From maj at fortinbras.us Tue Jun 16 13:58:56 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:58:56 -0400
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To:
References: <4A2F622D.5060500@ron.dk>
Message-ID:

Dear All,

There are tests for the new functionality of Bio::Restriction
now in t/Restriction on the branch, along with the withrefm.906
in t/data that revealed the bug in RON's post. All tests pass without
warnings on my machine (which is bioperl live, perl 5.10.10,
under Vista/cygwin - yes, I still don't have a real computer).
We're ready for a merge on my end.

Thanks all for your silent assent to these machinations.
cheers
Mark

----- Original Message -----
From: "Mark A. Jensen"
To: "Rasmus Ory Nielsen" ;
Sent: Monday, June 15, 2009 7:49 PM
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]

> Dear All,
>
> The revamped Bio::Restriction::* in branch
>
> REPOS/bioperl-live/branches/restriction-refactor
>
> passes all existing tests, including those in t/Restriction.
> New tests will be added within the next day or so.
> The original bug occurred because only a subset of > the possible rebase withrefm-formatted enzymes were > handled; it choked on freshly-downloaded rebase > files because of this. > > The refactored version now handles *all* rebase types, > including those of rebase forms > > XXX^X [ intrasite cutters, the main types > built in to base.pm] > XXXX(m/n) [ right-end extrasite cutters ] > (s/t)XXXX [ left-end ditto ] > (s/t)XXXX(m/n) [ double-end ditto], > > palindromic and non-palindromic, as well as multisite > enzymes that string together combinations of these > forms. Much rationalization (well, seems rational to me > anyway) and cruft removal in the affected code has also > occurred. itype2.pm has been updated as well, to > conform to the refactoring. > > If you're dying to try this now, get a working copy > of the branch like so > > $ svn co > svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor > bioperl-rr > $ cd bioperl-rr > $ perl Build.PL > $ ./Build test > $ ./Build install > > This will only hammer your current installation in the > $SITE_LIB/Bio/Restriction path; I worked only on > a sparse checkout of the necessary files. To revert to your > old install, do > > $ cd $MY_OLD_BIOPERL_WORKINGDIR > $ ./Build install > > [In the possible event that these instructions are in error, > there will be a response on this list in a matter of > milliseconds, so stand by.] > > Happy coding- > Mark > > > > > ----- Original Message ----- > From: "Rasmus Ory Nielsen" > To: > Sent: Wednesday, June 10, 2009 3:35 AM > Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using > rebasefile. > > >> Hi, >> >> This is my first time using bioperl for restriction analysis, so please bear >> with me, if this is a FAQ. >> >> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the >> script shown at the bottom of the mail. >> My bioperl version is bioperl-live nightly from 09-Jun-2009. 
>> >> The scripts throws an exception - see below. But, if I comment out the >> '-enzymes' argument, so it uses the built-in collection of enzymes, it works. >> >> My problem is, that I need to use some of the enzymes that are only available >> in rebase. So how do I get this working? >> >> Thanks for your attention. >> >> Best regards, >> Rasmus Ory Nielsen >> >> >> ############################################################ >> Output from the script: >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: Bad end parameter (11). End must be less than the total length of >> sequence (total=7) >> STACK Bio::PrimarySeq::subseq >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401 >> STACK Bio::Restriction::Analysis::_enzyme_sites >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 >> STACK Bio::Restriction::Analysis::_cuts >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 >> STACK Bio::Restriction::Analysis::cut >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 >> STACK Bio::Restriction::Analysis::fragment_maps >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 >> STACK toplevel ./restriction_test.pl:30 >> ------------------------------------- >> >> [roni at ksdhcp ~]$ >> >> >> ############################################################ >> Output from the script with the '-enzymes' argument commented out >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> $VAR1 = [ >> { 
>> 'seq' => 'CTCGACCGTTAGCAA', >> 'end' => 15, >> 'start' => '1' >> }, >> { >> 'seq' => 'AGCTTTCTACCGTTATCGT', >> 'end' => 34, >> 'start' => '16' >> } >> ]; >> [roni at ksdhcp ~]$ >> >> ############################################################ >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use Bio::PrimarySeq; >> use Bio::Restriction::IO; >> use Bio::Restriction::Analysis; >> use Data::Dumper; >> >> # create seq obj >> my $seqobj = new Bio::PrimarySeq( >> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', >> -primary_id => 'test', >> -molecule => 'dna' >> ); >> >> # read rebase file >> my $rebase_io = Bio::Restriction::IO->new( >> -file => 'withrefm.906', >> -format => 'withrefm', >> ); >> my $rebase_collection = $rebase_io->read; >> >> # start restriction analysis >> my $restriction_analysis = Bio::Restriction::Analysis->new( >> -seq => $seqobj, >> -enzymes => $rebase_collection, # it works with this line commented >> out >> ); >> >> # retrieve fragment maps >> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); >> print Dumper \@fragment_maps; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue Jun 16 13:51:14 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Tue, 16 Jun 2009 13:51:14 -0400
Subject: [Bioperl-l] Hits Object
In-Reply-To: <4A37B2D7.70807@free.fr>
Message-ID: <3766B1A38606458EB5FA24D24371433D@NewLife>

Remi-
have a look at http://www.bioperl.org/wiki/HOWTO:SearchIO
and maybe http://www.bioperl.org/wiki/Parsing_BLAST_HSPs;
perhaps your questions will be answered there-
cheers,
Mark

From cjfields at illinois.edu Tue Jun 16 14:31:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:31:10 -0500
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To:
References: <4A2F622D.5060500@ron.dk>
Message-ID:

Everything passes on my end (Mac OS X 10.5, perl 5.10.0). +1 on the merge.

Also (as mentioned some time back w/ Hilmar among others), we can
probably delete this branch seeing as the code will be merged to trunk
(it being a feature branch and all). Worth doing the same for a few
other feature branches as well.

chris

From cjfields at illinois.edu Tue Jun 16 15:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 14:07:44 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To:
References:
Message-ID:

Sounds to me like a BioPerl bug. Do you have some example data
demonstrating the problem?
chris

On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:

> Kevin,
>
> I'm getting struck by this old issue you once coded around.
>
> http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>
> Any chance you could share your implementation with fellow
> traveller...
>
> ??
>
> Thanks,
>
> Malcolm Cook
> Stowers insitute for Medical research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Tue Jun 16 15:32:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 15:32:02 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on andundefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <91AC45F45A0F43D292323A711F0D5BDA@NewLife>

lomspace-
this

my $in = new Bio::SearchIO(-format => 'blast',
                           -file => ">".$maid_dir."\\".$maid."aln_hu.aln",
                           -report_type => 'blastn');

should be

my $in = new Bio::SearchIO(-format => 'blast',
                           -file => $maid_dir."\\".$maid."aln_hu.aln",
                           -report_type => 'blastn');

if you're reading the file. Then $result will have something in it when
you do $in->next_result

cheers,
MAJ

----- Original Message -----
From: "Mgavi Brathwaite"
To:
Sent: Tuesday, June 16, 2009 11:42 AM
Subject: [Bioperl-l] error message: can't call method "next_hit" on andundefined value

> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.
>
> sub hu_bl2seq_parser{
>     my ($maid, $maid_dir) = @_;
>     # Get the report
>     my $in = new Bio::SearchIO(-format => 'blast',
>                                -file => ">".$maid_dir."\\".$maid."aln_hu.aln",
>                                -report_type => 'blastn');
>     #open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");
>     #my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
>     my $result=$in->next_result;
>     my($hu_aln,$hu_mismatches);
>     # Get info about the first hit
>     my $hit = $result->next_hit;
>     my $name = $hit->name;
>     # get info about the first hsp of the first hit
>     my $hsp = $hit->next_hsp;
>     # get the alignment object
>     my $aln = $hsp->get_aln;
>     #my $percent_id = $hsp->percent_identity;
>     #my $aln_length = $hsp->length('total');
>     my @mismatches = $hsp->seq_inds('query','nomatch');
>     my $aln_str="";
>     # access the alignment string
>     my $strIO=IO::String->new($aln_str);
>     # write the string alignio in clustalw format
>     my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
>     # now the actual alignment string is accessable for printing or in this case moving to a db table
>     $alnio->write_aln($aln);
>     $hu_aln=$aln_str;
>     $hu_mismatches = scalar @mismatches;
>     return($hu_aln, $hu_mismatches);
> }
>
> The problem is at "my $hit = $result->next_hit;"
> Any help will be appreciated.
> LomSpace
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From rmb32 at cornell.edu Tue Jun 16 15:46:40 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 16 Jun 2009 12:46:40 -0700
Subject: [Bioperl-l] error message: can't call method "next_hit" on and undefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <4A37F6A0.1080907@cornell.edu>

Mgavi Brathwaite wrote:
> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.

Your proximate problem seems to be that you are prepending a '>' to the
filename in your invocation of Bio::SearchIO::new, which I think might
cause it to write to the file instead of reading from it.

But also, you probably want to use next_result and next_hit in while
loops, since they return undef when there are no more hits or hsps to
parse. This is what is causing your "can't call next_hit on undefined
value" error. next_result() returns undef when there are no results to
parse.

by while loops, I mean something like:

while( my $result = $in->next_result ) {
    while( my $hit = $result->next_hit ) {
        # insert the rest of your operations here
    }
}

Hope this helps.

Rob

From maj at fortinbras.us Tue Jun 16 16:10:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 16:10:34 -0400
Subject: [Bioperl-l] Bio::Restriction refactor[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To:
References: <4A2F622D.5060500@ron.dk>
Message-ID: <61179C22E04F479686C7F5CFEC496FB0@NewLife>

Right; will remove branch. Will go ahead with merge at 21:20 UTC.
cheers MAJ

From MEC at stowers.org Tue Jun 16 16:13:33 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Tue, 16 Jun 2009 15:13:33 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To:
References:
Message-ID:

Chris!

erm, yeah, I do....

... and I will schedule some time to code up a test and add it to
AlignI's suite....

Malcolm

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Chris Fields
> Sent: Tuesday, June 16, 2009 2:08 PM
> To: Malcolm Cook
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Alignment->slice() issue?
>
> Sounds to me like a BioPerl bug. Do you have some example
> data demonstrating the problem?
>
> chris
>
> On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:
>
> > Kevin,
> >
> > I'm getting struck by this old issue you once coded around.
> >
> > http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
> >
> > Any chance you could share your implementation with fellow
> > traveller...
> > > > ?? > > > > Thanks, > > > > Malcolm Cook > > Stowers insitute for Medical research > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Tue Jun 16 22:47:39 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 16 Jun 2009 22:47:39 -0400 Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.] In-Reply-To: References: <4A2F622D.5060500@ron.dk> Message-ID: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife> Dear All, The refactored Bio::Restriction::* has been merged to trunk, with all tests passing. [Anyone got a cigarette?] cheers, Mark ----- Original Message ----- From: "Mark A. Jensen" To: "Rasmus Ory Nielsen" ; Sent: Monday, June 15, 2009 7:49 PM Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.] > Dear All, > > The revamped Bio::Restriction::* in branch > > REPOS/bioperl-live/branches/restriction-refactor > > passes all existing tests, including those in t/Restriction. > New tests will be added within the next day or so. > The original bug occurred because only a subset of > the possible rebase withrefm-formatted enzymes were > handled; it choked on freshly-downloaded rebase > files because of this. > > The refactored version now handles *all* rebase types, > including those of rebase forms > > XXX^X [ intrasite cutters, the main types > built in to base.pm] > XXXX(m/n) [ right-end extrasite cutters ] > (s/t)XXXX [ left-end ditto ] > (s/t)XXXX(m/n) [ double-end ditto], > > palindromic and non-palindromic, as well as multisite > enzymes that string together combinations of these > forms. 
Much rationalization (well, seems rational to me > anyway) and cruft removal in the affected code has also > occurred. itype2.pm has been updated as well, to > conform to the refactoring. > > If you're dying to try this now, get a working copy > of the branch like so > > $ svn co > svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor > bioperl-rr > $ cd bioperl-rr > $ perl Build.PL > $ ./Build test > $ ./Build install > > This will only hammer your current installation in the > $SITE_LIB/Bio/Restriction path; I worked only on > a sparse checkout of the necessary files. To revert to your > old install, do > > $ cd $MY_OLD_BIOPERL_WORKINGDIR > $ ./Build install > > [In the possible event that these instructions are in error, > there will be a response on this list in a matter of > milliseconds, so stand by.] > > Happy coding- > Mark > > > > > ----- Original Message ----- > From: "Rasmus Ory Nielsen" > To: > Sent: Wednesday, June 10, 2009 3:35 AM > Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using > rebasefile. > > >> Hi, >> >> This is my first time using bioperl for restriction analysis, so please bear >> with me, if this is a FAQ. >> >> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the >> script shown at the bottom of the mail. >> My bioperl version is bioperl-live nightly from 09-Jun-2009. >> >> The scripts throws an exception - see below. But, if I comment out the >> '-enzymes' argument, so it uses the built-in collection of enzymes, it works. >> >> My problem is, that I need to use some of the enzymes that are only available >> in rebase. So how do I get this working? >> >> Thanks for your attention. 
>> >> Best regards, >> Rasmus Ory Nielsen >> >> >> ############################################################ >> Output from the script: >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> >> ------------- EXCEPTION ------------- >> MSG: Bad end parameter (11). End must be less than the total length of >> sequence (total=7) >> STACK Bio::PrimarySeq::subseq >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401 >> STACK Bio::Restriction::Analysis::_enzyme_sites >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 >> STACK Bio::Restriction::Analysis::_cuts >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 >> STACK Bio::Restriction::Analysis::cut >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 >> STACK Bio::Restriction::Analysis::fragment_maps >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 >> STACK toplevel ./restriction_test.pl:30 >> ------------------------------------- >> >> [roni at ksdhcp ~]$ >> >> >> ############################################################ >> Output from the script with the '-enzymes' argument commented out >> ############################################################ >> >> [roni at ksdhcp ~]$ ./restriction_test.pl >> >> --------------------- WARNING --------------------- >> MSG: The enzyme name CviKI-1 was changed to CviKI-I >> --------------------------------------------------- >> $VAR1 = [ >> { >> 'seq' => 'CTCGACCGTTAGCAA', >> 'end' => 15, >> 'start' => '1' >> }, >> { >> 'seq' => 'AGCTTTCTACCGTTATCGT', >> 'end' => 34, >> 'start' => '16' >> } >> ]; >> [roni at ksdhcp ~]$ >> >> ############################################################ >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> use 
Bio::PrimarySeq; >> use Bio::Restriction::IO; >> use Bio::Restriction::Analysis; >> use Data::Dumper; >> >> # create seq obj >> my $seqobj = new Bio::PrimarySeq( >> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', >> -primary_id => 'test', >> -molecule => 'dna' >> ); >> >> # read rebase file >> my $rebase_io = Bio::Restriction::IO->new( >> -file => 'withrefm.906', >> -format => 'withrefm', >> ); >> my $rebase_collection = $rebase_io->read; >> >> # start restriction analysis >> my $restriction_analysis = Bio::Restriction::Analysis->new( >> -seq => $seqobj, >> -enzymes => $rebase_collection, # it works with this line commented >> out >> ); >> >> # retrieve fragment maps >> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); >> print Dumper \@fragment_maps; >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Tue Jun 16 23:21:22 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 17 Jun 2009 15:21:22 +1200 Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. Exception when using rebasefile.] In-Reply-To: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife> References: <4A2F622D.5060500@ron.dk> <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife> Message-ID: <18DF7D20DFEC044098A1062202F5FFF3297FF3E2E4@exchsth.agresearch.co.nz> Cigarettes are post-coitus and pre-firing squad. What you'd be needing is a cigar (proud father) ;-) > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Wednesday, 17 June 2009 2:48 p.m. 
> To: bioperl-l at lists.open-bio.org > Cc: Rasmus Ory Nielsen > Subject: Re: [Bioperl-l] Bio::Restriction refactor > [Was:Bio::Restriction::Analysis. Exception when using rebasefile.] > > Dear All, > > The refactored Bio::Restriction::* has been merged to trunk, with all > tests passing. [Anyone got a cigarette?] > > cheers, > Mark > > ----- Original Message ----- > From: "Mark A. Jensen" > To: "Rasmus Ory Nielsen" ; > Sent: Monday, June 15, 2009 7:49 PM > Subject: [Bioperl-l] Bio::Restriction refactor > [Was:Bio::Restriction::Analysis. > Exception when using rebasefile.] > > > > Dear All, > > > > The revamped Bio::Restriction::* in branch > > > > REPOS/bioperl-live/branches/restriction-refactor > > > > passes all existing tests, including those in t/Restriction. > > New tests will be added within the next day or so. > > The original bug occurred because only a subset of > > the possible rebase withrefm-formatted enzymes were > > handled; it choked on freshly-downloaded rebase > > files because of this. > > > > The refactored version now handles *all* rebase types, > > including those of rebase forms > > > > XXX^X [ intrasite cutters, the main types > > built in to base.pm] > > XXXX(m/n) [ right-end extrasite cutters ] > > (s/t)XXXX [ left-end ditto ] > > (s/t)XXXX(m/n) [ double-end ditto], > > > > palindromic and non-palindromic, as well as multisite > > enzymes that string together combinations of these > > forms. Much rationalization (well, seems rational to me > > anyway) and cruft removal in the affected code has also > > occurred. itype2.pm has been updated as well, to > > conform to the refactoring. 
> > > > If you're dying to try this now, get a working copy > > of the branch like so > > > > $ svn co > > svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor > > bioperl-rr > > $ cd bioperl-rr > > $ perl Build.PL > > $ ./Build test > > $ ./Build install > > > > This will only hammer your current installation in the > > $SITE_LIB/Bio/Restriction path; I worked only on > > a sparse checkout of the necessary files. To revert to your > > old install, do > > > > $ cd $MY_OLD_BIOPERL_WORKINGDIR > > $ ./Build install > > > > [In the possible event that these instructions are in error, > > there will be a response on this list in a matter of > > milliseconds, so stand by.] > > > > Happy coding- > > Mark > > > > > > > > > > ----- Original Message ----- > > From: "Rasmus Ory Nielsen" > > To: > > Sent: Wednesday, June 10, 2009 3:35 AM > > Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using > > rebasefile. > > > > > >> Hi, > >> > >> This is my first time using bioperl for restriction analysis, so please > bear > >> with me, if this is a FAQ. > >> > >> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created > the > >> script shown at the bottom of the mail. > >> My bioperl version is bioperl-live nightly from 09-Jun-2009. > >> > >> The scripts throws an exception - see below. But, if I comment out the > >> '-enzymes' argument, so it uses the built-in collection of enzymes, it > works. > >> > >> My problem is, that I need to use some of the enzymes that are only > available > >> in rebase. So how do I get this working? > >> > >> Thanks for your attention. 
> >> > >> Best regards, > >> Rasmus Ory Nielsen > >> > >> > >> ############################################################ > >> Output from the script: > >> ############################################################ > >> > >> [roni at ksdhcp ~]$ ./restriction_test.pl > >> > >> --------------------- WARNING --------------------- > >> MSG: The enzyme name CviKI-1 was changed to CviKI-I > >> --------------------------------------------------- > >> > >> ------------- EXCEPTION ------------- > >> MSG: Bad end parameter (11). End must be less than the total length of > >> sequence (total=7) > >> STACK Bio::PrimarySeq::subseq > >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401 > >> STACK Bio::Restriction::Analysis::_enzyme_sites > >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900 > >> STACK Bio::Restriction::Analysis::_cuts > >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801 > >> STACK Bio::Restriction::Analysis::cut > >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379 > >> STACK Bio::Restriction::Analysis::fragment_maps > >> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515 > >> STACK toplevel ./restriction_test.pl:30 > >> ------------------------------------- > >> > >> [roni at ksdhcp ~]$ > >> > >> > >> ############################################################ > >> Output from the script with the '-enzymes' argument commented out > >> ############################################################ > >> > >> [roni at ksdhcp ~]$ ./restriction_test.pl > >> > >> --------------------- WARNING --------------------- > >> MSG: The enzyme name CviKI-1 was changed to CviKI-I > >> --------------------------------------------------- > >> $VAR1 = [ > >> { > >> 'seq' => 'CTCGACCGTTAGCAA', > >> 'end' => 15, > >> 'start' => '1' > >> }, > >> { > >> 'seq' => 'AGCTTTCTACCGTTATCGT', > >> 'end' => 34, > >> 'start' => '16' > >> } > >> ]; > >> [roni at ksdhcp ~]$ > >> > >> 
############################################################ > >> > >> #!/usr/bin/perl > >> use strict; > >> use warnings; > >> use Bio::PrimarySeq; > >> use Bio::Restriction::IO; > >> use Bio::Restriction::Analysis; > >> use Data::Dumper; > >> > >> # create seq obj > >> my $seqobj = new Bio::PrimarySeq( > >> -seq => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT', > >> -primary_id => 'test', > >> -molecule => 'dna' > >> ); > >> > >> # read rebase file > >> my $rebase_io = Bio::Restriction::IO->new( > >> -file => 'withrefm.906', > >> -format => 'withrefm', > >> ); > >> my $rebase_collection = $rebase_io->read; > >> > >> # start restriction analysis > >> my $restriction_analysis = Bio::Restriction::Analysis->new( > >> -seq => $seqobj, > >> -enzymes => $rebase_collection, # it works with this line commented > >> out > >> ); > >> > >> # retrieve fragment maps > >> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII'); > >> print Dumper \@fragment_maps; > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From e.stupka at ucl.ac.uk Wed Jun 17 07:29:08 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 12:29:08 +0100 Subject: [Bioperl-l] Next-gen modules Message-ID: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Dear all, after several years of absence I am slowly coming back to Bioperl, and hope to contribute again to its development. One area that I was thinking of starting from, since we are actively involved with it, is to improve BioPerl's support for next-gen sequencing data, tools, etc. Since I am sure I have missed out on a lot of recent developments, do let me know if/what is useful. One example that comes to mind is that the conversion of various formats to/from FASTQ does not seem to be supported. Some code can be found within Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be good if it could make its way into SeqIO? And similarly, potentially, for other next-gen sequence formats? Similarly, there seems to be little in bioperl-run to support tools that have been developed in this area, such as Maq, BowTie, TopHat, etc? Do let me know if there is a past thread on this, or other people actively developing, etc. so that I can find out what priorities are. thanks and best regards to all (old friends and new), Elia --- Senior Lecturer, Bioinformatics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Office (UCL): +44 207 679 6493 Office (ICMS): +44 0207 8822374 Mobile: +44 7597 566 194 Mobile (Italy): +39 338 8448801 From maj at fortinbras.us Wed Jun 17 08:19:04 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Wed, 17 Jun 2009 08:19:04 -0400 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: <4C3D793879C64A5E84C67FE313C86FA4@NewLife> [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl ] ----- Original Message ----- From: "Elia Stupka" To: Sent: Wednesday, June 17, 2009 7:29 AM Subject: [Bioperl-l] Next-gen modules > Dear all, > > after several years of absence I am slowly coming back to Bioperl, and > hope to contribute again to its development. > > One area that I was thinking of starting from, since we are actively > involved with it, is to improve BIoperl's support fo next-gen > sequencing data, tools, etc. Since I am sure I have missed out on a > lot of recent developments, do let me know if/what is useful. > > One example that comes to mind is that the conversion of various > formats to/from FASTQ does not seem to be supported. Some code can be > found within Li Heng's script: http://maq.sourceforge.net/ > fq_all2std.pl but it would be good if it could make its way into > SeqIO? And similarly, potentially, for other next-gen sequence formats? > > Similarly, there seems to be little in bioperl-run to support tools > that have been developed in this area, such as Maq, BowTie, TopHat, etc? > > Do let me know if there is a past thread on this, or other people > actively developing, etc. so that I can find out what priorities are. 
> > thanks and best regards to all (old friends and new), > > Elia > > --- > Senior Lecturer, Bioinformatics > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Office (UCL): +44 207 679 6493 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 7597 566 194 > Mobile (Italy): +39 338 8448801 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From biopython at maubp.freeserve.co.uk Wed Jun 17 08:21:17 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Jun 2009 13:21:17 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: <320fb6e00906170521m7d997334j321d92fda2da4114@mail.gmail.com> On Wed, Jun 17, 2009 at 12:29 PM, Elia Stupka wrote: > > Dear all, > > after several years of absence I am slowly coming back to Bioperl, and hope > to contribute again to its development. > > One area that I was thinking of starting from, since we are actively > involved with it, is to improve BIoperl's support fo next-gen sequencing > data, tools, etc. Since I am sure I have missed out on a lot of recent > developments, do let me know if/what is useful. > > One example that comes to mind is that the conversion of various formats > to/from FASTQ does not seem to be supported. Some code can be found within > Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be > good if it could make its way into SeqIO? And similarly, potentially, for > other next-gen sequence formats? 
If you do add FASTQ support to BioPerl's SeqIO (and I think that is a good idea), please could you follow the format names used by Biopython - as this time we got there first ;) I'm asking this as Biopython's SeqIO tries to use the same format names as BioPerl's SeqIO and EMBOSS, see http://biopython.org/wiki/SeqIO Specifically, * "fastq" in Biopython means the original Sanger standard FASTQ files encoding PHRED qualities using an ASCII offset of 33. * "fastq-solexa" in Biopython means the early Solexa/Illumina style FASTQ files which encode Solexa qualities using an ASCII offset of 64. * "fastq-illumina" in Biopython will mean recent Solexa/Illumina style FASTQ files (from pipeline version 1.3+) which encode PHRED qualities using an ASCII offset of 64. This is in the Biopython repository, but hasn't been released yet - so the name "fastq-illumina" isn't set in stone yet. For good quality reads, PHRED and Solexa scores are approximately equal, so the "fastq-solexa" and "fastq-illumina" variants are almost equivalent. > Similarly, there seems to be little in bioperl-run to support tools that > have been developed in this area, such as Maq, BowTie, TopHat, etc? > > Do let me know if there is a past thread on this, or other people actively > developing, etc. so that I can find out what priorities are. Have you seen these recent threads?: http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html Regards, Peter (at Biopython) From maj at fortinbras.us Wed Jun 17 08:02:11 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 17 Jun 2009 08:02:11 -0400 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: <92C15E3391F64BAF801754E924122540@NewLife> Elia-- I say a definite +1; in fact, this sounds like it should be a Hot Topic (see http://www.bioperl.org/wiki/Category:Hot_Topics for some others you might have missed in your hiatus...). I will create a page that can be a central point for wish lists, discussion, etc. There has been much discussion of late about FASTQ http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html cheers from a newbie, Mark ----- Original Message ----- From: "Elia Stupka" To: Sent: Wednesday, June 17, 2009 7:29 AM Subject: [Bioperl-l] Next-gen modules > Dear all, > > after several years of absence I am slowly coming back to Bioperl, and > hope to contribute again to its development. > > One area that I was thinking of starting from, since we are actively > involved with it, is to improve BIoperl's support fo next-gen > sequencing data, tools, etc. Since I am sure I have missed out on a > lot of recent developments, do let me know if/what is useful. > > One example that comes to mind is that the conversion of various > formats to/from FASTQ does not seem to be supported. Some code can be > found within Li Heng's script: http://maq.sourceforge.net/ > fq_all2std.pl but it would be good if it could make its way into > SeqIO? And similarly, potentially, for other next-gen sequence formats? > > Similarly, there seems to be little in bioperl-run to support tools > that have been developed in this area, such as Maq, BowTie, TopHat, etc? 
> > Do let me know if there is a past thread on this, or other people > actively developing, etc. so that I can find out what priorities are. > > thanks and best regards to all (old friends and new), > > Elia > > --- > Senior Lecturer, Bioinformatics > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Office (UCL): +44 207 679 6493 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 7597 566 194 > Mobile (Italy): +39 338 8448801 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed Jun 17 08:57:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Jun 2009 07:57:52 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: Elia, As Mark indicated, we recently discussed the lack of support for next- gen on list, at least re: fastq. I may be hit with the same thing in a few months time myself, and I recall Jason and a few others also mentioning the same. Heikki wrote some code for Illumina FASTQ for SeqIO and related modules but I don't believe it has been committed to trunk yet, so maybe he can answer. From prior discussions IIRC the issues were: 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, Illumina 1.3) from one another (so maybe some optional validation), and 2) having a way for the Seq object to either 'know' what format is contained, or we use phred score and convert back and forth from that (I think the latter makes more sense). Peter's suggestions also are reasonable, though does biopython have a separate module for each of these variations? Our version (I believe) mainly varied the conversion within Bio::SeqIO::fastq itself based on the fastq variant passed in as a separate named argument. 
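To make the "convert to phred and back" option concrete, here is a minimal sketch of decoding each variant to PHRED scores. It assumes the three variant names Peter proposed and the usual ASCII offsets (33 for Sanger, 64 for both Solexa/Illumina styles). The %VARIANT table and the decode_quality and solexa_to_phred helpers are hypothetical names for illustration only, not anything in Bio::SeqIO::fastq, and the Solexa-to-PHRED formula is the standard published one rather than code taken from Heikki's work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: ASCII offset and score scale per FASTQ variant.
# The variant names follow the Biopython naming discussed in this thread.
my %VARIANT = (
    'fastq'          => { offset => 33, scale => 'phred'  },
    'fastq-solexa'   => { offset => 64, scale => 'solexa' },
    'fastq-illumina' => { offset => 64, scale => 'phred'  },
);

# Standard conversion: Q_phred = 10 * log10(10^(Q_solexa/10) + 1).
# Perl's log() is the natural log, hence the division by log(10).
sub solexa_to_phred {
    my ($qs) = @_;
    return 10 * log(10 ** ($qs / 10) + 1) / log(10);
}

# Decode a quality string into a list of integer PHRED scores.
# The variant must be given explicitly; nothing is guessed from the data.
sub decode_quality {
    my ($qual_string, $variant) = @_;
    my $v = $VARIANT{$variant} or die "Unknown FASTQ variant '$variant'\n";
    my @scores = map { ord($_) - $v->{offset} } split //, $qual_string;
    if ($v->{scale} eq 'solexa') {
        # Round to the nearest integer PHRED score (a simplification).
        @scores = map { sprintf('%.0f', solexa_to_phred($_)) + 0 } @scores;
    }
    return @scores;
}

# A short Sanger-encoded quality string.
print join(' ', decode_quality('II9IG9IC', 'fastq')), "\n";
# prints: 40 40 24 40 38 24 40 34
```

With everything held as PHRED internally, only the reader and the writer need to know which variant is on disk. The rounding above loses a little information for low Solexa scores, so a real implementation might prefer to keep the unrounded value.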
As for the wrappers, we would most certainly welcome them! chris On Jun 17, 2009, at 6:29 AM, Elia Stupka wrote: > Dear all, > > after several years of absence I am slowly coming back to Bioperl, > and hope to contribute again to its development. > > One area that I was thinking of starting from, since we are actively > involved with it, is to improve BioPerl's support for next-gen > sequencing data, tools, etc. Since I am sure I have missed out on a > lot of recent developments, do let me know if/what is useful. > > One example that comes to mind is that the conversion of various > formats to/from FASTQ does not seem to be supported. Some code can > be found within Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl > but it would be good if it could make its way into SeqIO? And > similarly, potentially, for other next-gen sequence formats? > > Similarly, there seems to be little in bioperl-run to support tools > that have been developed in this area, such as Maq, BowTie, TopHat, > etc? > > Do let me know if there is a past thread on this, or other people > actively developing, etc. so that I can find out what priorities are.
> > thanks and best regards to all (old friends and new), > > Elia > > --- > Senior Lecturer, Bioinformatics > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Office (UCL): +44 207 679 6493 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 7597 566 194 > Mobile (Italy): +39 338 8448801 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.stupka at ucl.ac.uk Wed Jun 17 08:54:22 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 13:54:22 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> Message-ID: Dear Mark, thanks a lot for the pointers. With regard to FASTQ parsing: -my understanding from reading past threads is that we work on a single format, i.e. FASTQ, and interpret the quality "flavours" as just quality conversions, right? -However, I assume we would still want to support a simple way for the user to say format => 'fastq-solexa' using the nomenclature adopted in BioPython suggested by Peter, right? -I also saw Heikki's "long essay", but did not yet compare it to Heng Li's code at http://maq.sourceforge.net/fq_all2std.pl. I guess we would hope they produce identical outputs, which will be a good check. Finally, I saw Tristan's reply to Heikki's thread, so what is the status quo? Is it moving forward? cheers Elia On 17 Jun 2009, at 13:02, Mark A. Jensen wrote: > Elia-- > I say a definite +1; in fact, this sounds like it should be a Hot > Topic (see http://www.bioperl.org/wiki/Category:Hot_Topics for some > others > you might have missed in your hiatus...). I will create a page that > can be a central point for wish lists, discussion, etc.
> > There has been much discussion of late about FASTQ http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html > http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html > > cheers from a newbie, Mark > > ----- Original Message ----- From: "Elia Stupka" > To: > Sent: Wednesday, June 17, 2009 7:29 AM > Subject: [Bioperl-l] Next-gen modules > > >> Dear all, >> after several years of absence I am slowly coming back to Bioperl, >> and hope to contribute again to its development. >> One area that I was thinking of starting from, since we are >> actively involved with it, is to improve BIoperl's support fo next- >> gen sequencing data, tools, etc. Since I am sure I have missed out >> on a lot of recent developments, do let me know if/what is useful. >> One example that comes to mind is that the conversion of various >> formats to/from FASTQ does not seem to be supported. Some code can >> be found within Li Heng's script: http://maq.sourceforge.net/ >> fq_all2std.pl but it would be good if it could make its way into >> SeqIO? And similarly, potentially, for other next-gen sequence >> formats? >> Similarly, there seems to be little in bioperl-run to support >> tools that have been developed in this area, such as Maq, BowTie, >> TopHat, etc? >> Do let me know if there is a past thread on this, or other people >> actively developing, etc. so that I can find out what priorities are. 
>> thanks and best regards to all (old friends and new), >> Elia >> --- >> Senior Lecturer, Bioinformatics >> UCL Cancer Institute >> Paul O' Gorman Building >> University College London >> Gower Street >> WC1E 6BT >> London >> UK >> Office (UCL): +44 207 679 6493 >> Office (ICMS): +44 0207 8822374 >> Mobile: +44 7597 566 194 >> Mobile (Italy): +39 338 8448801 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> --- Senior Lecturer, Bioinformatics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Office (UCL): +44 207 679 6493 Office (ICMS): +44 0207 8822374 Mobile: +44 7597 566 194 Mobile (Italy): +39 338 8448801 From biopython at maubp.freeserve.co.uk Wed Jun 17 09:25:59 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Jun 2009 14:25:59 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields wrote: > > Elia, > > As Mark indicated, we recently discussed the lack of support for next-gen on > list, at least re: fastq. I may be hit with the same thing in a few months > time myself, and I recall Jason and a few others also mentioning the same. > Heikki wrote some code for Illumina FASTQ for SeqIO and related modules but > I don't believe it has been committed to trunk yet, so maybe he can answer. > > From prior discussions IIRC the issues were: > > 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, Illumina > 1.3) from one another (so maybe some optional validation), and Following the Python rule of thumb for being explicit, Biopython makes the user specify which FASTQ variant is being used. I don't think you can do anything else.
Any attempted validation would have to be heuristic, based on the ASCII characters found, and would risk false positive warnings. > 2) having a way for the Seq object to either 'know' what format is > contained, or we use phred score and convert back and forth from that (I > think the latter makes more sense). I think it could make sense for BioPerl to convert Solexa scores to/from PHRED scores on the fly (especially now that Illumina is abandoning the Solexa score system). Python style tries to avoid implicit conversions, so Biopython doesn't automatically do a conversion from Solexa to PHRED scores on parsing (but will on writing if the requested output format requires this). > Peter's suggestions also are reasonable, though does biopython have a > separate module for each of these variations? Our version (I believe) > mainly varied the conversion within Bio::SeqIO::fastq itself based on the > fastq variant passed in as a separate named argument. Biopython's SeqIO gives the three FASTQ variants their own unique names. This format name is a required argument for parsing/writing (we don't try to guess the file format from the data contents). Internally we have three separate FASTQ parsers/writers, although they do share code. Other issues to keep in mind: (3) There should be no warning when parsing files where the optional repeated title is missing on the "+" lines (as discussed earlier on the BioPerl list). (4) When writing FASTQ files should BioPerl omit the optional repeated title on the "+" line? Biopython omits this as I understand this to be common practice, and it can make a big difference to file sizes - especially on short read data from Solexa/Illumina. (5) Also test reading and writing files with an optional description (as well as an identifier) on the "@" (and "+") lines. See the NCBI SRA for examples, e.g.
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
(6) Test reading and writing files where the encoded quality string starts with a "@" or a "+" character, e.g. http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html Peter From tristan.lefebure at gmail.com Wed Jun 17 09:27:12 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 17 Jun 2009 09:27:12 -0400 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> Message-ID: <200906170927.13273.tristan.lefebure@gmail.com> Hello, Regarding next-gen sequences and bioperl, in my experience, another issue is bioperl speed. For example, if you want to trim bad quality bases at the ends of 1E6 Solexa reads using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well, you've got to be patient (but maybe I missed some shortcuts...). A pure perl solution will be between 100 and 1000x faster... Would it be possible to have an ultra-light quality object with a few simple methods for next-gen reads? I can contribute some tests if that sounds like an important point. -Tristan On Wednesday 17 June 2009 08:02:11 Mark A. Jensen wrote: > Elia-- > I say a definite +1; in fact, this sounds like it should > be a Hot Topic (see > http://www.bioperl.org/wiki/Category:Hot_Topics for some > others you might have missed in your hiatus...). I will > create a page that can be a central point for wish lists, > discussion, etc.
>
> There has been much discussion of late about FASTQ
> http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html
>
> cheers from a newbie,
> Mark
>
> ----- Original Message -----
> From: "Elia Stupka"
> To:
> Sent: Wednesday, June 17, 2009 7:29 AM
> Subject: [Bioperl-l] Next-gen modules
>
> > Dear all,
> >
> > after several years of absence I am slowly coming back
> > to Bioperl, and hope to contribute again to its
> > development.
> >
> > One area that I was thinking of starting from, since we
> > are actively involved with it, is to improve Bioperl's
> > support for next-gen sequencing data, tools, etc. Since
> > I am sure I have missed out on a lot of recent
> > developments, do let me know if/what is useful.
> >
> > One example that comes to mind is that the conversion
> > of various formats to/from FASTQ does not seem to be
> > supported. Some code can be found within Li Heng's
> > script: http://maq.sourceforge.net/fq_all2std.pl but
> > it would be good if it could make its way into SeqIO?
> > And similarly, potentially, for other next-gen sequence
> > formats?
> >
> > Similarly, there seems to be little in bioperl-run to
> > support tools that have been developed in this area,
> > such as Maq, BowTie, TopHat, etc?
> >
> > Do let me know if there is a past thread on this, or
> > other people actively developing, etc. so that I can
> > find out what priorities are.
> >
> > thanks and best regards to all (old friends and new),
> >
> > Elia
> >
> > ---
> > Senior Lecturer, Bioinformatics
> > UCL Cancer Institute
> > Paul O' Gorman Building
> > University College London
> > Gower Street
> > WC1E 6BT
> > London
> > UK
> >
> > Office (UCL): +44 207 679 6493
> > Office (ICMS): +44 0207 8822374
> >
> > Mobile: +44 7597 566 194
> > Mobile (Italy): +39 338 8448801
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From biopython at maubp.freeserve.co.uk Wed Jun 17 09:54:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:54:45 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To:
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <320fb6e00906170654m735dc054iaf94fa2f86647002@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:54 PM, Elia Stupka wrote:
>
> Dear Mark,
>
> thanks a lot for the pointers.
>
> With regards to FASTQ parsing:
>
> -my understanding by reading past threads is to work on a single format,
> i.e. FASTQ and to interpret the quality "flavours" as just quality
> conversions, right?
> -However, I assume we would still want to support a simple way for the user
> to say format => 'fastq-solexa' using the nomenclature adopted in BioPython
> suggested by Peter, right?

I think you will need a way for the user to say they have a Solexa, or an Illumina 1.3+, or an original Sanger standard FASTQ file.

From reading the http://bioperl.org/wiki/HOWTO:SeqIO wiki page, I assumed BioPerl's SeqIO just had formats (e.g. the "chadoxml" format and the variant "flybase_chadoxml" format).
Does BioPerl's SeqIO format system have any concept of flavour that I am not aware of? > -I also saw Heikki's "long essay", but did not yet compare to Heng Li's code > at http://maq.sourceforge.net/fq_all2std.pl, I guess we would hope they > would produce identical outputs, will be a good check. Heng Li's code at http://maq.sourceforge.net/fq_all2std.pl is a useful guide (although it doesn't yet cope with the new Illumina 1.3+ variant), but I don't trust it 100%. See e.g. http://lists.open-bio.org/pipermail/biopython/2009-June/005208.html http://lists.open-bio.org/pipermail/biopython/2009-June/005209.html Peter From john.marshall at sanger.ac.uk Wed Jun 17 09:28:12 2009 From: john.marshall at sanger.ac.uk (John Marshall) Date: Wed, 17 Jun 2009 14:28:12 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> Message-ID: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk> On 17 Jun 2009, at 12:29, Elia Stupka wrote: > Similarly, there seems to be little in bioperl-run to support tools > that have been developed in this area, such as Maq, BowTie, TopHat, > etc? FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to submit in the not too distant future. (First it needs some "blah blah" replaced with actual documentation and a test suite.) Cheers, John [1] http://www.ebi.ac.uk/~zerbino/velvet/ -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From Kevin.M.Brown at asu.edu Wed Jun 17 11:41:18 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 17 Jun 2009 08:41:18 -0700 Subject: [Bioperl-l] Alignment->slice() issue? 
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>

Warning: This is very ugly code and makes a few assumptions, such as the alignment objects are stored in order of their start position. I made this assumption as that is how I put them into the object to begin with.

# max/min/floor below are assumed to be List::Util's and POSIX's
use List::Util qw(max min);
use POSIX qw(floor);
use Bio::SimpleAlign;
use Bio::LocatableSeq;

=head1 C<slice>

Function to slice up an alignment based on start and end parameters.
The alignment is passed by reference, and the new alignment object is
stored in the scalar referenced by the fourth argument.

  slice(\$alignment, $start, $end, \$new_align)

=cut

sub slice
{
    my ($alignment, $start, $end, $new_align) = @_;

    $$new_align = new Bio::SimpleAlign;
    print $$alignment->no_sequences() . "\n";

    # the first sequence is always kept, cut down to the requested region
    $$new_align->add_seq(
        new Bio::LocatableSeq(
            -seq => substr(
                $$alignment->get_seq_by_pos(1)->seq(),
                $start - 1, $end - $start + 1
            ),
            -id    => $$alignment->get_seq_by_pos(1)->display_id(),
            -start => max($$alignment->get_seq_by_pos(1)->start - $start + 1, 1),
            -end   => min(
                $$alignment->get_seq_by_pos(1)->end - $start + 1,
                $end - $start + 1
            ),
            -alphabet => 'dna',
            -strand   => $$alignment->get_seq_by_pos(1)->strand()
        )
    );

    # implement a binary search to determine a decent offset into the alignment
    my $probe;

    if ($$alignment->no_sequences() <= 2) {
        $probe = $$alignment->no_sequences();
    }
    else {
        my ($L, $R) = (1, $$alignment->no_sequences());
        while (($R - $L) > 1)
        {
            $probe = floor(($R + $L) / 2);

            # gotta watch this. Had the check backwards and so was never going
            # in the right direction for the search. If I reverse these two
            # variables, then I have to either reverse the conditions or change
            # the > to a <.
            if ($$alignment->get_seq_by_pos($probe)->start() > $start)
            {
                $R = $probe;
            }
            else
            {
                $L = $probe;
            }
        }
    }

    # now go through the results that are after that point
    for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++)
    {
        my $seq = $$alignment->get_seq_by_pos($i);
        last if ($seq->start() > $end);

        # Only concern ourselves with primers that land inside the desired region
        # other primers will show up in the image maps for each gene.
        if ($seq->start() >= $start && $seq->end() <= $end)
        {
            # values for the substr pullout of a given sequence
            # (computed but not used below: the whole sequence is added)
            my $offset = max($start - $seq->start(), 0);
            my $length =
                min($end, $seq->end()) - max($start, $seq->start()) + 1;
            $$new_align->add_seq(
                new Bio::LocatableSeq(
                    -seq      => $seq->seq(),
                    -id       => $seq->display_id(),
                    -start    => max($seq->start - $start + 1, 1),
                    -end      => min($seq->end - $start + 1, $end - $start + 1),
                    -alphabet => 'dna',
                    -strand   => $seq->strand()
                )
            );
        }
    }
    return 1;
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Malcolm Cook
> Sent: Tuesday, June 16, 2009 1:07 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Alignment->slice() issue?
>
> Kevin,
>
> I'm getting struck by this old issue you once coded around.
>
> http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>
> Any chance you could share your implementation with fellow
> traveller...
>
> ??
>
> Thanks,
>
> Malcolm Cook
> Stowers Institute for Medical Research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From maj at fortinbras.us Wed Jun 17 12:47:38 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 12:47:38 -0400
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To:
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
Message-ID: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>

Hi All,

I thought I'd revisit this thread, since in the last couple of weeks I have used both techniques (bioperl-dev and branch from trunk) to produce completed projects. My thoughts:

Using bioperl-dev was very nice for creating Bio::Search::Tiling, a new addition to the core api. There was no pressure to conform to the existing api there.
In particular, there was no implicit insistence to make things work through Bio::Search::Utils, and I was free to factor it out. The Tiling api was definitely unstable until the end, when it was ported to the core. As I made regular reports to bioperl-l, everything was transparent and up front, and I received excellent suggestions there (as usual).

For Bio::Restriction, using the branch was just as natural. Here, the existing structure was well established, and all the work needed to happen beneath the api. All old t/Restriction tests needed to pass, and additional ones had to be created for the new functionality. So here, using bioperl-dev wasn't natural, even though some "experiments" needed to be tried (some succeeded and some failed, as you can see in the commentary at Bug #2855). Even though the new code turned out to require substantial effort, the effort was required to fix a true bug in the working core, and any fixes needed to work transparently with respect to the users for whom this bug had not been an issue. Using the branch made it relatively easy to merge quickly back into the core when done, and an open branch also provides a certain psychological pressure, which is helpful.

Hilmar raised the very good point in the previous discussion that (essentially) bioperl-dev shouldn't become a sandbox with lots of unfinished code scraps and derelict stuff that doesn't work. My view is bioperl-dev will become a sandbox only if we treat it like one. I've filled out the Bioperl-dev page on the wiki (http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing some recognition to devs there whose modules become part of the core may be a better way to ensure that projects that are started on bioperl-dev actually get finished, than to prescribe beforehand what kinds of projects may get started. I believe this follows the adage of liberality on what is accepted, and strictness on what is emitted.
cheers, MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Chase Miller" Cc: "BioPerl List" Sent: Thursday, May 21, 2009 4:00 PM Subject: Re: [Bioperl-l] bioperl-dev or branch? > Moving this question to the BioPerl list, which is where we need to > discuss this I think. Can someone refresh my memory on what the > Bioperl-dev repository is or was meant for? It doesn't seem documented > on the wiki. > > My (admittedly vague) recollection is that bioperl-dev is basically > for highly experimental changes or functionality. > > I'm not clear why everything else shouldn't go either into the main > trunk or into a branch. If there is a realistic expectation for > something to be folded into the main trunk sooner or later, what would > be the reasons for not putting it into a branch of the main > repository? If we are putting it into a separate repository, we're > waiving a lot of svn's support for merging and resolving concurrent > edits. > > I would also go actually go a step further and suggest that even if > this GSoC project starts out on a branch (which I can see good reasons > for, such as eliminating fear to disrupt something), there should be a > plan to move to main trunk before the end of the project. We've had a > good tradition in BioPerl of developing directly on the main trunk. It > sometimes leads to occasional disruptions when lots of tests seem > failing, but it also encourages development discipline and make new > code to melt into the BioPerl code base without requiring any extra > steps by someone. > > Any and all thoughts or comments welcome and appreciated! > > -hilmar > > On May 21, 2009, at 11:26 AM, Chase Miller wrote: > >> This brings me to a question about where I should have my code >> repository. Originally, I was going to use Bioperl-dev, but it was >> brought to my attention that that repository does not normally >> receive daily updates and it might not be the right place for my day >> to day development. 
An alternative would be to use something like >> google code on a daily basis and commit to Bioperl-dev on a weekly >> basis. > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed Jun 17 13:06:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Jun 2009 12:06:44 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> Message-ID: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> On Jun 17, 2009, at 8:25 AM, Peter wrote: > On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields > wrote: >> >> Elia, >> >> As Mark indicated, we recently discussed the lack of support for >> next-gen on >> list, at least re: fastq. I may be hit with the same thing in a >> few months >> time myself, and I recall Jason and a few others also mentioning >> the same. >> Heikki wrote some code for Illumina FASTQ for SeqIO and related >> modules but >> I don't believe it has been committed to trunk yet, so maybe he can >> answer. >> >> From prior discussions IIRC the issues were: >> >> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, >> Illumina >> 1.3) from one another (so maybe some optional validation), and > > Following the python rule of thumb for being explicit, Biopython makes > the user specify which FASTQ variant is being used. I don't think you > can do anything else. Any attempted validation would have to be > heuristic based on the ASCII characters found, and would risk false > positive warnings. 
Right; I'm thinking along the same lines. If anything the most we would allow is some level of validation, so if there were a degree of uncertainty about the format one could set a validation flag to check bounds during the parse and warn if they are exceeded. >> 2) having a way for the Seq object to either 'know' what format is >> contained, or we use phred score and convert back and forth from >> that (I >> think the latter makes more sense). > > I think it could make sense for BioPerl to convert Solexa scores to/ > from > PHRED scores on the fly (especially now that Illumina is abandoning > the Solexa score system). Python style tries to avoid implicit > conversions, > so Biopython doesn't automatically do a conversion from Solexa to > PHRED scores on parsing (but will on writing if the requested output > format requires this). > >> Peter's suggestions also are reasonable, though does biopython have a >> separate module for each of these variations? Our version (I >> believe) >> mainly varied the conversion within Bio::SeqIO::fastq itself based >> on the >> fastq variant passed in as a separate named argument. > > Biopython's SeqIO gives the three FASTQ variants their own unique > names. This format name is a required argument for parsing/writing > (we don't try and guess the file format from the data contents). > Internally > we have three separate FASTQ parsers/writers although they do share > code. We could easily do the same if others agree. Actually, if we specified that shorthand for a variant on a format would be designated as -format => 'format-variant', I think we could easily hack SeqIO to deal with that by splitting on '-' and passing everything to the constructor as (-format => 'format', -variant => 'variant'). Very little repeated code in this case, just an additional named parameter indicating the format variant (and the SeqIO class can do the type checking on that within the constructor). 
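
The 'format-variant' shorthand described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Bio::SeqIO implementation: the helper names split_format_variant and format_args, the %known_variant whitelist and the variant names in it are all hypothetical, and only the -format/-variant named parameters come from the suggestion itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqIO): split a combined format
# name such as 'fastq-solexa' into a base format and an optional variant.
sub split_format_variant {
    my ($name) = @_;
    # Split on the first '-' only; anything after it stays in the variant.
    my ($format, $variant) = split /-/, lc($name), 2;
    return ($format, $variant);
}

# Hypothetical whitelist of FASTQ variants; a real parser class would
# decide for itself which variants it accepts.
my %known_variant = map { $_ => 1 } qw(sanger solexa illumina);

# Build the named-argument list that would be handed to a constructor.
sub format_args {
    my ($name) = @_;
    my ($format, $variant) = split_format_variant($name);
    my @args = ('-format' => $format);
    if (defined $variant) {
        # Type checking on the variant, done here only for 'fastq'.
        die "Unknown $format variant '$variant'\n"
            if $format eq 'fastq' && !$known_variant{$variant};
        push @args, ('-variant' => $variant);
    }
    return @args;
}

my %args = format_args('fastq-solexa');
print join(' / ', $args{'-format'}, $args{'-variant'}), "\n";
```

A plain name such as 'genbank' passes through with no -variant entry, so existing callers would be unaffected under this scheme.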
> Other issues to keep in mind: > > (3) There should be no warning parsing files where the optional > repeated > title is missing on the "+" lines (as discussed earlier on the > BioPerl list). Agreed, though we'll have to check the current fastq parser to see if that's currently the case. I thought that was fixed but maybe not? > (4) When writing FASTQ files should BioPerl omit the optional repeated > title on the "+" line? Biopython omits this as I understand this to be > common practice, and can make a big different to file sizes - > especially > on short read data from Solexa/Illumina. Agreed, particularly if it's commonly encountered. > (5) Also test reading and writing files with an optional description > (as well > as an identifier) on the "@" (and "+") lines. See the NCBI SRA for > examples, > e.g. > > @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 > GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC > +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 > IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC Should be easy enough to implement with a simple regex. > (6) Test reading and writing files where the encoded quality string > starts > with a "@" or a "+" character, e.g. > http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html > > Peter Mark, getting all that? ;> chris From cjfields at illinois.edu Wed Jun 17 13:09:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Jun 2009 12:09:54 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> Message-ID: On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote: > Hello, > Regarding next-gen sequences and bioperl, following my > experience, another issue is bioperl speed. 
For example, if > you want to trim bad quality bases at ends of 1E6 Solexa > reads using Bio::SeqIO::fastq and some methods in > Bio::Seq::Quality, well, you've got to be patient (but may > be I missed some shortcuts...). The key issues affecting speed in bioperl are contained object instantiation and inheritance (and between those two, the latter much more so as it plays a role with contained objects as well as the container). http://www.bioperl.org/wiki/Why_BioPerl_is_slow Moose/Perl6 roles/traits are one way around that issue, but we are a ways off from getting that running. I think to get that working decently would be a from-ground-up endeavor (see my past posts on biomoose/bioperl6). > A pure perl solution will be between 100 to 1000x faster... > Would it be possible to have an ultra-light quality object > with few simple methods for next-gen reads? > > I can contribute some tests if that sounds like an important > point. > > -Tristan The quality objects themselves I don't think are that heavy; I think the main impediment is inheritance. One could get around that a bit by using a direct_new method to create a blessed hash directly, then reimplement methods to lazily create any objects contained on the fly. chris From bill at genenformics.com Wed Jun 17 13:03:16 2009 From: bill at genenformics.com (bill at genenformics.com) Date: Wed, 17 Jun 2009 10:03:16 -0700 Subject: [Bioperl-l] Alignment->slice() issue? In-Reply-To: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu> Message-ID: <92dadb76ce7d7b8eeb4644b47ef1a81f.squirrel@mail.dreamhost.com> Hopefully this is helpful. http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/seqalign/Dense_seg.cpp#L648 Bill at genenformics > Warning: This is very ugly code and makes a few assumptions, such as the > alignment objects are stored in order of their start position. 
I made > this assumption as that is how I put them into the object to begin with. > > =head1 C > > Function to slice up an alignment sequence based on start and end > parameters > and returns a new alignment object. > > slice($alignment, $start, $end) > > =cut > > sub slice > { > my ($alignment, $start, $end, $new_align) = @_; > > $$new_align = new Bio::SimpleAlign; > print $$alignment->no_sequences() . "\n"; > > $$new_align->add_seq( > new Bio::LocatableSeq( > -seq => > substr( > > $$alignment->get_seq_by_pos(1)->seq(), > $start - 1, $end > - $start + 1 > ), > -id => > $$alignment->get_seq_by_pos(1)->display_id(), > -start => > > max($$alignment->get_seq_by_pos(1)->start - $start + 1, 1), > -end => min( > > $$alignment->get_seq_by_pos(1)->end - $start + 1, > $end - $start > + 1 > ), > -alphabet => 'dna', > -strand => > $$alignment->get_seq_by_pos(1)->strand() > ) > ); > > # implement a binary search to determine a decent offset into > the alignment > my $probe; > > if ($$alignment->no_sequences() <= 2) { > $probe = $$alignment->no_sequences(); > } > else { > my ($L, $R) = (1, $$alignment->no_sequences()); > while (($R - $L) > 1) > { > $probe = floor(($R + $L) / 2); > > # gotta watch this. Had the check backwards and so was > never going > # in the right direction for the search. If I reverse > these two > # variables, then I have to either reverse the > conditions or change > # the > to a <. > if ($$alignment->get_seq_by_pos($probe)->start() > > $start) > { > $R = $probe; > } > else > { > $L = $probe; > } > } > } > # now go through the results that are after that point > for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++) > { > my $seq = $$alignment->get_seq_by_pos($i); > last if ($seq->start() > $end); > > # Only concern ourselves with primers that land inside > the desired region > # other primers will show up in the image maps for each > gene. 
> if ($seq->start() >= $start && $seq->end() <= $end) > { > > # values for the substr pullout of a given > sequence > my $offset = max($start - $seq->start(), 0); > my $length = > min($end, $seq->end()) - max($start, > $seq->start()) + 1; > $$new_align->add_seq( > new Bio::LocatableSeq( > -seq => $seq->seq(), > -id => > $seq->display_id(), > -start => > max($seq->start - $start + 1, 1), > -end => min($seq->end - > $start + 1, $end - $start + 1), > -alphabet => 'dna', > -strand => > $seq->strand() > ) > ); > } > } > return 1; > } > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Malcolm Cook >> Sent: Tuesday, June 16, 2009 1:07 AM >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Alignment->slice() issue? >> >> Kevin, >> >> I'm getting struck by this old issue you once coded around. >> >> http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html >> >> Any chance you could share your implementation with fellow >> traveller... >> >> ?? >> >> Thanks, >> >> Malcolm Cook >> Stowers insitute for Medical research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Wed Jun 17 13:13:23 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 17 Jun 2009 13:13:23 -0400 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> Message-ID: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife> I'm on the case! (but maybe not in realtime, today!) 
----- Original Message ----- From: "Chris Fields" To: "Peter" Cc: "BioPerl List" ; "Elia Stupka" ; "Heikki Lehvaslaiho" Sent: Wednesday, June 17, 2009 1:06 PM Subject: Re: [Bioperl-l] Next-gen modules > > On Jun 17, 2009, at 8:25 AM, Peter wrote: > >> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields wrote: >>> >>> Elia, >>> >>> As Mark indicated, we recently discussed the lack of support for next-gen >>> on >>> list, at least re: fastq. I may be hit with the same thing in a few months >>> time myself, and I recall Jason and a few others also mentioning the same. >>> Heikki wrote some code for Illumina FASTQ for SeqIO and related modules >>> but >>> I don't believe it has been committed to trunk yet, so maybe he can answer. >>> >>> From prior discussions IIRC the issues were: >>> >>> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, >>> Illumina >>> 1.3) from one another (so maybe some optional validation), and >> >> Following the python rule of thumb for being explicit, Biopython makes >> the user specify which FASTQ variant is being used. I don't think you >> can do anything else. Any attempted validation would have to be >> heuristic based on the ASCII characters found, and would risk false >> positive warnings. > > Right; I'm thinking along the same lines. If anything the most we would > allow is some level of validation, so if there were a degree of uncertainty > about the format one could set a validation flag to check bounds during the > parse and warn if they are exceeded. > >>> 2) having a way for the Seq object to either 'know' what format is >>> contained, or we use phred score and convert back and forth from that (I >>> think the latter makes more sense). >> >> I think it could make sense for BioPerl to convert Solexa scores to/ from >> PHRED scores on the fly (especially now that Illumina is abandoning >> the Solexa score system). 
Python style tries to avoid implicit conversions, >> so Biopython doesn't automatically do a conversion from Solexa to >> PHRED scores on parsing (but will on writing if the requested output >> format requires this). >> >>> Peter's suggestions also are reasonable, though does biopython have a >>> separate module for each of these variations? Our version (I believe) >>> mainly varied the conversion within Bio::SeqIO::fastq itself based on the >>> fastq variant passed in as a separate named argument. >> >> Biopython's SeqIO gives the three FASTQ variants their own unique >> names. This format name is a required argument for parsing/writing >> (we don't try and guess the file format from the data contents). Internally >> we have three separate FASTQ parsers/writers although they do share >> code. > > We could easily do the same if others agree. Actually, if we specified that > shorthand for a variant on a format would be designated as -format => > 'format-variant', I think we could easily hack SeqIO to deal with that by > splitting on '-' and passing everything to the constructor as (-format => > 'format', -variant => 'variant'). Very little repeated code in this case, > just an additional named parameter indicating the format variant (and the > SeqIO class can do the type checking on that within the constructor). > >> Other issues to keep in mind: >> >> (3) There should be no warning parsing files where the optional repeated >> title is missing on the "+" lines (as discussed earlier on the BioPerl >> list). > > Agreed, though we'll have to check the current fastq parser to see if that's > currently the case. I thought that was fixed but maybe not? > >> (4) When writing FASTQ files should BioPerl omit the optional repeated >> title on the "+" line? Biopython omits this as I understand this to be >> common practice, and can make a big different to file sizes - especially >> on short read data from Solexa/Illumina. > > Agreed, particularly if it's commonly encountered. 
> >> (5) Also test reading and writing files with an optional description (as >> well >> as an identifier) on the "@" (and "+") lines. See the NCBI SRA for examples, >> e.g. >> >> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC >> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC > > Should be easy enough to implement with a simple regex. > >> (6) Test reading and writing files where the encoded quality string starts >> with a "@" or a "+" character, e.g. >> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html >> >> Peter > > Mark, getting all that? ;> > > chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From e.stupka at ucl.ac.uk Wed Jun 17 13:49:38 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 18:49:38 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> Message-ID: I would suggest developing the "standard" version first, then moving onto potential optimizations. When we went through a similar argument in Ensembl about 8 years ago we ended up dropping Bio::Root completely... If one is truly after performance for these large next-gen projects, it'd be down to pure piping, shell, and worrying about location and copying of files, sticking to systems-level as much as possible, and quite far from Bioperl altogether, so I think it's a whole different level of optimization issues, probably outside the scope of Bioperl. Elia On 17 Jun 2009, at 18:09, Chris Fields wrote: > > On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote: > >> Hello, >> Regarding next-gen sequences and bioperl, following my >> experience, another issue is bioperl speed. 
For example, if >> you want to trim bad quality bases at ends of 1E6 Solexa >> reads using Bio::SeqIO::fastq and some methods in >> Bio::Seq::Quality, well, you've got to be patient (but may >> be I missed some shortcuts...). > > The key issues affecting speed in bioperl are contained object > instantiation and inheritance (and between those two, the latter > much more so as it plays a role with contained objects as well as > the container). > > http://www.bioperl.org/wiki/Why_BioPerl_is_slow > > Moose/Perl6 roles/traits are one way around that issue, but we are a > ways off from getting that running. I think to get that working > decently would be a from-ground-up endeavor (see my past posts on > biomoose/bioperl6). > >> A pure perl solution will be between 100 to 1000x faster... >> Would it be possible to have an ultra-light quality object >> with few simple methods for next-gen reads? >> >> I can contribute some tests if that sounds like an important >> point. >> >> -Tristan > > The quality objects themselves I don't think are that heavy; I think > the main impediment is inheritance. One could get around that a bit > by using a direct_new method to create a blessed hash directly, then > reimplement methods to lazily create any objects contained on the fly. 
> > chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l --- Senior Lecturer, Bioinformatics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Office (UCL): +44 207 679 6493 Office (ICMS): +44 0207 8822374 Mobile: +44 7597 566 194 Mobile (Italy): +39 338 8448801 From cjfields at illinois.edu Wed Jun 17 13:52:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Jun 2009 12:52:49 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife> Message-ID: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu> I think this is a top priority for a fall BioPerl release, maybe 1.6.2 (I am planning on a summer 1.6.1 release still). Made it into a bug report for tracking: http://bugzilla.open-bio.org/show_bug.cgi?id=2857 If no one works on this I may take it up after the 1.6.1 release. chris On Jun 17, 2009, at 12:13 PM, Mark A. Jensen wrote: > I'm on the case! (but maybe not in realtime, today!) > > ----- Original Message ----- From: "Chris Fields" > > To: "Peter" > Cc: "BioPerl List" ; "Elia Stupka" >; "Heikki Lehvaslaiho" > Sent: Wednesday, June 17, 2009 1:06 PM > Subject: Re: [Bioperl-l] Next-gen modules > > >> >> On Jun 17, 2009, at 8:25 AM, Peter wrote: >> >>> On Wed, Jun 17, 2009 at 1:57 PM, Chris >>> Fields wrote: >>>> >>>> Elia, >>>> >>>> As Mark indicated, we recently discussed the lack of support for >>>> next-gen on >>>> list, at least re: fastq. I may be hit with the same thing in a >>>> few months >>>> time myself, and I recall Jason and a few others also mentioning >>>> the same. 
>>>> Heikki wrote some code for Illumina FASTQ for SeqIO and related >>>> modules but >>>> I don't believe it has been committed to trunk yet, so maybe he >>>> can answer. >>>> >>>> From prior discussions IIRC the issues were: >>>> >>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina >>>> 1.0, Illumina >>>> 1.3) from one another (so maybe some optional validation), and >>> >>> Following the python rule of thumb for being explicit, Biopython >>> makes >>> the user specify which FASTQ variant is being used. I don't think >>> you >>> can do anything else. Any attempted validation would have to be >>> heuristic based on the ASCII characters found, and would risk false >>> positive warnings. >> >> Right; I'm thinking along the same lines. If anything the most we >> would allow is some level of validation, so if there were a degree >> of uncertainty about the format one could set a validation flag to >> check bounds during the parse and warn if they are exceeded. >> >>>> 2) having a way for the Seq object to either 'know' what format is >>>> contained, or we use phred score and convert back and forth from >>>> that (I >>>> think the latter makes more sense). >>> >>> I think it could make sense for BioPerl to convert Solexa scores >>> to/ from >>> PHRED scores on the fly (especially now that Illumina is abandoning >>> the Solexa score system). Python style tries to avoid implicit >>> conversions, >>> so Biopython doesn't automatically do a conversion from Solexa to >>> PHRED scores on parsing (but will on writing if the requested output >>> format requires this). >>> >>>> Peter's suggestions also are reasonable, though does biopython >>>> have a >>>> separate module for each of these variations? Our version (I >>>> believe) >>>> mainly varied the conversion within Bio::SeqIO::fastq itself >>>> based on the >>>> fastq variant passed in as a separate named argument. >>> >>> Biopython's SeqIO gives the three FASTQ variants their own unique >>> names. 
This format name is a required argument for parsing/writing >>> (we don't try and guess the file format from the data contents). >>> Internally >>> we have three separate FASTQ parsers/writers although they do share >>> code. >> >> We could easily do the same if others agree. Actually, if we >> specified that shorthand for a variant on a format would be >> designated as -format => 'format-variant', I think we could easily >> hack SeqIO to deal with that by splitting on '-' and passing >> everything to the constructor as (-format => 'format', -variant => >> 'variant'). Very little repeated code in this case, just an >> additional named parameter indicating the format variant (and the >> SeqIO class can do the type checking on that within the >> constructor). >> >>> Other issues to keep in mind: >>> >>> (3) There should be no warning parsing files where the optional >>> repeated >>> title is missing on the "+" lines (as discussed earlier on the >>> BioPerl list). >> >> Agreed, though we'll have to check the current fastq parser to see >> if that's currently the case. I thought that was fixed but maybe >> not? >> >>> (4) When writing FASTQ files should BioPerl omit the optional >>> repeated >>> title on the "+" line? Biopython omits this as I understand this >>> to be >>> common practice, and can make a big different to file sizes - >>> especially >>> on short read data from Solexa/Illumina. >> >> Agreed, particularly if it's commonly encountered. >> >>> (5) Also test reading and writing files with an optional >>> description (as well >>> as an identifier) on the "@" (and "+") lines. See the NCBI SRA >>> for examples, >>> e.g. >>> >>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC >>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC >> >> Should be easy enough to implement with a simple regex. 
>> >>> (6) Test reading and writing files where the encoded quality >>> string starts >>> with a "@" or a "+" character, e.g. >>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html >>> >>> Peter >> >> Mark, getting all that? ;> >> >> chris

From e.stupka at ucl.ac.uk Wed Jun 17 14:01:28 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:01:28 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife> <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
Message-ID:

If we reach a consensus on how/who/what, I will be happy to contribute some coding time in the coming days. Would it be a good starting point to add the different formats as named in Biopython, and test support for reading/writing them? I could start playing with that.

regards,
Elia

On 17 Jun 2009, at 18:52, Chris Fields wrote: > I think this is a top priority for a fall BioPerl release, maybe > 1.6.2 (I am planning on a summer 1.6.1 release still). Made it into > a bug report for tracking: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2857 > > If no one works on this I may take it up after the 1.6.1 release. > > chris > > On Jun 17, 2009, at 12:13 PM, Mark A. Jensen wrote: > >> I'm on the case! (but maybe not in realtime, today!)
>> >> ----- Original Message ----- From: "Chris Fields" > > >> To: "Peter" >> Cc: "BioPerl List" ; "Elia Stupka" > >; "Heikki Lehvaslaiho" >> Sent: Wednesday, June 17, 2009 1:06 PM >> Subject: Re: [Bioperl-l] Next-gen modules >> >> >>> >>> On Jun 17, 2009, at 8:25 AM, Peter wrote: >>> >>>> On Wed, Jun 17, 2009 at 1:57 PM, Chris >>>> Fields wrote: >>>>> >>>>> Elia, >>>>> >>>>> As Mark indicated, we recently discussed the lack of support >>>>> for next-gen on >>>>> list, at least re: fastq. I may be hit with the same thing in >>>>> a few months >>>>> time myself, and I recall Jason and a few others also >>>>> mentioning the same. >>>>> Heikki wrote some code for Illumina FASTQ for SeqIO and related >>>>> modules but >>>>> I don't believe it has been committed to trunk yet, so maybe he >>>>> can answer. >>>>> >>>>> From prior discussions IIRC the issues were: >>>>> >>>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina >>>>> 1.0, Illumina >>>>> 1.3) from one another (so maybe some optional validation), and >>>> >>>> Following the python rule of thumb for being explicit, Biopython >>>> makes >>>> the user specify which FASTQ variant is being used. I don't think >>>> you >>>> can do anything else. Any attempted validation would have to be >>>> heuristic based on the ASCII characters found, and would risk false >>>> positive warnings. >>> >>> Right; I'm thinking along the same lines. If anything the most >>> we would allow is some level of validation, so if there were a >>> degree of uncertainty about the format one could set a validation >>> flag to check bounds during the parse and warn if they are >>> exceeded. >>> >>>>> 2) having a way for the Seq object to either 'know' what format is >>>>> contained, or we use phred score and convert back and forth >>>>> from that (I >>>>> think the latter makes more sense). 
>>>> >>>> I think it could make sense for BioPerl to convert Solexa scores >>>> to/ from >>>> PHRED scores on the fly (especially now that Illumina is abandoning >>>> the Solexa score system). Python style tries to avoid implicit >>>> conversions, >>>> so Biopython doesn't automatically do a conversion from Solexa to >>>> PHRED scores on parsing (but will on writing if the requested >>>> output >>>> format requires this). >>>> >>>>> Peter's suggestions also are reasonable, though does biopython >>>>> have a >>>>> separate module for each of these variations? Our version (I >>>>> believe) >>>>> mainly varied the conversion within Bio::SeqIO::fastq itself >>>>> based on the >>>>> fastq variant passed in as a separate named argument. >>>> >>>> Biopython's SeqIO gives the three FASTQ variants their own unique >>>> names. This format name is a required argument for parsing/writing >>>> (we don't try and guess the file format from the data contents). >>>> Internally >>>> we have three separate FASTQ parsers/writers although they do share >>>> code. >>> >>> We could easily do the same if others agree. Actually, if we >>> specified that shorthand for a variant on a format would be >>> designated as -format => 'format-variant', I think we could >>> easily hack SeqIO to deal with that by splitting on '-' and >>> passing everything to the constructor as (-format => 'format', - >>> variant => 'variant'). Very little repeated code in this case, >>> just an additional named parameter indicating the format variant >>> (and the SeqIO class can do the type checking on that within the >>> constructor). >>> >>>> Other issues to keep in mind: >>>> >>>> (3) There should be no warning parsing files where the optional >>>> repeated >>>> title is missing on the "+" lines (as discussed earlier on the >>>> BioPerl list). >>> >>> Agreed, though we'll have to check the current fastq parser to see >>> if that's currently the case. I thought that was fixed but maybe >>> not? 
>>> >>>> (4) When writing FASTQ files should BioPerl omit the optional >>>> repeated >>>> title on the "+" line? Biopython omits this as I understand this >>>> to be >>>> common practice, and can make a big different to file sizes - >>>> especially >>>> on short read data from Solexa/Illumina. >>> >>> Agreed, particularly if it's commonly encountered. >>> >>>> (5) Also test reading and writing files with an optional >>>> description (as well >>>> as an identifier) on the "@" (and "+") lines. See the NCBI SRA >>>> for examples, >>>> e.g. >>>> >>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC >>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 >>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC >>> >>> Should be easy enough to implement with a simple regex. >>> >>>> (6) Test reading and writing files where the encoded quality >>>> string starts >>>> with a "@" or a "+" character, e.g. >>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html >>>> >>>> Peter >>> >>> Mark, getting all that? 
;> >>> >>> chris

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801

From tristan.lefebure at gmail.com Wed Jun 17 14:09:42 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 14:09:42 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To:
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <200906171409.42558.tristan.lefebure@gmail.com>

Thanks both for the light.

That probably means that the place bioperl will take in the handling of next-gen sequencing raw data (i.e. reads) is very limited, no? (At least until bioperl6.) A single GA2 Solexa lane generates about 9 million reads, and I would really not call that a big project...

BTW, is there a simple way to see object instantiation and inheritance, as well as the time consumed by each, when one calls next_seq() (or any other method)?

-Tristan

On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote: > I would suggest developing the "standard" version first, > then moving onto potential optimizations. > > When we went through a similar argument in Ensembl about > 8 years ago we ended up dropping Bio::Root completely...
> > If one is truly after performance for these large > next-gen projects, it'd be down to pure piping, shell, > and worrying about location and copying of files, > sticking to systems-level as much as possible, and quite > far from Bioperl altogether, so I think it's a whole > different level of optimization issues, probably outside > the scope of Bioperl. > > Elia > > On 17 Jun 2009, at 18:09, Chris Fields wrote: > > On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote: > >> Hello, > >> Regarding next-gen sequences and bioperl, following my > >> experience, another issue is bioperl speed. For > >> example, if you want to trim bad quality bases at ends > >> of 1E6 Solexa reads using Bio::SeqIO::fastq and some > >> methods in Bio::Seq::Quality, well, you've got to be > >> patient (but may be I missed some shortcuts...). > > > > The key issues affecting speed in bioperl are contained > > object instantiation and inheritance (and between those > > two, the latter much more so as it plays a role with > > contained objects as well as the container). > > > > http://www.bioperl.org/wiki/Why_BioPerl_is_slow > > > > Moose/Perl6 roles/traits are one way around that issue, > > but we are a ways off from getting that running. I > > think to get that working decently would be a > > from-ground-up endeavor (see my past posts on > > biomoose/bioperl6). > > > >> A pure perl solution will be between 100 to 1000x > >> faster... Would it be possible to have an ultra-light > >> quality object with few simple methods for next-gen > >> reads? > >> > >> I can contribute some tests if that sounds like an > >> important point. > >> > >> -Tristan > > > > The quality objects themselves I don't think are that > > heavy; I think the main impediment is inheritance. One > > could get around that a bit by using a direct_new > > method to create a blessed hash directly, then > > reimplement methods to lazily create any objects > > contained on the fly. 
> > > > chris > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > --- > Senior Lecturer, Bioinformatics > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Office (UCL): +44 207 679 6493 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 7597 566 194 > Mobile (Italy): +39 338 8448801 From bix at sendu.me.uk Wed Jun 17 14:20:00 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 17 Jun 2009 19:20:00 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> Message-ID: <4A3933D0.4040808@sendu.me.uk> Tristan Lefebure wrote: > Hello, > Regarding next-gen sequences and bioperl, following my > experience, another issue is bioperl speed. For example, if > you want to trim bad quality bases at ends of 1E6 Solexa > reads using Bio::SeqIO::fastq and some methods in > Bio::Seq::Quality, well, you've got to be patient (but may > be I missed some shortcuts...). This is my concern as well. Or, rather, is there actually a significant set of users out there who are dealing with next-gen sequencing and would consider using BioPerl for their work? I'm working with all the 1000-genomes data at the Sanger, and we at least are probably never going to use BioPerl for the work. > A pure perl solution will be between 100 to 1000x faster... > Would it be possible to have an ultra-light quality object > with few simple methods for next-gen reads? The fastq parser itself already seems pretty fast. The way to get the speedup is to not create any Bio::Seq* objects but just return the data directly. At that point it's not taking much advantage of BioPerl. But certainly it could be done... 
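
A minimal sketch of the "return the data directly" idea from the message above, applied to the end-trimming task Tristan describes. It is plain Perl and uses no BioPerl classes. The function names `next_fastq_record` and `trim_ends` are hypothetical, not part of Bio::SeqIO::fastq. It assumes unwrapped four-line records and that the caller states the ASCII offset explicitly (33 for Sanger FASTQ, 64 for Illumina 1.3+), in line with the "be explicit about the variant" point made earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one four-line FASTQ record and return it as a plain hash reference
# (no Bio::Seq* objects). Returns nothing at end of file.
# Assumption: records are not line-wrapped (one sequence line, one quality line).
sub next_fastq_record {
    my ($fh) = @_;
    my $header;
    while (defined($header = <$fh>)) {
        last if $header =~ /\S/;    # skip blank lines between records
    }
    return unless defined $header;

    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);

    # Identifier plus optional description on the '@' line.
    die "Expected '\@' at start of record, got: $header\n"
        unless $header =~ /^\@(\S+)\s*(.*)$/;
    my ($id, $desc) = ($1, $2);

    # The repeated title on the '+' line is optional, so only the '+' is checked.
    die "Expected '+' line for $id\n" unless $plus =~ /^\+/;
    die "Sequence/quality length mismatch for $id\n"
        unless length($seq) == length($qual);

    return +{ id => $id, desc => $desc, seq => $seq, qual => $qual };
}

# Trim bases below $min_q from both ends of a record; returns a new hash ref.
sub trim_ends {
    my ($rec, $min_q, $offset) = @_;
    my @q = map { $_ - $offset } unpack 'C*', $rec->{qual};
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;
    my $len = $end - $start + 1;
    return +{ %$rec,
              seq  => substr($rec->{seq},  $start, $len),
              qual => substr($rec->{qual}, $start, $len) };
}

# Small demonstration on an in-memory file.
{
    my $fastq = join "\n", '@read1 example', 'ACGTACGT', '+', '!!IIII!!', '';
    open my $fh, '<', \$fastq or die "Cannot open in-memory file: $!";
    while (my $rec = next_fastq_record($fh)) {
        my $trimmed = trim_ends($rec, 20, 33);
        print join("\t", @{$trimmed}{qw(id seq qual)}), "\n";
    }
}
```

Because the reader consumes lines by position within the record, a quality string that starts with "@" or "+" is not mistaken for a header. The trade-off is the one named above: the caller gets hashes, not Bio::Seq::Quality objects, so none of the BioPerl methods are available on the result.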
From e.stupka at ucl.ac.uk Wed Jun 17 14:39:08 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 19:39:08 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <200906171409.42558.tristan.lefebure@gmail.com> Message-ID: <8C661293-DF7D-4262-970A-92AF0015BB04@ucl.ac.uk> We are using bioperl for simple pre and post-processing of data for full Solexa runs, and although it might not be ideal, the scripting with Bioperl is not a major killer. When I was referring to large, heavy pipelines I was thinking of pipelines that deal with many Solexa runs as one project (e.g. 1000 genomes) who really cannot afford any bottleneck in their pipelines, because that affects directly their storage. cheers Elia On 17 Jun 2009, at 19:09, Tristan Lefebure wrote: > Thanks both for the light. > > That probably means that the place bioperl will take in the > handling of the next-gen sequencing raw data (i.e. reads) is > very limited, nope? (at least until bioperl6). A single GA2 > solexa lane generates about 9 million reads, and I would > really not called that a big project... > > BTW, is there a simple way to see object instantiation and > inheritance, as well as time consumption for each, when once > calls next_seq() (or any other method)? > > -Tristan > > On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote: >> I would suggest developing the "standard" version first, >> then moving onto potential optimizations. >> >> When we went through a similar argument in Ensembl about >> 8 years ago we ended up dropping Bio::Root completely... 
>> >> If one is truly after performance for these large >> next-gen projects, it'd be down to pure piping, shell, >> and worrying about location and copying of files, >> sticking to systems-level as much as possible, and quite >> far from Bioperl altogether, so I think it's a whole >> different level of optimization issues, probably outside >> the scope of Bioperl. >> >> Elia >> >> On 17 Jun 2009, at 18:09, Chris Fields wrote: >>> On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote: >>>> Hello, >>>> Regarding next-gen sequences and bioperl, following my >>>> experience, another issue is bioperl speed. For >>>> example, if you want to trim bad quality bases at ends >>>> of 1E6 Solexa reads using Bio::SeqIO::fastq and some >>>> methods in Bio::Seq::Quality, well, you've got to be >>>> patient (but may be I missed some shortcuts...). >>> >>> The key issues affecting speed in bioperl are contained >>> object instantiation and inheritance (and between those >>> two, the latter much more so as it plays a role with >>> contained objects as well as the container). >>> >>> http://www.bioperl.org/wiki/Why_BioPerl_is_slow >>> >>> Moose/Perl6 roles/traits are one way around that issue, >>> but we are a ways off from getting that running. I >>> think to get that working decently would be a >>> from-ground-up endeavor (see my past posts on >>> biomoose/bioperl6). >>> >>>> A pure perl solution will be between 100 to 1000x >>>> faster... Would it be possible to have an ultra-light >>>> quality object with few simple methods for next-gen >>>> reads? >>>> >>>> I can contribute some tests if that sounds like an >>>> important point. >>>> >>>> -Tristan >>> >>> The quality objects themselves I don't think are that >>> heavy; I think the main impediment is inheritance. One >>> could get around that a bit by using a direct_new >>> method to create a blessed hash directly, then >>> reimplement methods to lazily create any objects >>> contained on the fly. 
>>> >>> chris >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> --- >> Senior Lecturer, Bioinformatics >> UCL Cancer Institute >> Paul O' Gorman Building >> University College London >> Gower Street >> WC1E 6BT >> London >> UK >> >> Office (UCL): +44 207 679 6493 >> Office (ICMS): +44 0207 8822374 >> >> Mobile: +44 7597 566 194 >> Mobile (Italy): +39 338 8448801 > > --- Senior Lecturer, Bioinformatics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Office (UCL): +44 207 679 6493 Office (ICMS): +44 0207 8822374 Mobile: +44 7597 566 194 Mobile (Italy): +39 338 8448801 From cjfields at illinois.edu Wed Jun 17 14:40:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Jun 2009 13:40:05 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <200906171409.42558.tristan.lefebure@gmail.com> Message-ID: <63B608B2-8DE0-4FD1-9E15-339FD226D7AB@illinois.edu> On Jun 17, 2009, at 1:09 PM, Tristan Lefebure wrote: > Thanks both for the light. > > That probably means that the place bioperl will take in the > handling of the next-gen sequencing raw data (i.e. reads) is > very limited, nope? (at least until bioperl6). A single GA2 > solexa lane generates about 9 million reads, and I would > really not called that a big project... I don't think it's impossible. If you parse any very long list of sequences in order it will be very slow, yes, but if they were indexed or loaded into a DB lookups would of course be magnitudes faster. We already have perl-based indexing for fastq (Bio::Index::Fastq), so maybe something could be built on top of that. I haven't looked but we can also wrap other C/C++-based parsers as well. 
BioLib, for instance, has bindings to io_lib, so maybe that could be (ab)used in some way.

> BTW, is there a simple way to see object instantiation and > inheritance, as well as time consumption for each, when one > calls next_seq() (or any other method)? > > -Tristan

As a simple benchmark, at one point all feature tag information was converted into Bio::Annotations. I reverted that behavior to simple tag/value again and got a pretty decent speedup:

http://www.bioperl.org/wiki/Feature_Annotation_rollback#Simple_Benchmark

Also, I tried reimplementing some parsers as generic 'event'-based driver/handler and they were slightly faster, the key roadblock being instantiation again. If I didn't create Features/Annotations I saw a significant speedup. That's not entirely unexpected, as SeqFeatures also contain Locations (which in turn can contain subLocations) and (until recently) tag-based Bio::Annotation by default. Annotations are collected in an Annotation::Collection and can contain other objects, I believe (Ontology terms, etc).

The overall lesson is: if you don't have very heavy objects being created, the overhead is actually quite small; it's only when you greedily instantiate everything that you run into problems.

chris

From cjfields at illinois.edu Wed Jun 17 15:05:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:05:03 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To:
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com>
Message-ID:

On Jun 17, 2009, at 12:49 PM, Elia Stupka wrote:

> I would suggest developing the "standard" version first, then moving > onto potential optimizations.

Yes, agreed.

> When we went through a similar argument in Ensembl about 8 years ago > we ended up dropping Bio::Root completely...
They (strangely enough) still use it in a few modules and require bioperl 1.2.3, but (in my experience) the latest bioperl works just fine. I asked about that and never got a response. > If one is truly after performance for these large next-gen projects, > it'd be down to pure piping, shell, and worrying about location and > copying of files, sticking to systems-level as much as possible, and > quite far from Bioperl altogether, so I think it's a whole different > level of optimization issues, probably outside the scope of Bioperl. > > Elia In the end I don't think we can run it using perl alone, no, and I believe using BioPerl by itself will not be the optimal solution, but it can probably interface with something that is. chris From e.stupka at ucl.ac.uk Wed Jun 17 15:14:04 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 20:14:04 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk> Message-ID: <9AC2CFC1-D7E7-4B93-9671-65C30E5AA285@ucl.ac.uk> Excellent, I was thinking of working on Maq and BowTie as priorities. Elia On 17 Jun 2009, at 14:28, John Marshall wrote: > On 17 Jun 2009, at 12:29, Elia Stupka wrote: >> Similarly, there seems to be little in bioperl-run to support tools >> that have been developed in this area, such as Maq, BowTie, TopHat, >> etc? > > FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to > submit in the not too distant future. (First it needs some "blah > blah" replaced with actual documentation and a test suite.) 
> > Cheers, > > John > > [1] http://www.ebi.ac.uk/~zerbino/velvet/ > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801

From michael.watson at bbsrc.ac.uk Wed Jun 17 15:15:20 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 17 Jun 2009 20:15:20 +0100
Subject: [Bioperl-l] Next-gen modules
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B291F1@iahce2ksrv1.iah.bbsrc.ac.uk>

In answer to your question, yes!

We have 6 illumina datasets which we have searched against sequence databases using fasta, and I used SearchIO to parse the results. This is where BioPerl comes into its own - wrapped around fast, optimised solutions written in C or Java.

Sure, I could have written something in sed/awk/pure perl/C etc to parse out the information I needed faster, but the SearchIO solution only took a few minutes to parse a huge fasta results file, and for me (and many others, I suspect) a few minutes is not a problem.
________________________________ From: bioperl-l-bounces at lists.open-bio.org on behalf of Sendu Bala Sent: Wed 17/06/2009 7:20 PM To: tristan.lefebure at gmail.com Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Next-gen modules Tristan Lefebure wrote: > Hello, > Regarding next-gen sequences and bioperl, following my > experience, another issue is bioperl speed. For example, if > you want to trim bad quality bases at ends of 1E6 Solexa > reads using Bio::SeqIO::fastq and some methods in > Bio::Seq::Quality, well, you've got to be patient (but may > be I missed some shortcuts...). This is my concern as well. Or, rather, is there actually a significant set of users out there who are dealing with next-gen sequencing and would consider using BioPerl for their work? I'm working with all the 1000-genomes data at the Sanger, and we at least are probably never going to use BioPerl for the work. > A pure perl solution will be between 100 to 1000x faster... > Would it be possible to have an ultra-light quality object > with few simple methods for next-gen reads? The fastq parser itself already seems pretty fast. The way to get the speedup is to not create any Bio::Seq* objects but just return the data directly. At that point it's not taking much advantage of BioPerl. But certainly it could be done... 
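
For the "ultra-light quality object" asked about in the quoted text, here is one possible shape, sketched in plain Perl. It follows the suggestion made earlier in the thread: bless a hash directly and build anything derived only when it is first requested. The class name `LightRead` and its methods are hypothetical and are not BioPerl API. The default offset of 33 assumes Sanger-encoded qualities; 64 would be passed for Illumina 1.3+ data. The `//` operator needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package LightRead;    # hypothetical class, not part of BioPerl

# Constructor: blesses a hash directly; there is no inherited constructor chain.
sub new {
    my ($class, %args) = @_;
    return bless {
        id     => $args{id},
        seq    => $args{seq},
        qual   => $args{qual},            # raw ASCII-encoded quality string
        offset => $args{offset} // 33,    # 33 = Sanger, 64 = Illumina 1.3+
    }, $class;
}

sub id  { $_[0]{id} }
sub seq { $_[0]{seq} }

# Numeric scores are decoded on first request and cached afterwards,
# so reads that are only passed through never pay for the conversion.
sub scores {
    my ($self) = @_;
    $self->{scores} //=
        [ map { $_ - $self->{offset} } unpack 'C*', $self->{qual} ];
    return $self->{scores};
}

# Mean of the decoded scores; 0 for an empty read.
sub mean_quality {
    my ($self) = @_;
    my $scores = $self->scores;
    return 0 unless @$scores;
    my $sum = 0;
    $sum += $_ for @$scores;
    return $sum / @$scores;
}

package main;

my $read = LightRead->new(id => 'r1', seq => 'ACGT', qual => 'IIII');
printf "%s mean Q = %.1f\n", $read->id, $read->mean_quality;
```

Whether this is actually faster than the existing BioPerl quality classes on real Solexa data would need to be benchmarked; the sketch only illustrates the lazy-creation pattern.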
_______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jun 17 15:30:15 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Jun 2009 14:30:15 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A3933D0.4040808@sendu.me.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> Message-ID: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote: > Tristan Lefebure wrote: >> Hello, >> Regarding next-gen sequences and bioperl, following my experience, >> another issue is bioperl speed. For example, if you want to trim >> bad quality bases at ends of 1E6 Solexa reads using >> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well, >> you've got to be patient (but may be I missed some shortcuts...). > > This is my concern as well. Or, rather, is there actually a > significant set of users out there who are dealing with next-gen > sequencing and would consider using BioPerl for their work? > > I'm working with all the 1000-genomes data at the Sanger, and we at > least are probably never going to use BioPerl for the work. Are you using pure perl or (gasp) something else? ;> Judging by the feedback there are definitely a set of users who would like to integrate nextgen into bioperl somehow, probably to take advantage of other aspects of bioperl. >> A pure perl solution will be between 100 to 1000x faster... Would >> it be possible to have an ultra-light quality object with few >> simple methods for next-gen reads? > > The fastq parser itself already seems pretty fast. The way to get > the speedup is to not create any Bio::Seq* objects but just return > the data directly. At that point it's not taking much advantage of > BioPerl. 
But certainly it could be done...

I suppose the best way to assess what needs to be done is to come up with a set of 'use cases' specifying what users want so we can design around them; otherwise we're shooting in the dark.

I'm personally wondering if this could be done as a sequence database, something similar in theme to Lincoln's SeqFeature::Store, but sequence only, that returns quality objects in a similar manner (ala Storable)? Not sure whether that's feasible, but it appears at least scalable.

chris

From e.stupka at ucl.ac.uk Wed Jun 17 15:37:26 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:37:26 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
Message-ID: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>

Dear all,

I tried to summarize today's discussion with what seems to be the "shaping consensus" on the Wiki page:

http://www.bioperl.org/wiki/Nextgen_in_Bioperl

good night,
Elia

On 17 Jun 2009, at 13:19, Mark A. Jensen wrote: > [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl > ] > ----- Original Message ----- From: "Elia Stupka" > To: > Sent: Wednesday, June 17, 2009 7:29 AM > Subject: [Bioperl-l] Next-gen modules > > >> Dear all, >> after several years of absence I am slowly coming back to Bioperl, >> and hope to contribute again to its development. >> One area that I was thinking of starting from, since we are >> actively involved with it, is to improve BioPerl's support for next-gen >> sequencing data, tools, etc. Since I am sure I have missed out >> on a lot of recent developments, do let me know if/what is useful. >> One example that comes to mind is that the conversion of various >> formats to/from FASTQ does not seem to be supported.
Some code can >> be found within Li Heng's script: http://maq.sourceforge.net/ >> fq_all2std.pl but it would be good if it could make its way into >> SeqIO? And similarly, potentially, for other next-gen sequence >> formats? >> Similarly, there seems to be little in bioperl-run to support >> tools that have been developed in this area, such as Maq, BowTie, >> TopHat, etc? >> Do let me know if there is a past thread on this, or other people >> actively developing, etc. so that I can find out what priorities are. >> thanks and best regards to all (old friends and new), >> Elia >> --- >> Senior Lecturer, Bioinformatics >> UCL Cancer Institute >> Paul O' Gorman Building >> University College London >> Gower Street >> WC1E 6BT >> London >> UK >> Office (UCL): +44 207 679 6493 >> Office (ICMS): +44 0207 8822374 >> Mobile: +44 7597 566 194 >> Mobile (Italy): +39 338 8448801 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> --- Senior Lecturer, Bioinformatics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Office (UCL): +44 207 679 6493 Office (ICMS): +44 0207 8822374 Mobile: +44 7597 566 194 Mobile (Italy): +39 338 8448801 From e.stupka at ucl.ac.uk Wed Jun 17 16:06:35 2009 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Wed, 17 Jun 2009 21:06:35 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> Message-ID: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk> Interesting that you mention the database issue. We found that for specific memory/CPU intenstive things we also switch to using dbs. 
For example, after many years of loyal use of disconnected_ranges we
switched to a simple SQL implementation of it, because of the large
performance gains it would give us. Similarly in Ensembl as well as in
the old days of bioperl-db we opted for doing subseq within SQL where
possible.

Some lean way of SQL'izing specific components could be less
"disruptive" than avoiding object creation and provide significant
gains in performance. Could be set as an optional flag, and could use
temporary ad hoc SQL databases?

Still, priority now is to make SeqIO compliant with all those formats,
then we can worry about performance :)

Elia

From maj at fortinbras.us Wed Jun 17 16:29:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:29:31 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
 <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
Message-ID: <1C89D353AD0B4D219515BF1EAAA1FFB5@NewLife>

Thanks Elia for those wiki notes--
[I would say you received an enthusiastic 'welcome back'!]
cheers,
Mark

From cjfields at illinois.edu Wed Jun 17 16:35:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 15:35:38 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
 <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>

So, #1 priority is to get fastq up-to-speed, then maybe assess other
options.

Illuminating discussion, thanks Elia!

urgh, excuse unintended bad pun above...

chris

From e.stupka at ucl.ac.uk Wed Jun 17 16:36:31 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:36:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
 <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
 <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>

Better than
colorspaced discussions for sure ;)

Elia

On 17 Jun 2009, at 21:35, Chris Fields wrote:

> So, #1 priority is to get fastq up-to-speed, then maybe assess other
> options.
>
> Illuminating discussion, thanks Elia!
>
> urgh, excuse unintended bad pun above...
>
> chris

From maj at fortinbras.us Wed Jun 17 16:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:54:00 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
 <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
 <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <2B2A7A587B0F488DAA18E80A1BFD671B@NewLife>

unintended! Does that mean your delete key's broke...?

----- Original Message -----
From: "Chris Fields"
To: "Elia Stupka"
Sent: Wednesday, June 17, 2009 4:35 PM
Subject: Re: [Bioperl-l] Next-gen modules

> So, #1 priority is to get fastq up-to-speed, then maybe assess other
> options.
>
> Illuminating discussion, thanks Elia!
>
> urgh, excuse unintended bad pun above...
>
> chris

From hartzell at alerce.com Wed Jun 17 16:40:03 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 17 Jun 2009 13:40:03 -0700
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
Message-ID: <19001.21667.127519.462899@already.dhcp.gene.com>

Sendu Bala writes:
> Tristan Lefebure wrote:
> > Hello,
> > Regarding next-gen sequences and bioperl, following my
> > experience, another issue is bioperl speed. For example, if
> > you want to trim bad quality bases at ends of 1E6 Solexa
> > reads using Bio::SeqIO::fastq and some methods in
> > Bio::Seq::Quality, well, you've got to be patient (but maybe
> > I missed some shortcuts...).
>
> This is my concern as well. Or, rather, is there actually a significant
> set of users out there who are dealing with next-gen sequencing and
> would consider using BioPerl for their work?
>
> I'm working with all the 1000-genomes data at the Sanger, and we at
> least are probably never going to use BioPerl for the work.
> [...]

Is it purely a speed issue, or are there other issues (e.g. stability,
correctness, compatibility) that are contributing to your decision?

What *are* you using?

g.

From bix at sendu.me.uk Wed Jun 17 18:10:57 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:10:57 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <4A3969F1.8080002@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>
>> Tristan Lefebure wrote:
>>> Hello,
>>> Regarding next-gen sequences and bioperl, following my experience,
>>> another issue is bioperl speed.
>>> For example, if you want to trim bad
>>> quality bases at ends of 1E6 Solexa reads using Bio::SeqIO::fastq and
>>> some methods in Bio::Seq::Quality, well, you've got to be patient
>>> (but maybe I missed some shortcuts...).
>>
>> This is my concern as well. Or, rather, is there actually a
>> significant set of users out there who are dealing with next-gen
>> sequencing and would consider using BioPerl for their work?
>>
>> I'm working with all the 1000-genomes data at the Sanger, and we at
>> least are probably never going to use BioPerl for the work.
>
> Are you using pure perl or (gasp) something else? ;>

We use some perl stuff, some C stuff. My own stuff is OO perl, but much
lighter weight than BioPerl. Absolute minimal object creation.

>>> A pure perl solution will be between 100 to 1000x faster... Would it
>>> be possible to have an ultra-light quality object with few simple
>>> methods for next-gen reads?
>>
>> The fastq parser itself already seems pretty fast. The way to get the
>> speedup is to not create any Bio::Seq* objects but just return the
>> data directly. At that point it's not taking much advantage of
>> BioPerl. But certainly it could be done...
>
> I suppose the best way to assess what needs to be done is to come up
> with a set of 'use cases' specifying what users want so we can design
> around them; otherwise we're shooting in the dark.

Indeed. Though at least I think we can all agree it would be nice to
have the functionality there even if it's slow. There will always be at
least some use-cases where the run speed doesn't matter.

> I'm personally wondering if this could be done as a sequence database,
> something similar in theme to Lincoln's SeqFeature::Store, but sequence
> only, and returning quality objects in a similar manner (a la Storable)?
> Not sure whether that's feasible, but it appears at least scalable.

I think not. Well, at least SeqFeature::Store doesn't scale.
Try storing millions of features in a database and watch it crawl to
complete unusability. I can't imagine a db scaling to holding hundreds
of TB of data either. I'm also not sure what the benefit is. There are
already high-speed ways of indexing your fastq or bam files.

From bix at sendu.me.uk Wed Jun 17 18:24:50 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:24:50 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <19001.21667.127519.462899@already.dhcp.gene.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <19001.21667.127519.462899@already.dhcp.gene.com>
Message-ID: <4A396D32.5070909@sendu.me.uk>

George Hartzell wrote:
> Sendu Bala writes:
> > Tristan Lefebure wrote:
> > > Hello,
> > > Regarding next-gen sequences and bioperl, following my
> > > experience, another issue is bioperl speed. For example, if
> > > you want to trim bad quality bases at ends of 1E6 Solexa
> > > reads using Bio::SeqIO::fastq and some methods in
> > > Bio::Seq::Quality, well, you've got to be patient (but maybe
> > > I missed some shortcuts...).
> >
> > This is my concern as well. Or, rather, is there actually a significant
> > set of users out there who are dealing with next-gen sequencing and
> > would consider using BioPerl for their work?
> >
> > I'm working with all the 1000-genomes data at the Sanger, and we at
> > least are probably never going to use BioPerl for the work.
> > [...]
>
> Is it purely a speed issue, or are there other issues (e.g. stability,
> correctness, compatibility) that are contributing to your decision?

Too heavy-weight, too slow, too memory intensive, missing too much
functionality in any case. If I have to write new parsers and wrappers,
I may as well make them fast (which means they don't "fit" into BioPerl).

> What *are* you using?
There are already great tools written in C that do all the heavy lifting
and the rest is done in perl written for speed and low memory.

From cjfields at illinois.edu Wed Jun 17 18:38:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 17:38:26 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3969F1.8080002@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
 <4A3969F1.8080002@sendu.me.uk>
Message-ID: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>

On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>> Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my
>>>> experience, another issue is bioperl speed. For example, if you
>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using
>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,
>>>> you've got to be patient (but maybe I missed some shortcuts...).
>>>
>>> This is my concern as well. Or, rather, is there actually a
>>> significant set of users out there who are dealing with next-gen
>>> sequencing and would consider using BioPerl for their work?
>>>
>>> I'm working with all the 1000-genomes data at the Sanger, and we
>>> at least are probably never going to use BioPerl for the work.
>> Are you using pure perl or (gasp) something else? ;>
>
> We use some perl stuff, some C stuff. My own stuff is OO perl, but
> much lighter weight than BioPerl. Absolute minimal object creation.

Makes sense.

>>>> A pure perl solution will be between 100 to 1000x faster... Would
>>>> it be possible to have an ultra-light quality object with few
>>>> simple methods for next-gen reads?
>>>
>>> The fastq parser itself already seems pretty fast.
>>> The way to get
>>> the speedup is to not create any Bio::Seq* objects but just return
>>> the data directly. At that point it's not taking much advantage of
>>> BioPerl. But certainly it could be done...
>> I suppose the best way to assess what needs to be done is to come up
>> with a set of 'use cases' specifying what users want so we can
>> design around them; otherwise we're shooting in the dark.
>
> Indeed. Though at least I think we can all agree it would be nice to
> have the functionality there even if it's slow. There will always be
> at least some use-cases where the run speed doesn't matter.

Agreed.

>> I'm personally wondering if this could be done as a sequence
>> database, something similar in theme to Lincoln's
>> SeqFeature::Store, but sequence only, and returning quality objects
>> in a similar manner (a la Storable)? Not sure whether that's
>> feasible, but it appears at least scalable.
>
> I think not. Well, at least SeqFeature::Store doesn't scale. Try
> storing millions of features in a database and watch it crawl to
> complete unusability. I can't imagine a db scaling to holding
> hundreds of TB of data either. I'm also not sure what the benefit
> is. There are already high-speed ways of indexing your fastq or bam
> files.

Interesting that you ran into issues with SF::Store; wonder if object
storage is the limiting factor there, or if it is something else.
Anyone else having this issue?
chris

From cjfields at illinois.edu Wed Jun 17 21:08:55 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 20:08:55 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A396D32.5070909@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
 <92C15E3391F64BAF801754E924122540@NewLife>
 <200906170927.13273.tristan.lefebure@gmail.com>
 <4A3933D0.4040808@sendu.me.uk>
 <19001.21667.127519.462899@already.dhcp.gene.com>
 <4A396D32.5070909@sendu.me.uk>
Message-ID: <03A96F40-27CD-4D38-9A4A-04AB4CECC8DE@illinois.edu>

On Jun 17, 2009, at 5:24 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Sendu Bala writes:
>> > Tristan Lefebure wrote:
>> > > Hello,
>> > > Regarding next-gen sequences and bioperl, following my
>> > > experience, another issue is bioperl speed. For example, if
>> > > you want to trim bad quality bases at ends of 1E6 Solexa
>> > > reads using Bio::SeqIO::fastq and some methods in
>> > > Bio::Seq::Quality, well, you've got to be patient (but maybe
>> > > I missed some shortcuts...).
>> >
>> > This is my concern as well. Or, rather, is there actually a
>> > significant set of users out there who are dealing with next-gen
>> > sequencing and would consider using BioPerl for their work?
>> >
>> > I'm working with all the 1000-genomes data at the Sanger, and
>> > we at least are probably never going to use BioPerl for the work.
>> > [...]
>> Is it purely a speed issue, or are there other issues (e.g.
>> stability, correctness, compatibility) that are contributing to
>> your decision?
>
> Too heavy-weight, too slow, too memory intensive, missing too much
> functionality in any case. If I have to write new parsers and
> wrappers, I may as well make them fast (which means they don't "fit"
> into BioPerl).

That's (unfortunately) true. It may be easy to whip up something that
works, but it probably won't be fast.

>> What *are* you using?
>
> There are already great tools written in C that do all the heavy
> lifting and the rest is done in perl written for speed and low memory.

Like this one?

http://www.sanger.ac.uk/Users/lh3/parsefastq.shtml

I suppose if one were inclined, this could be wrapped with SWIG in BioLib, but would it be worth it (maybe beyond grabbing the file indices)?

chris

From jbarrick at msu.edu Wed Jun 17 23:10:43 2009
From: jbarrick at msu.edu (Jeffrey Barrick)
Date: Wed, 17 Jun 2009 23:10:43 -0400
Subject: [Bioperl-l] svn error
Message-ID: <7C1A481F-275E-4E08-AA1B-036BC708D5E1@msu.edu>

Hi all,

I've been trying to download the latest version of "bioperl-live" through svn as per the instructions at [http://www.bioperl.org/wiki/Using_Subversion] and I keep getting an "svn: Found malformed header in revision file" error when it gets to "bioperl-live/t/RemoteDB/EMBL.t", causing it to stop prematurely.

I also get the error when trying to browse that directory, for example:

http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/t/RemoteDB

Any ideas?

Thanks,
--Jeff

From hlapp at gmx.net Wed Jun 17 21:51:16 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 17 Jun 2009 20:51:16 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID:

On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
> Similarly in Ensembl as well as in the old days of bioperl-db we
> opted for doing subseq within SQL where possible.

BTW Bioperl-db still lazy-loads sequences, and does subseq in SQL, unless you manipulate the sequence, or make it a non-persistent object.
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From bix at sendu.me.uk Thu Jun 18 02:45:17 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 18 Jun 2009 07:45:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <4A3969F1.8080002@sendu.me.uk> <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
Message-ID: <4A39E27D.9040807@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:
>
>>> I'm personally wondering if this could be done as a sequence
>>> database, something similar in theme to Lincoln's SeqFeature::Store,
>>> but sequence only, and returns quality objects in a similar manner
>>> (ala Storable)? Not sure whether that's feasible, but it's appears
>>> at least scalable.
>>
>> I think not. Well, at least SeqFeature::Store doesn't scale. Try
>> storing millions of features in a database and watch it crawl to
>> complete unusability. I can't imagine a db scaling to holding hundreds
>> of TB of data either. I'm also not sure what the benefit is. There are
>> already high-speed ways of indexing your fastq or bam files.
>
> Interesting that you ran into issues with SF::Store; wonder if object
> storage is the limiting factor there, or if it is something else.

Object storage certainly was an issue, which is why I patched it to (optionally) not store objects. That helped a great deal, but ultimately only increased the number of features you could store before it slowed down; it didn't solve the problem completely.
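
As a rough illustration of the point made earlier in this thread (that the speedup comes from not creating any Bio::Seq* objects and just handling the data directly), here is a minimal sketch of trimming low-quality bases from the 3' end of FASTQ reads in plain Perl. It is not part of BioPerl. The sub names trim_3prime and trim_fastq_stream, the threshold of 20 and the quality offset of 33 are all assumptions made for illustration; Solexa/Illumina files of this era often use an offset of 64, and Solexa proper uses a different quality scale. It also assumes strict four-line records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trim bases from the 3' end for as long as their quality is below $min_q.
# $offset is the ASCII offset of the quality encoding (33 is an assumption).
sub trim_3prime {
    my ($seq, $qual, $min_q, $offset) = @_;
    my $keep = length $qual;
    while ($keep > 0 && ord(substr($qual, $keep - 1, 1)) - $offset < $min_q) {
        $keep--;
    }
    return (substr($seq, 0, $keep), substr($qual, 0, $keep));
}

# Read four-line FASTQ records from $in and write trimmed records to $out.
# No sequence objects are built; each record is four plain strings.
sub trim_fastq_stream {
    my ($in, $out, $min_q, $offset) = @_;
    while (defined(my $header = <$in>)) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        last unless defined $qual;    # incomplete trailing record
        chomp($header, $seq, $plus, $qual);
        my ($tseq, $tqual) = trim_3prime($seq, $qual, $min_q, $offset);
        next unless length $tseq;     # drop reads trimmed to nothing
        print {$out} "$header\n$tseq\n$plus\n$tqual\n";
    }
}

# Small in-memory demonstration: 'I' is quality 40 and '#' is quality 2
# under the assumed offset of 33.
my $fastq = "\@read1\nACGTAC\n+\nIIII##\n";
open my $in, '<', \$fastq or die "cannot open in-memory FASTQ: $!";
trim_fastq_stream($in, \*STDOUT, 20, 33);
close $in;
```

For real data the same sub could be called with \*STDIN in place of the in-memory handle. Whether this is fast enough for 1E6 reads would need to be measured; the sketch only shows the object-free shape of the loop.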
From Xianjun.Dong at bccs.uib.no Thu Jun 18 06:15:47 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Thu, 18 Jun 2009 12:15:47 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4A33D850.1020203@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no> <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com> <4A33D850.1020203@ii.uib.no>
Message-ID: <4A3A13D3.7050208@ii.uib.no>

Hi, Scott,

Would you mind having a look at the code (below my signature) to check whether I am using the -postgrid callback correctly? I still cannot get the background for the whole panel.

Thanks

Xianjun

Xianjun Dong wrote:
> Hi, Scott
>
> Before I gave up my own whole solution to use GBrowse, I still want to
> bother you once:
>
> As you suggested, I put -postgrid option when the panel, which will
> call a function to draw the background. The code below is almost
> copied from the online POD of Bio::Graphics::Panel (see
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
> )
>
> But it still does not work. Could you help to have a look? I paste it
> below. (BTW, the above page of POD, the -postgrid=>\&draw_gap, while
> the gap drawing function is gap_it, not draw_gap. I guess it's a typo.
> or not?)
> > THanks > > Xianjun > > ----------------------------------------------- mytestcode.pl > -------------------------- > > #!/usr/bin/perl > > use strict; > use lib "$ENV{HOME}/lib"; > > use Bio::Graphics; > use Bio::Graphics::Feature; > my $ftr= 'Bio::Graphics::Feature'; > > # processed_transcript > my $trans1 = > $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR"); > my $trans2 = > $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS'); > my $trans3 = > $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', > -source=>'a'); > my $trans4 = > $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', > -source=>'a'); > my $trans5 = > $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR"); > my $trans = > $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]); > > # hightlight > my $trans31 = > $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', > -source=>'a'); > my $trans41 = > $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', > -source=>'b'); > > my $panel= Bio::Graphics::Panel->new(-width=>1200, > -length=>1050, > -start =>0, > -pad_left=>12, > -pad_right=>12 > -postgrid=>\&gap_it); > > sub gap_it { > my $gd = shift; > my $panel = shift; > my ($gap_start,$gap_end) = $panel->location2pixel(500,600); > my $top = $panel->top; > my $bottom = $gd->height, #panel->bottom; > my $gray = $panel->translate_color('red'); > $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray); > } > # the following track works as I expected in bioperl 1.2.3, but not in > 1.5 and 1.6 > #$panel->add_track([$trans41,$trans31], > # -glyph => 'background', > # -block_bgcolor => sub{return (shift->source eq > 'a')?'#cccccc':'#fffc22'}, > # ); > > $panel->add_track($ftr->new(-start=>100,-end=>1000), > -glyph=>'arrow', > -double=>1, > -tick=>2); > > $panel->add_track($trans, > -glyph => 'transcript2', # 'transcript2', #process_5utr', > -fgcolor => 'darkred', > -bgcolor => 'darkred', > -title 
=> '$source', > -link => > 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', > #EnsEMBL > ); > print $panel->png; > > # the following part works in bioperl 1.5 and 1.6, but not work in > Bioperl 1.2.3 > my $map = $panel->create_web_map("image"); > $panel->finished(); > > > > > > > > > > > Scott Cain wrote: >> Hi Xianjun, >> >> I understand what you want to do, as the current version of gbrowse >> does this, which uses bioperl 1.6. Without digging through the code, >> I can't tell you exactly how this works and you didn't send your code >> that uses this callback, so I can't try it either. >> >> One thing that is different between your code and gbrowse is that each >> of the tracks is actually a seperate panel (to allow track dragging), >> so it possible that this sort of callback doesn't work for >> Bio::Graphics any more. >> >> Scott >> >> On Saturday, June 13, 2009, Xianjun Dong >> wrote: >> >>> Hi, Scott >>> >>> Thanks for your reply first. >>> >>> I still have question: I dig out the code from GBrowse (which I >>> paste below). Method make_postgrid_callback gets all highlight >>> region and then use hilite_regions_closure function to draw them >>> out, using the following GD function: >>> >>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom, >>> $panel->translate_color($h_color)); >>> >>> where the $bottom=$panel->bottom. This is the only difference from >>> my code, where I use $gd->height. I guess they are almost same >>> (except the pad_bottom), we can see this in the code of >>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22 >>> >>> >>> OK. Anyway, I change to use $panel->bottom, instead of $gd->height, >>> for my highlight regions. The output is same, when using the library >>> of Bioperl 1.6 (or 1.5). You can see the attached image >>> ("test.bioperl1.6.png") >>> >>> OK. I might have not explained my question explicitly. 
My question >>> is: if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl >>> 1.2.3), I can get the right image I want (see the attached file >>> "test.bioperl1.2.3.png"), where the highlight range will go from the >>> roof to the floor. While in bioperl 1.5 (or 1.6), I only can see the >>> highlight region in its own track, not the whole panel. OK, did I >>> explain clearly now? you can see the difference of the two images. >>> >>> [I am not sure the mailist allow to attach image, otherwise, I put >>> them in the following links: >>> test.bioperl1.6.png: http://translog.genereg.net/test.bioperl1.6.png >>> test.bioperl1.2.3.png: >>> http://translog.genereg.net/test.bioperl1.2.3.png ] >>> >>> You can test it and see the difference if you have both 1.2.3 and >>> 1.6 on your computer? >>> >>> Really want to know how this works in bioperl 1.2.3 (Even though >>> this might be a bug at that version, or whatever) >>> >>> Thanks >>> >>> Xianjun >>> ============================================= >>> >>> # this generates the callback for highlighting a region >>> sub make_postgrid_callback { >>> my $settings = shift; >>> return unless ref $settings->{h_region}; >>> >>> my @h_regions = map { >>> my ($h_ref,$h_start,$h_end,$h_color) = >>> /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/; >>> defined($h_ref) && $h_ref eq $settings->{ref} >>> ? [$h_start,$h_end,$h_color||'lightgrey'] >>> : () >>> } >>> @{$settings->{h_region}}; >>> >>> return unless @h_regions; >>> return hilite_regions_closure(@h_regions); >>> } >>> >>> # this subroutine generates a Bio::Graphics::Panel callback closure >>> # suitable for hilighting a region of a panel. 
>>> # The args are a list of [start,end,color] >>> sub hilite_regions_closure { >>> my @h_regions = @_; >>> >>> return sub { >>> my $gd = shift; >>> my $panel = shift; >>> my $left = $panel->pad_left; >>> my $top = $panel->top; >>> my $bottom = $panel->bottom; >>> for my $r (@h_regions) { >>> my ($h_start,$h_end,$h_color) = @$r; >>> my ($start,$end) = $panel->location2pixel($h_start,$h_end); >>> if ($end-$start <= 1) { $end++; $start-- } # so that we always >>> see something >>> # assuming top is 0 so as to ignore top padding >>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom, >>> $panel->translate_color($h_color)); >>> } >>> }; >>> } >>> >>> >>> Scott Cain wrote: >>> >>> Hello Xianjun, >>> >>> I don't think that approach will work. What you almost certainly need >>> to do is a postgrid callback that does the drawing of the highlighted >>> region. For example code of how to do this, take a look at the >>> make_postgrid_callback subroutine in GBrowse 1.69. The option >>> -postgrid is a method of Bio::Graphics::Panel. >>> >>> Scott >>> >>> >>> >>> >>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun >>> Dong wrote: >>> >>> >>> HI, >>> >>> I am not sure this is the right place I can get help. >>> >>> I've suffered by a problem for several days: I want to highlight >>> parts of >>> regions in my track, using a different background color. To do that, I >>> defined a glyph named "background", based on the >>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component() >>> method, by adding code like below: >>> >>> $gd->filledRectangle($left,0,$right,$gd->height, >>> $self->factory->translate_color($color)); >>> >>> # the script is pasted at the end >>> >>> This will draw a rectangle with top=0, bottom=$gd->height. I made the >>> highlight regions into a list of features, and add_track with >>> -glyph=>'background'. 
(see the following script, test.pl) This >>> really works >>> as I expect, which will add a colored block at background of all >>> tracks in a >>> panel (including the ruler arrow). You can see the output image in >>> attached >>> file "test.bioperl1.2.3.png" >>> >>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it >>> does not >>> work. Well, it works, but the highlight part only shrink to a low >>> height, >>> instead of covering all tracks in the panel. I also attached the output >>> here, see the file "test.bioperl1.6.png". >>> >>> I tried to think about the reason, the 'background' module is based >>> on the >>> generic module. What can cause the difference? Is it because >>> $gd->height is >>> different, or the tracks followed with 'background' track can not >>> draw from >>> the first position? >>> >>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart >>> person >>> solve problem, wise person avoid problem"...) But another problem is >>> coming: >>> Bio::Graphics in Bioperl 1.2.3 does not support >>> $panel->create_web_map() >>> function, which means I have to use some higher version if I want to >>> create >>> web map for my graphics, but then I have to give up using highlight >>> background. >>> >>> OK. It's long enough for my first-time submission here. Hope someone >>> can >>> throw me some clue. >>> >>> Thanks ahead!! 
>>> >>> Xianjun >>> >>> >>> ==================== test.pl ======================= >>> #!/usr/bin/perl >>> >>> use strict; >>> use lib "$ENV{HOME}/lib"; >>> >>> use Bio::Graphics; >>> use Bio::Graphics::Feature; >>> my $ftr= 'Bio::Graphics::Feature'; >>> >>> # processed_transcript >>> my $trans1 = >>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR"); >>> my $trans2 = >>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS'); >>> my $trans3 = >>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', >>> -source=>'a'); >>> my $trans4 = >>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', >>> -source=>'a'); >>> my $trans5 = >>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR"); >>> my $trans = >>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]); >>> >>> # hightlight >>> my $trans31 = >>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', >>> >>> -source=>'a'); >>> my $trans41 = >>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', >>> >>> -source=>'b'); >>> >>> my $panel= Bio::Graphics::Panel->new(-width=>1200, >>> -length=>1050, >>> -start =>0, >>> -pad_left=>12, >>> -pad_right=>12); >>> >>> # the following track works as I expected in bioperl 1.2.3, but not >>> in 1.5 >>> and 1.6 >>> $panel->add_track([$trans41,$trans31], >>> -glyph => 'background', >>> -block_bgcolor => sub{return (shift->source eq >>> 'a')?'#cccccc':'#fffc22'}, >>> ); >>> >>> $panel->add_track($ftr->new(-start=>100,-end=>1000), >>> -glyph=>'arrow', >>> -double=>1, >>> -tick=>2); >>> >>> $panel->add_track($trans, >>> -glyph => 'transcript2', # 'transcript2', #process_5utr', >>> -fgcolor => 'darkred', >>> -bgcolor => 'darkred', >>> -title => '$source', >>> -link => >>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', >>> #EnsEMBL >>> ); >>> print $panel->png; >>> >>> # the following part works in bioperl 1.5 and 1.6, but not work in >>> Bioperl >>> 1.2.3 >>> 
my $map = $panel->create_web_map("image"); >>> $panel->finished(); >>> >>> 1; >>> >>> ==================== background.pm ======================= >>> package Bio::Graphics::Glyph::background; >>> >>> use strict; >>> use base 'Bio::Graphics::Glyph::generic'; >>> sub pad_top{ >>> return 0; >>> } >>> >>> sub draw_component { >>> my $self = shift; >>> #$self->SUPER::draw_component(@_); >>> my ($gd,$dx,$dy) = @_; >>> my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy); >>> >>> # draw an arrow to indicate the direction of transcript >>> my $color = $self->option('block_bgcolor') || '#cccccc'; >>> $gd->filledRectangle($left,0,$right,$gd->height, >>> $self->factory->translate_color($color)); >>> } >>> >>> 1; >>> >>> -- >>> ========================================== >>> Xianjun Dong >>> PhD student, Lenhard group >>> Computational Biology Unit >>> Bergen Center for Computational Science >>> University of Bergen >>> Hoyteknologisenteret, Thormohlensgate 55 >>> N-5008 Bergen, Norway >>> E-mail: xianjun.dong at bccs.uib.no >>> Tel.: +47 555 84022 >>> Fax : +47 555 84295 >>> ========================================== >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> ========================================== >>> Xianjun Dong >>> PhD student, Lenhard group >>> Computational Biology Unit >>> Bergen Center for Computational Science >>> University of Bergen >>> Hoyteknologisenteret, Thormohlensgate 55 >>> N-5008 Bergen, Norway >>> E-mail: xianjun.dong at bccs.uib.no >>> Tel.: +47 555 84022 >>> Fax : +47 555 84295 >>> ========================================== >>> >>> >>> >> >> > -- ========================================== Xianjun Dong PhD student, Lenhard group Computational Biology Unit Bergen Center for Computational Science University of Bergen Hoyteknologisenteret, Thormohlensgate 55 N-5008 Bergen, 
Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

From charles.tilford at bms.com Thu Jun 18 09:38:34 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 09:38:34 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
Message-ID: <4A3A435A.8000505@bms.com>

Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace channels. Can anyone confirm?

Hi all,

I'm using the SCF Bio::SeqIO module to parse trace data out of chromatograms. The SCF files are being produced by phred using the "-cd" parameter. The traces come out great, and the corresponding base calls from the .phd files align with the peaks wonderfully when I visualize them on a rendered trace. However, only the A bases align to the appropriate trace channel, the rest are mixed up. I find that if I do the following re-mapping, the phred base calls match the trace channels:

SeqIO : Remapped
A     : A
C     : G
G     : T
T     : C

The relevant part of Bio::SeqIO::scf is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9

... which indicates that it expects the pack()ed trace data to be in order ATGC. The base call parsing code is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8

... which is unpacking in order ACGT. As far as I can tell, the relevant official SCF documentation is here:

http://staden.sourceforge.net/manual/formats_unix_4.html

... which indicates that both trace and base order should be ACGT (matching the SeqIO unpack() for bases, but not traces). My empirical channel unscrambling mapping implies order ACTG, which is different from either of the two orders above. The sequence from the SCF file (should be that from original AB1 file, I think) is not perfectly identical to that called by phred, but is very similar (to be expected); that is, I don't need to remap C, G and T to get it to align with the phred data.
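
The re-mapping above can be written down as a small lookup table. This is only a sketch of the empirical workaround described in this message, not a fix to Bio::SeqIO::scf. The hash layout (one array reference of trace samples per channel label) and the sub name remap_channels are assumptions made for illustration, and how the four trace arrays are pulled out of the parsed sequence object is deliberately left out.

```perl
use strict;
use warnings;

# Empirical mapping from the channel label reported by Bio::SeqIO::scf
# to the base whose phred calls actually line up with that trace.
my %remap = (A => 'A', C => 'G', G => 'T', T => 'C');

# Takes a hash ref of traces keyed by the SeqIO channel label and returns
# a new hash ref keyed by the remapped base. The input is not modified.
sub remap_channels {
    my ($trace_by_seqio_base) = @_;
    my %fixed;
    for my $label (keys %remap) {
        $fixed{ $remap{$label} } = $trace_by_seqio_base->{$label};
    }
    return \%fixed;
}

# Toy data (hypothetical): one short array of trace samples per channel.
my %traces = (A => [1, 2], C => [3, 4], G => [5, 6], T => [7, 8]);
my $fixed  = remap_channels(\%traces);
print "$_: ", join(',', @{ $fixed->{$_} }), "\n" for qw(A C G T);
```

Keeping the mapping in one hash means it can be dropped in a single place once the channel order in the parser is confirmed or corrected.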
So it looks like the SeqIO module is not mapping the sections of the packed trace data to the appropriate bases. The unpack order is different than the staden documentation ... but so is the order I impose to correct the problem. I am still unclear as to the differences between V2 and V3 of the format. The major difference appears to be coding the trace absolutely (V2) or relatively to prior values (V3); I'd expect if I was using one format and SeqIO was trying to parse the other that I would get garbage out. Running in verbose reports "scf.pm is working with a version 2 scf."

Thoughts on this would be appreciated - can anyone confirm a problem with trace extraction from SCF?

I'm hoping that once I convince our admin to (properly) install staden::read that I can work directly with the ab1 files, but I need to stopgap on SCF for the time being....

-CAT

From cjfields at illinois.edu Thu Jun 18 11:31:08 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 10:31:08 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>

Charles,

The best way to make sure this is addressed is to file a ticket (bug report) on it so we can properly track it. I have a local installation of io_lib and I believe we also have Geneious installed locally (both of which read SCF), so I can work on confirming that. If it stays on the list it may not get answered and a possible bug report will be lost (to possibly bite someone else later).

AFAIK this module doesn't use staden::read but is pure perl. You are more than welcome to try out Bio::SeqIO::staden::read, but I have to warn you that most of us are looking at replacing its functionality at some point with BioLib bindings to io_lib (more stable) and so we don't intend on following up with bug fixes.
Note: there is also Bio::SCF (non-bp):

http://search.cpan.org/~lds/Bio-SCF-1.01/

chris

From MEC at stowers.org Thu Jun 18 11:42:48 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 18 Jun 2009 10:42:48 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID:

Charles,

Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's kb-basecaller turned on, is to use Bio::Trace::ABIF

http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm

It works great and includes an implementation of ABI's algorithm, allowing you to (re)compute trace clear ranges using the kb-basecaller's quality scores and any windowing/quality parameters.

It's not in the bioperl project but it is an easy install from CPAN.

I am familiar with staden::read installation woes.

Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

#!/usr/bin/env perl
# PURPOSE: extract from AB1 files into fasta format the sequence in
# the 'clear range' defined by 3 parameters. If there is no clear
# range, emit warning and skip the sequence. The fasta 'defline'
# identifier is taken as the sample name. Other useful attributes are
# also embedded into the defline using attribute=value syntax.
# USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1
# NOTE: 20 4 20 is ABI default settings
# EXAMPLE:
# ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta
# AUTHOR: malcolm_cook at stowers-institute.org

use strict;
use warnings;
use Bio::Trace::ABIF;
use Text::Wrap qw(wrap);
$Text::Wrap::columns = 72; # wrap the sequence
use File::Basename;

# The first three arguments are the trimming parameters;
# whatever remains in @ARGV is the list of AB1 files.
my ($window_width, $bad_bases_threshold, $quality_threshold) = splice(@ARGV, 0, 3);
my $abif = Bio::Trace::ABIF->new();

sub main {
    foreach (@ARGV) {
        $abif->open_abif($_) or die "error opening $_ as ABIF";
        my ($clear_range_start, $clear_range_stop) =
            $abif->clear_range($window_width,
                               $bad_bases_threshold,
                               $quality_threshold
                              );
        my $sample_score =
            $abif->sample_score($window_width,
                                $bad_bases_threshold,
                                $quality_threshold
                               );
        # my $contiguous_read_length = $abif->contiguous_read_length($window_width,
        #                                                            $quality_threshold,
        #                                                            0, # ==> trim_ends
        #                                                           );
        # my $length_of_read = $abif->length_of_read(
        #                                            $window_width,
        #                                            $quality_threshold,
        #                                            # $method
        #                                           );
        my $defline = join "\t",
            $abif->sample_name,
            #basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
            #$abif->container_identifier . ':' . $abif->well_id, # or this, for container:well_id formatted defline identifiers
            (map {my $method = $_; "$method=". ($abif->$method() || '')}
                 qw(sample_name comment run_name well_id container_identifier sequence_length )),
            #comment
            # sample_tracking_id - don't use this - it is internal to ABI software
            "clear_range_start=$clear_range_start",
            "clear_range_stop=$clear_range_stop",
            "sample_score=$sample_score",
            #"contiguous_read_length=$contiguous_read_length",
            #"length_of_read=$length_of_read",
            ;
        if ($clear_range_start == -1) {
            warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
            next;
        }
        my $seq = wrap('','',substr($abif->sequence,
                                    $clear_range_start + 1,
                                    ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
        print ">$defline\n$seq\n";
        $abif->close_abif();
    }
}

main();

From carze at som.umaryland.edu Thu Jun 18 13:51:43 2009
From: carze at som.umaryland.edu (Cesar Arze)
Date: Thu, 18 Jun 2009 10:51:43 -0700 (PDT)
Subject: [Bioperl-l] Problems parsing scientific name from a Genbank file
Message-ID: <24095355.post@talk.nabble.com>

Hi all,

I've searched through the mailing list and bug-tracker looking for any indication of this (what I presume to be) bug I have been encountering when parsing certain Genbank files using SeqIO::GenBank but have yet to find anything. I apologize in advance if this is something that has already been addressed.
When parsing these files and extracting the scientific name it seems that line breaks are causing the lineage info found in the ORGANISM section to be captured as part of the scientific name. An example of this is accession NC_005945:

  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.

"Bacillus cereus" has a line break, which then causes the scientific name to capture "Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus", ending up with "Bacillus anthracis str. Sterne Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus" as the final scientific name.

Not sure if anyone has ever run into this problem but I would very much appreciate any help or direction.

From charles.tilford at bms.com Thu Jun 18 15:59:01 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 15:59:01 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
References: <4A3A435A.8000505@bms.com> <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
Message-ID: <4A3A9C85.4000603@bms.com>

Chris Fields wrote:
> Charles,
>
> The best way to make sure this is addressed is to file a ticket (bug
> report) on it so we can properly track it.

Ok, I'll put that in.

>
> AFAIK this module doesn't use staden::read but is pure perl.

Yes, that's my understanding too. I'm using the SeqIO module because of ongoing hiccups with the staden installation.

> Note: there is also Bio::SCF (non-bp):
>
> http://search.cpan.org/~lds/Bio-SCF-1.01/
>

I have that installed, but have not tried it out yet.

Thanks!
-CAT
> chris
>
> On Jun 18, 2009, at 8:38 AM, Charles Tilford wrote:
>
>
>> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace
>> channels. Can anyone confirm?
>>
>> Hi all,
>>
>> I'm using the SCF Bio::SeqIO module to parse trace data out of
>> chromatograms. The SCF files are being produced by phred using the
>> "-cd" parameter. The traces come out great, and the corresponding base
>> calls from the .phd files align with the peaks wonderfully when I
>> visualize them on a rendered trace. However, only the A bases align
>> to the appropriate trace channel, the rest are mixed up. I find that
>> if I do the following re-mapping, the phred base calls match the
>>
>> SeqIO : Remapped
>>   A   :   A
>>   C   :   G
>>   G   :   T
>>   T   :   C
>>
>> The relevant part of Bio::SeqIO::scf is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9
>>
>> ... which indicates that it expects the pack()ed trace data to be in
>> order ATGC. The base call parsing code is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8
>>
>> ... which is unpacking in order ACGT. As far as I can tell, the
>> relevant official SCF documentation is here:
>>
>> http://staden.sourceforge.net/manual/formats_unix_4.html
>>
>> ... which indicates that both trace and base order should be ACGT
>> (matching the SeqIO unpack() for bases, but not traces). My
>> empirical channel unscrambling mapping implies order ACTG, which is
>> different from either of the two orders above. The sequence from the
>> SCF file (should be that from original AB1 file, I think) is not
>> perfectly identical to that called by phred, but is very similar (to
>> be expected); that is, I don't need to remap C, G and T to get it to
>> align with the phred data.
>>
>> So it looks like the SeqIO module is not mapping the sections of the
>> packed trace data to the appropriate bases. The unpack order is
>> different than the staden documentation ... but so is the order I
>> impose to correct the problem. I am still unclear as to the
>> differences between V2 and V3 of the format. The major difference
>> appears to be coding the trace absolutely (V2) or relative to
>> prior values (V3); I'd expect if I was using one format and SeqIO
>> was trying to parse the other that I would get garbage out. Running
>> in verbose reports "scf.pm is working with a version 2 scf."
>>
>> Thoughts on this would be appreciated - can anyone confirm a problem
>> with trace extraction from SCF?
>>
>> I'm hoping that once I convince our admin to (properly) install
>> staden::read that I can work directly with the ab1 files, but I need
>> to stopgap on SCF for the time being....
>>
>> -CAT
>>
>
>
>

From charles.tilford at bms.com Thu Jun 18 16:02:53 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 16:02:53 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To:
References: <4A3A435A.8000505@bms.com>
Message-ID: <4A3A9D6D.2010106@bms.com>

Cook, Malcolm wrote:
> Charles,
>
> Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's kb-basecaller turned on, is to use Bio::Trace::ABIF
>
> http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>
> It works great and includes an implementation of ABI's algorithm allowing you to (re)compute trace clear ranges using the kb-basecaller's quality scores and any windowing/quality parameters.
>
> It's not in the bioperl project but it is an easy install from CPAN.
>
Thanks - we installed that a few weeks ago, and it was on my list of things to try, but I had not gotten to it yet since I was getting data out of the SCF SeqIO module. Even though the SeqIO::scf data looks ok, the fact that I need to unscramble it makes me nervous... Thanks, too, for the example code. I'll try out the Bio::Trace::ABIF module and see if it works with our files.

Thanks,
CAT
> I am familiar with staden::read installation woes.
>
> Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
> #!/usr/bin/env perl
>
> # PURPOSE: extract from AB1 files into fasta format the sequence in
> # the 'clear range' defined by 3 parameters. If there is no clear
> # range, emit warning and skip the sequence. The fasta 'defline'
> # identifier is taken as the sample name. Other useful attributes are
> # also embedded into the defline using attribute=value syntax.
>
> # USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1
>
> # NOTE: 20 4 20 is ABI default settings
>
> # EXAMPLE:
> # ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta
>
> # AUTHOR: malcolm_cook at stowers-institute.org
>
> use strict;
> use warnings;
> use Bio::Trace::ABIF;
> use Text::Wrap qw(wrap);
> $Text::Wrap::columns = 72; # wrap the sequence
>
> use File::Basename;
>
> # The first three arguments are the trimming parameters; whatever
> # remains in @ARGV afterwards is the list of AB1 files.
> my ($window_width,
>     $bad_bases_threshold,
>     $quality_threshold) = splice(@ARGV, 0, 3);
>
> my $abif = Bio::Trace::ABIF->new();
>
> sub main {
>   foreach (@ARGV) {
>     $abif->open_abif($_) or die "error opening $_ as ABIF";
>     my ($clear_range_start,$clear_range_stop) = $abif->clear_range($window_width,
>                                                                    $bad_bases_threshold,
>                                                                    $quality_threshold
>                                                                   );
>     my $sample_score = $abif->sample_score(
>                                            $window_width,
>                                            $bad_bases_threshold,
>                                            $quality_threshold
>                                           );
>     # my $contiguous_read_length = $abif->contiguous_read_length($window_width,
>     #                                                            $quality_threshold,
>     #                                                            0, # ==> trim_ends
>     #                                                           );
>     # my $length_of_read = $abif->length_of_read(
>     #                                            $window_width,
>     #                                            $quality_threshold,
>     #                                            # $method
>     #                                           );
>     my $defline =
>       join "\t",
>       $abif->sample_name,
>       #basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
>       #$abif->container_identifier . ':' . $abif->well_id, # or this, for container:well_id formatted defline identifiers
>       (map {my $method = $_;
>             "$method=". ($abif->$method() || '')}
>        qw(sample_name comment run_name well_id container_identifier sequence_length )), #comment
>       # sample_tracking_id - don't use this - it is internal to ABI software
>       "clear_range_start=$clear_range_start",
>       "clear_range_stop=$clear_range_stop",
>       "sample_score=$sample_score",
>       #"contiguous_read_length=$contiguous_read_length",
>       #"length_of_read=$length_of_read",
>       ;
>     if ($clear_range_start == -1) {
>       warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
>       next;
>     }
>     my $seq = wrap('','',substr($abif->sequence, $clear_range_start + 1, ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
>     print ">$defline\n$seq\n";
>     $abif->close_abif();
>
>   }
> }
>
> main ();
>

From cjfields at illinois.edu Thu Jun 18 16:27:02 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 15:27:02 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A9D6D.2010106@bms.com>
References: <4A3A435A.8000505@bms.com> <4A3A9D6D.2010106@bms.com>
Message-ID: <2A9A3AB7-7773-48F1-993C-A679495D0B95@illinois.edu>

On Jun 18, 2009, at 3:02 PM, Charles Tilford wrote:

> Cook, Malcolm wrote:
>> Charles,
>>
>> Another possible stopgap that might work for you, if you're working
>> with AB1 chromatograms and have ABI's kb-basecaller turned on, is to
>> use Bio::Trace::ABIF
>>
>> http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>>
>> It works great and includes an implementation of ABI's algorithm
>> allowing you to (re)compute trace clear ranges using the kb-basecaller's
>> quality scores and any windowing/quality parameters.
>>
>> It's not in the bioperl project but it is an easy install from CPAN.
>>
> Thanks - we installed that a few weeks ago, and it was on my list of
> things to try, but I had not gotten to it yet since I was getting
> data out of the SCF SeqIO module. Even though the SeqIO::scf data
> looks ok, the fact that I need to unscramble it makes me nervous...
> Thanks, too, for the example code. I'll try out the Bio::Trace::ABIF
> module and see if it works with our files.
>
> Thanks,
> CAT

You definitely shouldn't need to unscramble it; my guess is this is a legit bug that has just gone unnoticed. I see that you have filed a ticket on it so we can at least track it. Thanks!
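Until the ticket is resolved, the empirical remapping reported in this thread (SeqIO labels A, C, G, T correspond to actual bases A, G, T, C) could be applied after parsing. The following is a minimal sketch only. It assumes the four trace channels have already been pulled out into a plain hash keyed by the label SeqIO assigned, and it reads the table as "left column is the SeqIO label, right column is the real base", which is one interpretation of the message. The helper relabel_channels and the toy intensity values are hypothetical and are not part of Bio::SeqIO::scf.

```perl
use strict;
use warnings;

# Empirical mapping from this thread: the channel SeqIO labels with the
# key on the left appears to carry the trace for the base on the right.
my %remap = (A => 'A', C => 'G', G => 'T', T => 'C');

# Hypothetical helper: takes a hash ref of trace arrays keyed by the
# label SeqIO assigned and returns a new hash ref keyed by the base the
# channel is believed to represent. The array refs are shared, not copied.
sub relabel_channels {
    my ($traces) = @_;
    my %fixed;
    for my $label (keys %$traces) {
        my $base = $remap{$label};
        die "unexpected channel label '$label'" unless defined $base;
        $fixed{$base} = $traces->{$label};
    }
    return \%fixed;
}

# Toy data standing in for per-channel trace intensities.
my %as_parsed = (
    A => [1, 2],
    C => [3, 4],
    G => [5, 6],
    T => [7, 8],
);

my $fixed = relabel_channels(\%as_parsed);
for my $base (qw(A C G T)) {
    print "$base: ", join(',', @{ $fixed->{$base} }), "\n";
}
```

Keeping the mapping in one hash means the workaround can be deleted in a single place once the module returns the channels in the documented ACGT order.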
chris

From scott at scottcain.net Thu Jun 18 23:25:35 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:25:35 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph
In-Reply-To: <4A3A13D3.7050208@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no> <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com> <4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
Message-ID: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>

Hi Xianjun,

The attached script (which is not too different from yours--I only did a little clean up and made the padding consistent) makes the attached image, which is what I think you want. I'm using bioperl-live.

Scott

On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong wrote:
> Hi, Scott,
>
> Do you mind having a look at the code (below my signature) to see if I use the
> -postgrid callback correctly?
> I still cannot get the background for the whole panel.
>
> Thanks
>
> Xianjun
>
>
> Xianjun Dong wrote:
>>
>> Hi, Scott
>>
>> Before I give up my own whole solution to use GBrowse, I still want to
>> bother you once:
>>
>> As you suggested, I put the -postgrid option when creating the panel, which will call a
>> function to draw the background. The code below is almost copied from the
>> online POD of Bio::Graphics::Panel (see
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>> )
>>
>> But it still does not work. Could you help to have a look? I paste it
>> below. (BTW, on the above page of POD, the option is -postgrid=>\&draw_gap, while the gap
>> drawing function is gap_it, not draw_gap. I guess it's a typo, or not?)
>>
>> Thanks
>>
>> Xianjun
>>
>> ----------------------------------------------- mytestcode.pl
>> --------------------------
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
>> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # highlight
>> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', -source=>'a');
>> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                      -length=>1050,
>>                                      -start =>0,
>>                                      -pad_left=>12,
>>                                      -pad_right=>12
>>                                      -postgrid=>\&gap_it);
>>
>> sub gap_it {
>>    my $gd    = shift;
>>    my $panel = shift;
>>    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>    my $top                  = $panel->top;
>>    my $bottom               = $gd->height, #panel->bottom;
>>    my $gray                 = $panel->translate_color('red');
>>    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>> }
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>> #$panel->add_track([$trans41,$trans31],
>> #                  -glyph   => 'background',
>> #                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>> #                  );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>                 -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>>  print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>>
>> Scott Cain wrote:
>>>
>>> Hi Xianjun,
>>>
>>> I understand what you want to do, as the current version of gbrowse
>>> does this, which uses bioperl 1.6. Without digging through the code,
>>> I can't tell you exactly how this works and you didn't send your code
>>> that uses this callback, so I can't try it either.
>>>
>>> One thing that is different between your code and gbrowse is that each
>>> of the tracks is actually a separate panel (to allow track dragging),
>>> so it is possible that this sort of callback doesn't work for
>>> Bio::Graphics any more.
>>>
>>> Scott
>>>
>>> On Saturday, June 13, 2009, Xianjun Dong wrote:
>>>
>>>>
>>>> Hi, Scott
>>>>
>>>> Thanks for your reply first.
>>>>
>>>> I still have a question: I dug out the code from GBrowse (which I paste
>>>> below). Method make_postgrid_callback gets all highlight regions and then uses
>>>> the hilite_regions_closure function to draw them out, using the following GD
>>>> function:
>>>>
>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                          $panel->translate_color($h_color));
>>>>
>>>> where $bottom=$panel->bottom. This is the only difference from my
>>>> code, where I use $gd->height. I guess they are almost the same (except the
>>>> pad_bottom), we can see this in the code of
>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>>
>>>> OK. Anyway, I changed to use $panel->bottom, instead of $gd->height, for
>>>> my highlight regions. The output is the same, when using the library of Bioperl
>>>> 1.6 (or 1.5). You can see the attached image ("test.bioperl1.6.png")
>>>>
>>>> OK. I might not have explained my question explicitly. My question is:
>>>> if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I can
>>>> get the right image I want (see the attached file "test.bioperl1.2.3.png"),
>>>> where the highlight range will go from the roof to the floor. While in
>>>> bioperl 1.5 (or 1.6), I can only see the highlight region in its own track,
>>>> not the whole panel. OK, did I explain clearly now? You can see the
>>>> difference between the two images.
>>>>
>>>> [I am not sure the mailing list allows attaching images, otherwise, I put them
>>>> at the following links:
>>>> test.bioperl1.6.png:   http://translog.genereg.net/test.bioperl1.6.png
>>>> test.bioperl1.2.3.png: http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>>
>>>> You can test it and see the difference if you have both 1.2.3 and 1.6 on
>>>> your computer?
>>>>
>>>> Really want to know how this works in bioperl 1.2.3 (Even though this
>>>> might be a bug at that version, or whatever)
>>>>
>>>> Thanks
>>>>
>>>> Xianjun
>>>> =============================================
>>>>
>>>> # this generates the callback for highlighting a region
>>>> sub make_postgrid_callback {
>>>>  my $settings = shift;
>>>>  return unless ref $settings->{h_region};
>>>>
>>>>  my @h_regions = map {
>>>>    my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>                 : ()
>>>>  }
>>>>    @{$settings->{h_region}};
>>>>
>>>>  return unless @h_regions;
>>>>  return hilite_regions_closure(@h_regions);
>>>> }
>>>>
>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>> # suitable for hilighting a region of a panel.
>>>> # The args are a list of [start,end,color]
>>>> sub hilite_regions_closure {
>>>>  my @h_regions = @_;
>>>>
>>>>  return sub {
>>>>    my $gd     = shift;
>>>>    my $panel  = shift;
>>>>    my $left   = $panel->pad_left;
>>>>    my $top    = $panel->top;
>>>>    my $bottom = $panel->bottom;
>>>>    for my $r (@h_regions) {
>>>>      my ($h_start,$h_end,$h_color) = @$r;
>>>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>>      # assuming top is 0 so as to ignore top padding
>>>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                           $panel->translate_color($h_color));
>>>>    }
>>>>  };
>>>> }
>>>>
>>>>
>>>> Scott Cain wrote:
>>>>
>>>> Hello Xianjun,
>>>>
>>>> I don't think that approach will work. What you almost certainly need
>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>> region. For example code of how to do this, take a look at the
>>>> make_postgrid_callback subroutine in GBrowse 1.69. The option
>>>> -postgrid is a method of Bio::Graphics::Panel.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I am not sure this is the right place I can get help.
>>>>
>>>> I've suffered from a problem for several days: I want to highlight parts of
>>>> regions in my track, using a different background color. To do that, I
>>>> defined a glyph named "background", based on the
>>>> 'Bio::Graphics::Glyph::generic' module.
>>>> I override the draw_component() method, by adding code like below:
>>>>
>>>> $gd->filledRectangle($left,0,$right,$gd->height, $self->factory->translate_color($color));
>>>>
>>>> # the script is pasted at the end
>>>>
>>>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>>>> highlight regions into a list of features, and add_track with
>>>> -glyph=>'background'. (see the following script, test.pl) This really works
>>>> as I expect, which will add a colored block at the background of all tracks in a
>>>> panel (including the ruler arrow). You can see the output image in the attached
>>>> file "test.bioperl1.2.3.png"
>>>>
>>>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not
>>>> work. Well, it works, but the highlight part only shrinks to a low height,
>>>> instead of covering all tracks in the panel. I also attached the output
>>>> here, see the file "test.bioperl1.6.png".
>>>>
>>>> I tried to think about the reason, the 'background' module is based on the
>>>> generic module. What can cause the difference? Is it because $gd->height is
>>>> different, or the tracks followed with 'background' track can not draw from
>>>> the first position?
>>>>
>>>> Well. I can stick to using Bioperl 1.2.3 to avoid the problem. ("Smart person
>>>> solve problem, wise person avoid problem"...) But another problem is coming:
>>>> Bio::Graphics in Bioperl 1.2.3 does not support the $panel->create_web_map()
>>>> function, which means I have to use some higher version if I want to create
>>>> a web map for my graphics, but then I have to give up using the highlight
>>>> background.
>>>>
>>>> OK. It's long enough for my first-time submission here. Hope someone can
>>>> throw me some clue.
>>>>
>>>> Thanks ahead!!
>>>>
>>>> Xianjun
>>>>
>>>>
>>>> ==================== test.pl =======================
>>>> #!/usr/bin/perl
>>>>
>>>> use strict;
>>>> use lib "$ENV{HOME}/lib";
>>>>
>>>> use Bio::Graphics;
>>>> use Bio::Graphics::Feature;
>>>> my $ftr= 'Bio::Graphics::Feature';
>>>>
>>>> # processed_transcript
>>>> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
>>>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
>>>> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>>> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>>
>>>> # highlight
>>>> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', -source=>'a');
>>>> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', -source=>'b');
>>>>
>>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>>                                      -length=>1050,
>>>>                                      -start =>0,
>>>>                                      -pad_left=>12,
>>>>                                      -pad_right=>12);
>>>>
>>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>>> $panel->add_track([$trans41,$trans31],
>>>>                -glyph   => 'background',
>>>>                -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>>>                );
>>>>
>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>>                -glyph=>'arrow',
>>>>                -double=>1,
>>>>                -tick=>2);
>>>>
>>>> $panel->add_track($trans,
>>>>                -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>>                -fgcolor => 'darkred',
>>>>                -bgcolor => 'darkred',
>>>>                -title => '$source',
>>>>                -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>>                );
>>>>  print $panel->png;
>>>>
>>>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>>>> my $map = $panel->create_web_map("image");
>>>> $panel->finished();
>>>>
>>>> 1;
>>>>
>>>> ==================== background.pm =======================
>>>> package Bio::Graphics::Glyph::background;
>>>>
>>>> use strict;
>>>> use base 'Bio::Graphics::Glyph::generic';
>>>> sub pad_top{
>>>>  return 0;
>>>> }
>>>>
>>>> sub draw_component {
>>>>  my $self = shift;
>>>>  #$self->SUPER::draw_component(@_);
>>>>  my ($gd,$dx,$dy) = @_;
>>>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>>
>>>>  # draw an arrow to indicate the direction of transcript
>>>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>>>  $gd->filledRectangle($left,0,$right,$gd->height, $self->factory->translate_color($color));
>>>> }
>>>>
>>>> 1;
>>>>
>>>> --
>>>> ==========================================
>>>> Xianjun Dong
>>>> PhD student, Lenhard group
>>>> Computational Biology Unit
>>>> Bergen Center for Computational Science
>>>> University of Bergen
>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>> N-5008 Bergen, Norway
>>>> E-mail: xianjun.dong at bccs.uib.no
>>>> Tel.: +47 555 84022
>>>> Fax : +47 555 84295
>>>> ==========================================
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid.pl
Type: application/x-perl
Size: 2140 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid_highlight.png Type: image/png Size: 7195 bytes Desc: not available URL: From scott at scottcain.net Thu Jun 18 23:30:37 2009 From: scott at scottcain.net (Scott Cain) Date: Thu, 18 Jun 2009 23:30:37 -0400 Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for Bio::Graphics::Glyph In-Reply-To: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com> References: <4A32BCDA.4080605@ii.uib.no> <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com> <4A339621.2060702@ii.uib.no> <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com> <4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no> <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com> Message-ID: <4536f7700906182030n74f4293k60ad04ea62b97476@mail.gmail.com> Actually, to be clear, that's bioperl-live and Bio::Graphics version 1.96 from CPAN. On Thu, Jun 18, 2009 at 11:25 PM, Scott Cain wrote: > Hi Xianjun, > > The attached script (which is not too different from yours--I only did > a little clean up and made the padding consistent) makes the attached > image, which is what I think you want. ?I'm using bioperl-live. > > Scott > > > On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong wrote: >> Hi, Scott, >> >> Do you mind to have a look of the code (below my signature) if I use the >> -postgrid callback correctly? >> I still cannnot get the background for the whole panel. >> >> Thanks >> >> Xianjun >> >> >> Xianjun Dong wrote: >>> >>> Hi, Scott >>> >>> Before I gave up my own whole solution to use GBrowse, I still want to >>> bother you once: >>> >>> As you suggested, I put -postgrid option when the panel, which will call a >>> function to draw the background. The code below is almost copied from the >>> online POD of Bio::Graphics::Panel (see >>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html >>> ) >>> >>> But it still does not work. Could you help to have a look? I paste it >>> below. 
(BTW, the above page of POD, the -postgrid=>\&draw_gap, while the gap >>> drawing function is gap_it, not draw_gap. I guess it's a typo. or not?) >>> >>> THanks >>> >>> Xianjun >>> >>> ----------------------------------------------- mytestcode.pl >>> -------------------------- >>> >>> #!/usr/bin/perl >>> >>> use strict; >>> use lib "$ENV{HOME}/lib"; >>> >>> use Bio::Graphics; >>> use Bio::Graphics::Feature; >>> my $ftr= 'Bio::Graphics::Feature'; >>> >>> # processed_transcript >>> my $trans1 = >>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR"); >>> my $trans2 = >>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS'); >>> my $trans3 = >>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', >>> -source=>'a'); >>> my $trans4 = >>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', >>> -source=>'a'); >>> my $trans5 = >>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR"); >>> my $trans ?= >>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]); >>> >>> # hightlight >>> my $trans31 = >>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', >>> -source=>'a'); >>> my $trans41 = >>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', >>> -source=>'b'); >>> >>> my $panel= Bio::Graphics::Panel->new(-width=>1200, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-length=>1050, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-start =>0, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-pad_left=>12, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-pad_right=>12 >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-postgrid=>\&gap_it); >>> >>> sub gap_it { >>> ? ?my $gd ? ?= shift; >>> ? ?my $panel = shift; >>> ? ?my ($gap_start,$gap_end) = $panel->location2pixel(500,600); >>> ? ?my $top ? ? ? ? ? ? ? ? ?= $panel->top; >>> ? ?my $bottom ? ? ? ? ? ? ? = $gd->height, #panel->bottom; >>> ? ?my $gray ? ? ? ? ? ? ? ? = $panel->translate_color('red'); >>> ? 
?$gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray); >>> } >>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 >>> and 1.6 >>> #$panel->add_track([$trans41,$trans31], >>> # ? ? ? ? ?-glyph ? => 'background', >>> # ? ? ? ? ? ? ? ? ?-block_bgcolor => sub{return (shift->source eq >>> 'a')?'#cccccc':'#fffc22'}, >>> # ? ? ? ? ? ? ? ? ?); >>> >>> $panel->add_track($ftr->new(-start=>100,-end=>1000), >>> ? ? ? ? ? ? ? ? -glyph=>'arrow', >>> ? ? ? ? ? ? ? ? -double=>1, >>> ? ? ? ? ? ? ? ? -tick=>2); >>> >>> $panel->add_track($trans, >>> ? ? ? ? -glyph ? => 'transcript2', # 'transcript2', #process_5utr', >>> ? ? ? ? ? ? ? ? -fgcolor => 'darkred', >>> ? ? ? ? ? ? ? ? -bgcolor => 'darkred', >>> ? ? ? ? ? ? ? ? -title => '$source', >>> ? ? ? ? ? ? ? ? -link => >>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', ?#EnsEMBL >>> ? ? ? ? ? ? ? ? ); >>> ?print $panel->png; >>> >>> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl >>> 1.2.3 >>> my $map = $panel->create_web_map("image"); >>> $panel->finished(); >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> Scott Cain wrote: >>>> >>>> Hi Xianjun, >>>> >>>> I understand what you want to do, as the current version of gbrowse >>>> does this, which uses bioperl 1.6. ?Without digging through the code, >>>> I can't tell you exactly how this works and you didn't send your code >>>> that uses this callback, so I can't try it either. >>>> >>>> One thing that is different between your code and gbrowse is that each >>>> of the tracks is actually a seperate panel (to allow track dragging), >>>> so it possible that this sort of callback doesn't work for >>>> Bio::Graphics any more. >>>> >>>> Scott >>>> >>>> On Saturday, June 13, 2009, Xianjun Dong >>>> wrote: >>>> >>>>> >>>>> Hi, Scott >>>>> >>>>> Thanks for your reply first. >>>>> >>>>> I still have question: I dig out the code from GBrowse (which I paste >>>>> below). 
Method make_postgrid_callback gets all highlight region and then use >>>>> hilite_regions_closure function to draw them out, using the following GD >>>>> function: >>>>> >>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom, >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ?$panel->translate_color($h_color)); >>>>> >>>>> where the $bottom=$panel->bottom. This is the only difference from my >>>>> code, where I use $gd->height. I guess they are almost same (except the >>>>> pad_bottom), we can see this in the code of >>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22 >>>>> >>>>> OK. Anyway, I change to use $panel->bottom, instead of $gd->height, for >>>>> my highlight regions. The output is same, when using the library of Bioperl >>>>> 1.6 (or 1.5). You can see the attached image ("test.bioperl1.6.png") >>>>> >>>>> OK. I might have not explained my question explicitly. My question is: >>>>> if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I can >>>>> get the right image I want (see the attached file "test.bioperl1.2.3.png"), >>>>> where the highlight range will go from the roof to the floor. While in >>>>> bioperl 1.5 (or 1.6), I only can see the highlight region in its own track, >>>>> not the whole panel. OK, did I explain clearly now? you can see the >>>>> difference of the two images. >>>>> >>>>> [I am not sure the mailist allow to attach image, otherwise, I put them >>>>> in the following links: >>>>> test.bioperl1.6.png: ? ?http://translog.genereg.net/test.bioperl1.6.png >>>>> test.bioperl1.2.3.png: >>>>> ?http://translog.genereg.net/test.bioperl1.2.3.png ] >>>>> >>>>> You can test it and see the difference if you have both 1.2.3 and 1.6 on >>>>> your computer? 
>>>>>
>>>>> I really want to know how this works in bioperl 1.2.3 (even though this
>>>>> might be a bug in that version, or whatever)
>>>>>
>>>>> Thanks
>>>>>
>>>>> Xianjun
>>>>> =============================================
>>>>>
>>>>> # this generates the callback for highlighting a region
>>>>> sub make_postgrid_callback {
>>>>>  my $settings = shift;
>>>>>  return unless ref $settings->{h_region};
>>>>>
>>>>>  my @h_regions = map {
>>>>>    my ($h_ref,$h_start,$h_end,$h_color) =
>>>>> /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>>>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>>                 : ()
>>>>>  }
>>>>>    @{$settings->{h_region}};
>>>>>
>>>>>  return unless @h_regions;
>>>>>  return hilite_regions_closure(@h_regions);
>>>>> }
>>>>>
>>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>>> # suitable for hilighting a region of a panel.
>>>>> # The args are a list of [start,end,color]
>>>>> sub hilite_regions_closure {
>>>>>  my @h_regions = @_;
>>>>>
>>>>>  return sub {
>>>>>    my $gd     = shift;
>>>>>    my $panel  = shift;
>>>>>    my $left   = $panel->pad_left;
>>>>>    my $top    = $panel->top;
>>>>>    my $bottom = $panel->bottom;
>>>>>    for my $r (@h_regions) {
>>>>>      my ($h_start,$h_end,$h_color) = @$r;
>>>>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>>>      # assuming top is 0 so as to ignore top padding
>>>>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>>                           $panel->translate_color($h_color));
>>>>>    }
>>>>>  };
>>>>> }
>>>>>
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>> Hello Xianjun,
>>>>>
>>>>> I don't think that approach will work.  What you almost certainly need
>>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>>> region.
?For example code of how to do this, take a look at the >>>>> make_postgrid_callback subroutine in GBrowse 1.69. ?The option >>>>> -postgrid is a method of Bio::Graphics::Panel. >>>>> >>>>> Scott >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong >>>>> wrote: >>>>> >>>>> >>>>> HI, >>>>> >>>>> I am not sure this is the right place I can get help. >>>>> >>>>> I've suffered by a problem for several days: I want to highlight parts >>>>> of >>>>> regions in my track, using a different background color. To do that, I >>>>> defined a glyph named "background", based on the >>>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component() >>>>> method, by adding code like below: >>>>> >>>>> $gd->filledRectangle($left,0,$right,$gd->height, >>>>> $self->factory->translate_color($color)); >>>>> >>>>> # the script is pasted at the end >>>>> >>>>> This will draw a rectangle with top=0, bottom=$gd->height. I made the >>>>> highlight regions into a list of features, and add_track with >>>>> -glyph=>'background'. (see the following script, test.pl) This really >>>>> works >>>>> as I expect, which will add a colored block at background of all tracks >>>>> in a >>>>> panel (including the ruler arrow). You can see the output image in >>>>> attached >>>>> file "test.bioperl1.2.3.png" >>>>> >>>>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does >>>>> not >>>>> work. Well, it works, but the highlight part only shrink to a low >>>>> height, >>>>> instead of covering all tracks in the panel. I also attached the output >>>>> here, see the file "test.bioperl1.6.png". >>>>> >>>>> I tried to think about the reason, the 'background' module is based on >>>>> the >>>>> generic module. What can cause the difference? Is it because $gd->height >>>>> is >>>>> different, or the tracks followed with 'background' track can not draw >>>>> from >>>>> the first position? >>>>> >>>>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. 
("Smart >>>>> person >>>>> solve problem, wise person avoid problem"...) But another problem is >>>>> coming: >>>>> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map() >>>>> function, which means I have to use some higher version if I want to >>>>> create >>>>> web map for my graphics, but then I have to give up using highlight >>>>> background. >>>>> >>>>> OK. It's long enough for my first-time submission here. Hope someone can >>>>> throw me some clue. >>>>> >>>>> Thanks ahead!! >>>>> >>>>> Xianjun >>>>> >>>>> >>>>> ==================== test.pl ======================= >>>>> #!/usr/bin/perl >>>>> >>>>> use strict; >>>>> use lib "$ENV{HOME}/lib"; >>>>> >>>>> use Bio::Graphics; >>>>> use Bio::Graphics::Feature; >>>>> my $ftr= 'Bio::Graphics::Feature'; >>>>> >>>>> # processed_transcript >>>>> my $trans1 = >>>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR"); >>>>> my $trans2 = >>>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS'); >>>>> my $trans3 = >>>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', >>>>> -source=>'a'); >>>>> my $trans4 = >>>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', >>>>> -source=>'a'); >>>>> my $trans5 = >>>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR"); >>>>> my $trans ?= >>>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]); >>>>> >>>>> # hightlight >>>>> my $trans31 = >>>>> >>>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', >>>>> -source=>'a'); >>>>> my $trans41 = >>>>> >>>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', >>>>> -source=>'b'); >>>>> >>>>> my $panel= Bio::Graphics::Panel->new(-width=>1200, >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -length=>1050, >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -start =>0, >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -pad_left=>12, >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
-pad_right=>12); >>>>> >>>>> # the following track works as I expected in bioperl 1.2.3, but not in >>>>> 1.5 >>>>> and 1.6 >>>>> $panel->add_track([$trans41,$trans31], >>>>> ? ? ? ?-glyph ? => 'background', >>>>> ? ? ? ? ? ? ? ?-block_bgcolor => sub{return (shift->source eq >>>>> 'a')?'#cccccc':'#fffc22'}, >>>>> ? ? ? ? ? ? ? ?); >>>>> >>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000), >>>>> ? ? ? ? ? ? ? ?-glyph=>'arrow', >>>>> ? ? ? ? ? ? ? ?-double=>1, >>>>> ? ? ? ? ? ? ? ?-tick=>2); >>>>> >>>>> $panel->add_track($trans, >>>>> ? ? ? ?-glyph ? => 'transcript2', # 'transcript2', #process_5utr', >>>>> ? ? ? ? ? ? ? ?-fgcolor => 'darkred', >>>>> ? ? ? ? ? ? ? ?-bgcolor => 'darkred', >>>>> ? ? ? ? ? ? ? ?-title => '$source', >>>>> ? ? ? ? ? ? ? ?-link => >>>>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', >>>>> ?#EnsEMBL >>>>> ? ? ? ? ? ? ? ?); >>>>> ?print $panel->png; >>>>> >>>>> # the following part works in bioperl 1.5 and 1.6, but not work in >>>>> Bioperl >>>>> 1.2.3 >>>>> my $map = $panel->create_web_map("image"); >>>>> $panel->finished(); >>>>> >>>>> 1; >>>>> >>>>> ==================== background.pm ======================= >>>>> package Bio::Graphics::Glyph::background; >>>>> >>>>> use strict; >>>>> use base 'Bio::Graphics::Glyph::generic'; >>>>> sub pad_top{ >>>>> ?return 0; >>>>> } >>>>> >>>>> sub draw_component { >>>>> ?my $self = shift; >>>>> ?#$self->SUPER::draw_component(@_); >>>>> ?my ($gd,$dx,$dy) = @_; >>>>> ?my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy); >>>>> >>>>> ?# draw an arrow to indicate the direction of transcript >>>>> ?my $color = $self->option('block_bgcolor') || '#cccccc'; >>>>> ?$gd->filledRectangle($left,0,$right,$gd->height, >>>>> $self->factory->translate_color($color)); >>>>> } >>>>> >>>>> 1; >>>>> >>>>> -- >>>>> ========================================== >>>>> Xianjun Dong >>>>> PhD student, Lenhard group >>>>> Computational Biology Unit >>>>> Bergen Center for Computational Science >>>>> 
University of Bergen
>>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>>> N-5008 Bergen, Norway
>>>>> E-mail: xianjun.dong at bccs.uib.no
>>>>> Tel.: +47 555 84022
>>>>> Fax : +47 555 84295
>>>>> ==========================================
>>>>>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From roy.chaudhuri at gmail.com Fri Jun 19 06:34:24 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 19 Jun 2009 11:34:24 +0100
Subject: [Bioperl-l] Problems parsing scientific name from a Genbank file
In-Reply-To: <24095355.post@talk.nabble.com>
References: <24095355.post@talk.nabble.com>
Message-ID: <4A3B69B0.8080305@gmail.com>

Hi Cesar,

I can replicate this using an old Bioperl (version 1.5.2), but it appears to be fixed in version 1.6 and bioperl-live - the scientific_name method returns "Bacillus anthracis str. Sterne".

Hope this helps.
Roy.

Cesar Arze wrote:
> Hi all,
> I've searched through the mailing list and bug-tracker looking for any
> indication of this (what I presume to be) bug I have been encountering when
> parsing certain Genbank files using SeqIO::GenBank but have yet to find
> anything. I apologize in advance if this is something that has already been
> addressed.
>
> When parsing these files and extracting the scientific name it seems that
> line breaks are causing the lineage info found in the ORGANISM section to be
> captured as part of the scientific name. An example of this is accession
> NC_005945:
>
> ORGANISM Bacillus anthracis str. Sterne
> Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus;
> Bacillus
> cereus group.
>
> "Bacillus cereus" has a line break, which then causes the scientific name to
> capture "Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus",
> ending up with "Bacillus anthracis str. Sterne Bacteria; Firmicutes;
> Bacillales; Bacillaceae; Bacillus; Bacillus" as the final scientific name.
>
> Not sure if anyone has ever run into this problem but I would very much
> appreciate any help or direction.
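
For anyone stuck on an older BioPerl, the wrapped ORGANISM block quoted above can also be split by hand. What follows is only a rough pure-Perl sketch, not what Bio::SeqIO::genbank does internally: parse_organism_block is a made-up helper name, and it assumes the scientific name sits entirely on the ORGANISM line (a name long enough to wrap onto a second line would be misread as lineage).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a GenBank ORGANISM
# block into the scientific name and the list of lineage taxa.
# Assumption: the name occupies only the first line of the block.
sub parse_organism_block {
    my ($block) = @_;
    my @lines = grep { /\S/ } split /\n/, $block;

    my $first = shift @lines;
    defined $first or die "Empty ORGANISM block\n";
    my ($name) = $first =~ /^\s*ORGANISM\s+(.+?)\s*$/
        or die "No ORGANISM line found\n";

    # The lineage may be wrapped in the middle of a taxon name
    # ("Bacillus" / "cereus group."), so join the continuation lines
    # with single spaces before splitting on semicolons.
    my @parts;
    for my $line (@lines) {
        (my $trimmed = $line) =~ s/^\s+|\s+$//g;
        push @parts, $trimmed;
    }
    my $lineage = join ' ', @parts;
    $lineage =~ s/\.$//;                     # drop the terminating period
    my @taxa = split /\s*;\s*/, $lineage;

    return ($name, \@taxa);
}

# The block from NC_005945, wrapped the way it appears in the post.
my $block = join "\n",
    '  ORGANISM  Bacillus anthracis str. Sterne',
    '            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus;',
    '            Bacillus',
    '            cereus group.';

my ($name, $taxa) = parse_organism_block($block);
print "name:    $name\n";
print "lineage: ", join(' > ', @$taxa), "\n";
```

Joining the continuation lines before splitting is what keeps "Bacillus cereus group" together as one taxon instead of letting part of it leak into the name.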
From cjfields at illinois.edu Fri Jun 19 16:57:36 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 19 Jun 2009 15:57:36 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk> <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu> <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk> Message-ID: So, to follow up (and make sure we don't have any overlapping tuits) we should probably determine who wants to work on what (i.e. fastq updating, etc). I think it's possible to quickly add in Solexa/ Illumina/Sanger fastq similar to BioPython, just don't want to step on anyone's toes if they are halfway through doing this. chris On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote: > Better than colorspaced discussions for sure ;) > > Elia > > On 17 Jun 2009, at 21:35, Chris Fields wrote: > >> So, #1 priority is to get fastq up-to-speed, then maybe assess >> other options. >> >> Illuminating discussion, thanks Elia! >> >> urgh, excuse unintended bad pun above... >> >> chris >> >> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote: >> >>> Interesting that you mention the database issue. We found that for >>> specific memory/CPU intenstive things we also switch to using dbs. >>> For example, after many years of loyal use of disconnected_ranges >>> we switched to a simple SQL implementation of it, because of the >>> large performance gains it would give us. Similarly in Ensembl as >>> well as in the old days of bioperl-db we opted for doing subseq >>> within SQL where possible. >>> >>> Some lean way of SQL'izing specific components could be less >>> "disruptive" than avoiding object creation and provide significant >>> gains in performance. 
Could be set as an optional flag, and could >>> use temporary ad hoc SQL databases? >>> >>> Still, priority now is to make SeqIO compliant with all those >>> formats, than we can worry about performance :) >>> >>> Elia >>> >>> On 17 Jun 2009, at 20:30, Chris Fields wrote: >>> >>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote: >>>> >>>>> Tristan Lefebure wrote: >>>>>> Hello, >>>>>> Regarding next-gen sequences and bioperl, following my >>>>>> experience, another issue is bioperl speed. For example, if you >>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads >>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, >>>>>> well, you've got to be patient (but may be I missed some >>>>>> shortcuts...). >>>>> >>>>> This is my concern as well. Or, rather, is there actually a >>>>> significant set of users out there who are dealing with next-gen >>>>> sequencing and would consider using BioPerl for their work? >>>>> >>>>> I'm working with all the 1000-genomes data at the Sanger, and we >>>>> at least are probably never going to use BioPerl for the work. >>>> >>>> Are you using pure perl or (gasp) something else? ;> >>>> >>>> Judging by the feedback there are definitely a set of users who >>>> would like to integrate nextgen into bioperl somehow, probably to >>>> take advantage of other aspects of bioperl. >>>> >>>>>> A pure perl solution will be between 100 to 1000x faster... >>>>>> Would it be possible to have an ultra-light quality object with >>>>>> few simple methods for next-gen reads? >>>>> >>>>> The fastq parser itself already seems pretty fast. The way to >>>>> get the speedup is to not create any Bio::Seq* objects but just >>>>> return the data directly. At that point it's not taking much >>>>> advantage of BioPerl. But certainly it could be done... 
>>>> >>>> >>>> I suppose the best way to assess what needs to be done is come up >>>> with a set of 'use cases' specifying what users want so we can >>>> design around them, otherwise we're shooting in the dark. >>>> >>>> I'm personally wondering if this could be done as a sequence >>>> database, something similar in theme to Lincoln's >>>> SeqFeature::Store, but sequence only, and returns quality objects >>>> in a similar manner (ala Storable)? Not sure whether that's >>>> feasible, but it's appears at least scalable. >>>> >>>> chris >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> --- >>> Senior Lecturer, Bioinformatics >>> UCL Cancer Institute >>> Paul O' Gorman Building >>> University College London >>> Gower Street >>> WC1E 6BT >>> London >>> UK >>> >>> Office (UCL): +44 207 679 6493 >>> Office (ICMS): +44 0207 8822374 >>> >>> Mobile: +44 7597 566 194 >>> Mobile (Italy): +39 338 8448801 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > --- > Senior Lecturer, Bioinformatics > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Office (UCL): +44 207 679 6493 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 7597 566 194 > Mobile (Italy): +39 338 8448801 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Sat Jun 20 04:46:31 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 20 Jun 2009 09:46:31 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu> References: 
<6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com> <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906200146t547a0492r23d5f123e01098e8@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
>
>
> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>
>>> Peter's suggestions also are reasonable, though does biopython have a
>>> separate module for each of these variations?  Our version (I believe)
>>> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
>>> fastq variant passed in as a separate named argument.
>>
>> Biopython's SeqIO gives the three FASTQ variants their own unique
>> names. This format name is a required argument for parsing/writing
>> (we don't try and guess the file format from the data contents).
>> Internally we have three separate FASTQ parsers/writers although
>> they do share code.
>
> We could easily do the same if others agree.  Actually, if we specified that
> shorthand for a variant on a format would be designated as -format =>
> 'format-variant', I think we could easily hack SeqIO to deal with that by
> splitting on '-' and passing everything to the constructor as (-format =>
> 'format', -variant => 'variant').  Very little repeated code in this case,
> just an additional named parameter indicating the format variant (and the
> SeqIO class can do the type checking on that within the constructor).

Yes, when I started using names like "fastq-solexa" I did have in mind a "main-variant" naming convention, and potentially Biopython may one day actually use this structure when allocating a Bio.SeqIO job to the appropriate parser or writer. For now, the Biopython list of formats is fairly short (and there are relatively few of these sub-formats) so to keep things simple we just have a flat mapping from the format name (e.g. "fasta", "fastq", "fastq-solexa") to the parser/writer code.
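
The "split on '-'" idea discussed above could look roughly like the sketch below. It is only an illustration of the naming convention, not existing Bio::SeqIO code: split_format and %KNOWN_VARIANTS are hypothetical names, and the variant list (sanger, solexa, illumina) is assumed from the three FASTQ flavours mentioned in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table of accepted variants per format (an assumption for
# this sketch, not something BioPerl defines).
my %KNOWN_VARIANTS = (
    fastq => { map { $_ => 1 } qw(sanger solexa illumina) },
);

# Hypothetical helper: turn 'format-variant' shorthand into the named
# arguments (-format => ..., -variant => ...) a constructor could take.
sub split_format {
    my ($spec) = @_;
    my ($format, $variant) = split /-/, lc($spec), 2;

    if (defined $variant) {
        my $allowed = $KNOWN_VARIANTS{$format} || {};
        die "Unknown variant '$variant' for format '$format'\n"
            unless $allowed->{$variant};
    }

    return ('-format' => $format,
            defined $variant ? ('-variant' => $variant) : ());
}

for my $spec (qw(fasta fastq fastq-solexa)) {
    my %args = split_format($spec);
    print join(', ', map { "$_ => $args{$_}" } sort keys %args), "\n";
}
```

A plain name such as "fastq" comes back without a -variant key, so the parser is free to pick its own default; whether that default should be Sanger-style is a separate decision that this sketch does not make.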
Peter

From e.stupka at ucl.ac.uk Sat Jun 20 16:12:18 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Sat, 20 Jun 2009 21:12:18 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To:
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk> <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu> <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
Message-ID:

Hi Chris,

I agree. I have not written a single line of code so far, while Heikki has some (but has been silent for a while) and you have perhaps some code ready to roll. I am happy to help where needed, just let me know what you'd like me to focus on. If you want to go ahead and implement the fastq stuff discussed I can focus on bioperl-run.

cheers

Elia

On 19 Jun 2009, at 21:57, Chris Fields wrote:

> So, to follow up (and make sure we don't have any overlapping tuits)
> we should probably determine who wants to work on what (i.e. fastq
> updating, etc). I think it's possible to quickly add in Solexa/
> Illumina/Sanger fastq similar to BioPython, just don't want to step
> on anyone's toes if they are halfway through doing this.
>
> chris
>
> On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:
>
>> Better than colorspaced discussions for sure ;)
>>
>> Elia
>>
>> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>>
>>> So, #1 priority is to get fastq up-to-speed, then maybe assess
>>> other options.
>>>
>>> Illuminating discussion, thanks Elia!
>>>
>>> urgh, excuse unintended bad pun above...
>>>
>>> chris
>>>
>>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>>
>>>> Interesting that you mention the database issue. We found that
>>>> for specific memory/CPU intensive things we also switch to using
>>>> dbs.
For example, after many years of loyal use of >>>> disconnected_ranges we switched to a simple SQL implementation of >>>> it, because of the large performance gains it would give us. >>>> Similarly in Ensembl as well as in the old days of bioperl-db we >>>> opted for doing subseq within SQL where possible. >>>> >>>> Some lean way of SQL'izing specific components could be less >>>> "disruptive" than avoiding object creation and provide >>>> significant gains in performance. Could be set as an optional >>>> flag, and could use temporary ad hoc SQL databases? >>>> >>>> Still, priority now is to make SeqIO compliant with all those >>>> formats, than we can worry about performance :) >>>> >>>> Elia >>>> >>>> On 17 Jun 2009, at 20:30, Chris Fields wrote: >>>> >>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote: >>>>> >>>>>> Tristan Lefebure wrote: >>>>>>> Hello, >>>>>>> Regarding next-gen sequences and bioperl, following my >>>>>>> experience, another issue is bioperl speed. For example, if >>>>>>> you want to trim bad quality bases at ends of 1E6 Solexa reads >>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, >>>>>>> well, you've got to be patient (but may be I missed some >>>>>>> shortcuts...). >>>>>> >>>>>> This is my concern as well. Or, rather, is there actually a >>>>>> significant set of users out there who are dealing with next- >>>>>> gen sequencing and would consider using BioPerl for their work? >>>>>> >>>>>> I'm working with all the 1000-genomes data at the Sanger, and >>>>>> we at least are probably never going to use BioPerl for the work. >>>>> >>>>> Are you using pure perl or (gasp) something else? ;> >>>>> >>>>> Judging by the feedback there are definitely a set of users who >>>>> would like to integrate nextgen into bioperl somehow, probably >>>>> to take advantage of other aspects of bioperl. >>>>> >>>>>>> A pure perl solution will be between 100 to 1000x faster... 
>>>>>>> Would it be possible to have an ultra-light quality object >>>>>>> with few simple methods for next-gen reads? >>>>>> >>>>>> The fastq parser itself already seems pretty fast. The way to >>>>>> get the speedup is to not create any Bio::Seq* objects but just >>>>>> return the data directly. At that point it's not taking much >>>>>> advantage of BioPerl. But certainly it could be done... >>>>> >>>>> >>>>> I suppose the best way to assess what needs to be done is come >>>>> up with a set of 'use cases' specifying what users want so we >>>>> can design around them, otherwise we're shooting in the dark. >>>>> >>>>> I'm personally wondering if this could be done as a sequence >>>>> database, something similar in theme to Lincoln's >>>>> SeqFeature::Store, but sequence only, and returns quality >>>>> objects in a similar manner (ala Storable)? Not sure whether >>>>> that's feasible, but it's appears at least scalable. >>>>> >>>>> chris >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> --- >>>> Senior Lecturer, Bioinformatics >>>> UCL Cancer Institute >>>> Paul O' Gorman Building >>>> University College London >>>> Gower Street >>>> WC1E 6BT >>>> London >>>> UK >>>> >>>> Office (UCL): +44 207 679 6493 >>>> Office (ICMS): +44 0207 8822374 >>>> >>>> Mobile: +44 7597 566 194 >>>>