From barry.moore at genetics.utah.edu Thu Nov 1 00:03:01 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 31 Oct 2007 22:03:01 -0600
Subject: [Bioperl-l] BLAST output parsing
In-Reply-To:
References: <13519112.post@talk.nabble.com>
Message-ID: <7BDC2187-1ABE-4CA1-AB86-98D5FD5433A4@genetics.utah.edu>

Swapna-

If you are using NCBI FASTA files, you can use files from NCBI's Gene
database to map your gene IDs to names and organisms. Look in
particular at the files gene2accession, gene2refseq, and gene_info.
For example, if you had RefSeq protein IDs like NP_123456, you could
use gene2refseq to map those RefSeq accessions to gene IDs and then
gene_info to map the gene IDs to organisms and gene names.

B

On Oct 31, 2007, at 7:27 PM, Torsten Seemann wrote:

> Swapna,
>
>> I am new to bioperl. I did BLAST search of ~4000 genes and I need
>> to parse it. I did use -m 9 option to get a tabular information of
>> the blast data. But it does not include the gene names or the names
>> of the organisms of each hit. Are there any parsers that can do
>> this job ??
>
> The -m 9 tabular output does not include gene descriptions and
> organisms. It only includes the "gene id" that was present immediately
> after the ">" sign in the FASTA file that was used to create the BLAST
> database you specified with the -d option when you ran BLAST.
>
> Hence, no parser will help you. You either have to re-do the BLAST
> with a different -m value that includes the information you desire, or
> write code to convert your gene IDs into what you want.
>
> --
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 05:45:43 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 10:45:43 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
Message-ID: <4729A047.2060507@mikrobio.med.uni-giessen.de>

Dear all,

I have emboss installed on a windows machine. (Embosswin). I can run
this from the dos command line and the path is present. However, when I
try to call an emboss application from bioperl I get a "Application not
found error"

    my $f = Bio::Factory::EMBOSS->new();
    # get an EMBOSS application object from the factory
    my $fuzznuc = $f->program('fuzznuc');
    $fuzznuc->run(
        { -sequence => $infile,
          -pattern  => $motif,
          -outfile  => $outfile
        });

gives the following error

-------------------- WARNING ---------------------
MSG: Application [fuzznuc] is not available!
---------------------------------------------------
Can't call method "run" on an undefined value at searchPatterns.pl line
102.

Can somebody help me fix this ?

best regards
Rohit

--

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel : 0049 (0)641-9946413
Fax : 0049 (0)641-9946409
Email: Rohit.Ghai at mikrobio.med.uni-giessen.de

From jason at bioperl.org Thu Nov 1 10:22:14 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:22:14 -0400
Subject: [Bioperl-l] PAML/Codeml parsing
Message-ID:

PAML4 breaks our PAML parser right now because the order of things in
the result file has changed. Now sequences precede the information
about the version or the program run. This means that
$result->get_seqs() fails because we don't parse the sequences.
We'll see what we can do, but as usual with supporting 3rd-party
programs, it is brittle when file formats change.

-jason
--
Jason Stajich
jason at bioperl.org

From jason at bioperl.org Thu Nov 1 10:32:06 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 10:32:06 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>

Presumably the PATH is not getting set properly - you should play
around printing the $ENV{PATH} variable in a perl script to see if it
actually contains the directory where the emboss programs are
installed. Bioperl can only guess so much as to where to find an
application. It is also possible that we aren't creating the proper
path to the executable - you can print the executable path with

    print $fuzznuc->executable

I believe, unless it is throwing an error at the program() line.

It looks like the code in the Factory object is a little fragile,
assuming that the programs HAVE to be in your $PATH. I don't know if
windows+perl is special in any way in how it runs things, so I can't
really tell if there are specific things you have to do here. You may
have to run this through cygwin in case PATH and such are just not
available properly to windowsPerl.

-jason
--
Jason Stajich
jason at bioperl.org

From cjfields at uiuc.edu Thu Nov 1 10:54:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 09:54:09 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
    <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <325E8599-793F-49DC-8680-9823F9389D4C@uiuc.edu>

This worked for me previously when I tested with WinXP on my old
machine using EMBOSS v5:

ftp://emboss.open-bio.org/pub/EMBOSS/windows

I haven't tried it with EMBOSSWin (latest is v 2.7); it's probably
better to use the latest EMBOSS version anyway, so I suggest trying the
version in the above link. I'll test it again today and let you know
what I find.
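
The PATH check suggested earlier in this thread can be run with a few lines of core Perl. What follows is only a sketch under stated assumptions: `find_in_path` is a hypothetical helper written for this illustration (it is not part of BioPerl), and the list of Windows suffixes it tries is a guess at what is relevant. It reports where, if anywhere, `fuzznuc` and `wossname` are visible to the Perl process.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl function): return every file named
# $program found in @dirs, or in the directories of PATH when no
# directories are given.
sub find_in_path {
    my ($program, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;

    # Assumption: on native Windows Perl the program may only exist with
    # one of these suffixes, so try them as well as the bare name.
    my @suffixes = ('');
    push @suffixes, '.exe', '.bat', '.cmd' if $^O eq 'MSWin32';

    my @hits;
    for my $dir (@dirs) {
        for my $suffix (@suffixes) {
            my $candidate = File::Spec->catfile($dir, $program . $suffix);
            push @hits, $candidate if -f $candidate;
        }
    }
    return @hits;
}

# Report what this Perl process can see for the programs of interest.
for my $program (qw(wossname fuzznuc)) {
    my @hits = find_in_path($program);
    print "$program: ", (@hits ? join(', ', @hits) : 'not found on PATH'), "\n";
}
```

If a program shows up at the DOS prompt but not here, the environment seen by Perl differs from the one seen by the shell, which is the situation the PATH suggestion is meant to detect.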
chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jason at bioperl.org Thu Nov 1 11:31:40 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 11:31:40 -0400
Subject: [Bioperl-l] PAML3 vs 4
Message-ID: <23575228-2FA3-4F07-BED4-4A2309A36D71@bioperl.org>

Small tweaks were needed to parse PAML4 results. Pairwise Ka, Ks
parsing (runmode -2) should be working more smoothly now on both PAML 3
and 4. You'll need to get the latest code from CVS in order to see the
changes to Bio/Tools/Phylo/PAML.pm

I've added tests for PAML4 in the parser and the run code. If you have
scripts that use codeml please give it a try. I have not attempted to
play with BASEML or AAML results at this point so if you also have
codes that use those programs, please try it out and provide bugreports
if we need to fix things.
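
For anyone taking up the request to try this out, a short script that exercises the parser on existing codeml output might look like the sketch below. It is untested here and rests on assumptions: the file and directory names are placeholders for your own run, BioPerl must be installed, and the hash keys (`dN`, `dS`, `omega`) are the ones shown in the Bio::Tools::Phylo::PAML documentation.

```perl
use strict;
use warnings;
use Bio::Tools::Phylo::PAML;

# Placeholder paths (assumptions): point these at your own codeml output.
my $parser = Bio::Tools::Phylo::PAML->new(
    -file => 'results/mlc',
    -dir  => 'results',
);

while ( my $result = $parser->next_result ) {
    # get_seqs is the call reported as failing on PAML4 output before the fix.
    my @seqs = $result->get_seqs;
    print "parsed ", scalar(@seqs), " sequences\n";

    # Pairwise maximum-likelihood estimates, indexed by sequence order.
    my $matrix = $result->get_MLmatrix or next;
    for my $i ( 0 .. $#seqs - 1 ) {
        for my $j ( $i + 1 .. $#seqs ) {
            my $cell = $matrix->[$i][$j] or next;
            printf "%s vs %s: dN=%s dS=%s omega=%s\n",
                $seqs[$i]->display_id, $seqs[$j]->display_id,
                map { defined $cell->{$_} ? $cell->{$_} : 'NA' } qw(dN dS omega);
        }
    }
}
```

Running the same script against output from PAML 3 and PAML 4 and comparing what is printed is one simple way to spot a parsing difference worth reporting.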
-jason
--
Jason Stajich
jason at bioperl.org

From Kevin.M.Brown at asu.edu Thu Nov 1 13:25:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 1 Nov 2007 10:25:30 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl onwindows
In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
Message-ID: <1A4207F8295607498283FE9E93B775B403EA7E06@EX02.asurite.ad.asu.edu>

Sounds like a path issue. Try to tell bioperl the full path to the
executable rather than just the executable name.

From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 14:06:48 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:06:48 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
    <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <472A15B8.7040502@mikrobio.med.uni-giessen.de>

Thanks for all the suggestions... but I unfortunately still cannot run
emboss. I am running the latest version of embosswin (2.10.0-Win-0.8),
and the path is set correctly. I printed $ENV{PATH} and this contains
C:\EMBOSSwin which is the correct location.
I also tried setting the path directly but I'm not sure how to do this,
so I tried this...

    my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');

this also did not work.

Also tried printing...

    $fuzznuc->executable()

gave the following error again

-------------------- WARNING ---------------------
MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
---------------------------------------------------

Any more ideas ?

thanks !
Rohit

here's the code...
use strict;
use Bio::Factory::EMBOSS;
use Data::Dumper;

#
# print "PATH=$ENV{PATH}\n";
# path contains C:\EMBOSSwin which is the correct location
# embossversion is 2.10.0-Win-0.8

my $f = Bio::Factory::EMBOSS->new();
# get an EMBOSS application object from the factory
print Dumper ($f);
my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe as well,
print Dumper ($fuzznuc);

#dump (the class name shows this is the factory object $f)
#$VAR1 = bless( {
#    '_programgroup' => {},
#    '_programs' => {},
#    '_groups' => {}
#    }, 'Bio::Factory::EMBOSS' );

#print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work

my $infile = "temp.fasta";
my $motif = "ATGTCGATC";
my $outfile = "test.out";

$fuzznuc->run(
    { -sequence => $infile,
      -pattern  => $motif,
      -outfile  => $outfile
    });

Here's the error again....

#-------------------- WARNING ---------------------
#MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
#---------------------------------------------------

--

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel : 0049 (0)641-9946413
Fax : 0049 (0)641-9946409
Email: Rohit.Ghai at mikrobio.med.uni-giessen.de

From jason at bioperl.org Thu Nov 1 14:37:24 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Nov 2007 14:37:24 -0400
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows
In-Reply-To: <472A15B8.7040502@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
    <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
    <472A15B8.7040502@mikrobio.med.uni-giessen.de>
Message-ID: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>

You could try this - can't test it though so not sure.

    my $fuzznuc = $f->program('fuzznuc');
    $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');

-jason
On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:

>
> Thanks for all the suggestions... but I unfortunately still cannot run
> emboss. I am running the latest version of embosswin (2.10.0-Win-0.8),
> and the path is set correctly. I printed $ENV{$PATH} and this contains
> C:\EMBOSSwin which is the correct location.
> I also tried setting the path directly but I'm not sure how to do
> this, so I tried this...
>
> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>
> this also did not work.
>
> Also tried printing...
> $fuzznuc->executable()
>
> gave the following error again
> -------------------- WARNING ---------------------
> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
> ---------------------------------------------------
>
> Any more ideas ?
>
> thanks !
> Rohit
>
>
> here's the code...
> Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
>
> Tel : 0049 (0)641-9946413
> Fax : 0049 (0)641-9946409
> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de

--
Jason Stajich
jason at bioperl.org

From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 14:41:41 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:41:41 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlonwindows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
    <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
    <472A15B8.7040502@mikrobio.med.uni-giessen.de>
    <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <472A1DE5.30207@mikrobio.med.uni-giessen.de>

Hi Jason

I tried this as well. This also gives the same error message.

-Rohit

Jason Stajich wrote:
> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>
> -jason
> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>
>>
>> Thanks for all the suggestions... but I unfortunately still cannot run
>> emboss. I am running the latest version of embosswin (2.10.0-Win-0.8),
>> and the
>> path is set correctly. I printed $ENV{$PATH} and this contains
>> C:\EMBOSSwin which is the correct location.
>> I also tried setting the path directly but I'm not sure how to do this,
>> so I tried this...
>>
>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>
>> this also did not work.
>>
>> Also tried printing...
>> Rohit Ghai
>> Institute of Medical Microbiology
>> Faculty of Medicine
>> Justus-Liebig University
>> Frankfurter Strasse 107
>> 35392 - Giessen
>> GERMANY
>>
>> Tel : 0049 (0)641-9946413
>> Fax : 0049 (0)641-9946409
>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de
>
> --
> Jason Stajich
> jason at bioperl.org
>

--

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel : 0049 (0)641-9946413
Fax : 0049 (0)641-9946409
Email: Rohit.Ghai at mikrobio.med.uni-giessen.de

From MEC at stowers-institute.org Thu Nov 1 14:57:33 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 1 Nov 2007 13:57:33 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs usingbioperlonwindows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
    <472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID:

in the code
http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6

there is a call to `wossname` (c.f.
http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html
)

is wossname in your path?

Maybe it needs to be wossname.exe under windows?

Malcolm Cook

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
> Sent: Thursday, November 01, 2007 1:42 PM
> To: Jason Stajich
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs
> usingbioperlonwindows
>
> Hi Jason
>
> I tried this as well. This also gives the same error message.
Rohit Ghai > >> Institute of Medical Microbiology > >> Faculty of Medicine > >> Justus-Liebig University > >> Frankfurter Strasse 107 > >> 35392 - Giessen > >> GERMANY > >> > >> Tel : 0049 (0)641-9946413 > >> Fax : 0049 (0)641-9946409 > >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > jason at bioperl.org > > > > -- > > Dr. Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From arareko at campus.iztacala.unam.mx Thu Nov 1 15:51:41 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 01 Nov 2007 13:51:41 -0600 Subject: [Bioperl-l] bioperl: cannot run emboss programs usingbioperlonwindows In-Reply-To: References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de> Message-ID: <472A2E4D.8080903@campus.iztacala.unam.mx> Doesn't EMBOSS binaries live under 'bin'? Perhaps setting PATH=$ENV{PATH} to 'C:\EMBOSSwin\bin' or using this: my $fuzznuc = $f->program('fuzznuc'); $fuzznuc->executable('C:\EMBOSSwin\bin\fuzznuc'); Adding .exe might be worth trying as well. Mauricio. Cook, Malcolm wrote: > in the code > http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 > > there is a call to `wossname` (c.f. 
> http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html
> )
>
> is wossname in your path?
>
> Maybe it needs to be wossname.exe under windows?
>
>
> Malcolm Cook

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Thu Nov 1 16:07:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 15:07:39 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlonwindows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>

I did a little investigating using my old PC and was able to get
fuzznuc to run using BioPerl
and EMBOSS v5. I had to jump through a hoop or two but I managed to
get it working.

First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.
You need to remove EMBOSSWin and install the one I linked to
previously (this is an actual EMBOSS beta release). It's possible
older EMBOSSWin can be configured, but I don't plan on checking it out
myself.

Next, you need to ensure the binaries are in your PATH env. variable
(test by running 'wossname' on the command line), then set EMBOSS_DATA
to point at the EMBOSS data directory using a UNIX-like path (i.e.
'C:/mEMBOSS/data'); regular Win32 paths didn't work for me and WinXP
recognizes the UNIX'y form as a valid path. If you don't know how to
set env. variables go here:

http://vlaurie.com/computers2/Articles/environment.htm

Once that is set up you should be able to run the script using the
latest (greatest?) EMBOSS.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From neetisomaiya at gmail.com Fri Nov 2 00:20:27 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 2 Nov 2007 09:50:27 +0530
Subject: [Bioperl-l] need help
Message-ID: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>

Hi,

This is a perl question, not bioperl.
Can anyone point me to a perl program/code/function which can calculate the
number of days between any two given dates.
Any help will be deeply appreciated.
Thanks.

-- 
-Neeti
Even my blood says, B positive

From whs at ebi.ac.uk Fri Nov 2 01:01:20 2007
From: whs at ebi.ac.uk (Will Spooner)
Date: Fri, 2 Nov 2007 05:01:20 +0000 (GMT)
Subject: [Bioperl-l] need help
In-Reply-To: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
References: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
Message-ID:

Hi Neeti,

A non-bioperl answer to your perl question: Date::Calc should do the trick.
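
A minimal sketch of the Date::Calc suggestion, assuming the module is
installed from CPAN (it is not part of core Perl). Delta_Days takes two
year/month/day triples and returns the signed number of days from the
first date to the second. The helper name days_between and the sample
dates are hypothetical and only for illustration.

```perl
use strict;
use warnings;
use Date::Calc qw(Delta_Days);

# days_between is a hypothetical helper, not part of Date::Calc.
# Each date is an array ref of the form [year, month, day].
sub days_between {
    my ($from, $to) = @_;
    # Delta_Days expects six numbers: y1, m1, d1, y2, m2, d2
    return Delta_Days(@$from, @$to);
}

# Days from 1 Nov 2007 to 3 Nov 2007
print days_between([2007, 11, 1], [2007, 11, 3]), "\n";
```

The result is negative when the second date is earlier than the first,
so wrap it in abs() if only the distance matters.
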
Will

On Fri, 2 Nov 2007, neeti somaiya wrote:

> Hi,
>
> This is a perl question, not bioperl.
> Can anyone point me to a perl program/code/function which can calculate the
> number of days between any two given dates.
> Any help will be deeply appreciated.
> Thanks.
>
>

From smarkel at accelrys.com Sat Nov 3 02:01:38 2007
From: smarkel at accelrys.com (Scott Markel)
Date: Fri, 2 Nov 2007 23:01:38 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID:

I set multiple environment variables in my code.

$ENV{EMBOSS_ROOT} = $embossPath;
$ENV{EMBOSS_ACDROOT} = File::Spec->catdir($embossPath, "acd");
$ENV{EMBOSS_DB_DIR} = File::Spec->catdir($embossPath, "test");
$ENV{EMBOSS_DATA} = File::Spec->catdir($embossPath, "data");
$ENV{PATH} = $embossPath;

I found it necessary to set both PATH and EMBOSS_ROOT.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

From Rohit.Ghai at mikrobio.med.uni-giessen.de Sat Nov 3 10:07:52 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Sat, 03 Nov 2007 15:07:52 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows
In-Reply-To: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de> <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
Message-ID: <472C80B8.9050601@mikrobio.med.uni-giessen.de>

Dear all,

thanks for all the different inputs on this topic, I was able to run
emboss applications on windows (vista), but with the following
workaround.

Chris suggested removing EMBOSSwin and getting another version. This I
did. Scott suggested setting all the variables within the program.
This I also tried, but actually these were already available to the
program so this was also not the problem.

The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't
have any path issues.

What is also curious is that $f->version returns the correct version
of emboss running (no path problems here), and it looks like it runs
the command "embossversion -auto" to get this information. If it can
get at this command, it's a bit peculiar why it cannot get the other
programs. Or am I missing something here ?

Please take a look at the code, I have commented within this...

-Rohit

use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;

my $infile = "test.fasta";
my $motif = "AGGAGG";
my $outfile = "test.out";

my $f = Bio::Factory::EMBOSS->new();
# get an EMBOSS application object from the factory
print Dumper $f;
print "location=",$f->location,"\n"; #returns local
print "version=", $f->version,"\n"; # this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is)
print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing
print "list=",$f->_program_list,"\n"; #returns nothing

#however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work
#$fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesn't work
# the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object.
#however, creating a EMBOSSApplication object directly makes it possible to run the program # my $application = Bio::Tools::Run::EMBOSSApplication->new(); $application->name('fuzznuc'); print Dumper $application; $application->run( { -sequence => $infile, -pattern => $motif, -outfile => $outfile }); print "Done\n"; exit; Chris Fields wrote: > I did a little investigating using my old PC and was able to get > fuzznuc to run using BioPerl and EMBOSS v5. I had to jump through a > hoop or two but I managed to get it working. > > First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows. > You need to remove EMBOSSWin and install the one I linked to > previously (this is an actual EMBOSS beta release). It's possible > older EMBOSSWin can be configured, but I don't plan on checking it out > myself. > > Next, you need to ensure the binaries are in your PATH env. variable > (test by running 'wossname' on the command line), then set EMBOSS_DATA > to point at the EMBOSS data directory using a UNIX-like path (i.e. > 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me and WinXP > recognizes the UNIX'y form as a valid path. If you don't know how to > set env. variables go here: > > http://vlaurie.com/computers2/Articles/environment.htm > > Once that is set up you should be able to run the script using the > latest (greatest?) EMBOSS. > > chris > > On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote: > >> Hi Jason >> >> I tried this as well. This also gives the same error message. >> >> -Rohit >> >> Jason Stajich wrote: >>> You could try this - can't test it though so not sure. >>> my $fuzznuc = $f->program('fuzznuc'); >>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); >>> >>> -jason >>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: >>> >>>> >>>> >>>> Thanks for all the suggestions... but I unfortunately still cannot run >>>> emboss. I am running the latest version of embosswin >>>> (2.10.0-Win-0.8), >>>> and the >>>> path is set correctly. 
I printed $ENV{$PATH} and this contains >>>> C:\EMBOSSwin which is the correct location. >>>> I also tried setting the path directly but I'm not sure how to do >>>> this, >>>> so I tried this... >>>> >>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); >>>> >>>> this also did not work. >>>> >>>> Also tried printing... >>>> $fuzznuc->executable() >>>> >>>> gave the following error again >>>> -------------------- WARNING --------------------- >>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! >>>> --------------------------------------------------- >>>> >>>> Any more ideas ? >>>> >>>> thanks ! >>>> Rohit >>>> >>>> >>>> here's the code... >>>> >>>> use strict; >>>> use Bio::Factory::EMBOSS; >>>> use Data::Dumper; >>>> >>>> # >>>> # print "PATH=$ENV{PATH}\n"; >>>> # path contains C:\EMBOSSwin which is the correct location >>>> # embossversion is 2.10.0-Win-0.8 >>>> >>>> my $f = Bio::Factory::EMBOSS->new(); >>>> # get an EMBOSS application object from the factory >>>> print Dumper ($f); >>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried >>>> fuzznuc.exe >>>> as well, >>>> print Dump ($fuzznuc); >>>> >>>> #dump of fuzznuc >>>> #$VAR1 = bless( { >>>> # '_programgroup' => {}, >>>> # '_programs' => {}, >>>> # '_groups' => {} >>>> # }, 'Bio::Factory::EMBOSS' ); >>>> >>>> #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work >>>> >>>> my $infile = "temp.fasta"; >>>> my $motif = "ATGTCGATC"; >>>> my $outfile = "test.out"; >>>> >>>> >>>> $fuzznuc->run( >>>> { -sequence => $infile, >>>> -pattern => $motif, >>>> -outfile => $outfile >>>> }); >>>> >>>> Here's the error again.... >>>> >>>> #-------------------- WARNING --------------------- >>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! 
>>>> #--------------------------------------------------- >>>> >>>> >>>> >>>> >>>> Jason Stajich wrote: >>>>> Presumably the PATH is not getting set properly - you should play >>>>> around printing the $ENV{PATH} variable in a perl script to see if >>>>> actually contains the directory where the emboss programs are >>>>> installed. Bioperl can only guess so much as to where to find an >>>>> application. It is also possible that we aren't creating the proper >>>>> path to the executable - you can print the executable path with >>>>> print $fuzznuc->executable >>>>> I believe unless it is throwing an error at the program() line. >>>>> >>>>> It looks like the code in the Factory object is a little fragile >>>>> assuming that the programs HAVE to be in your $PATH. I don't know if >>>>> windows+perl is special in any way that it run things so I can't >>>>> really tell if there is specific things you have to do here. You may >>>>> have to run this through cygwin in case PATH and such are just not >>>>> available properly to windowsPerl. >>>>> >>>>> -jason >>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: >>>>> >>>>>> Dear all, >>>>>> >>>>>> I have emboss installed on a windows machine. (Embosswin). I can run >>>>>> this from the dos command line and the path is present. However, >>>>>> when I >>>>>> try to call >>>>>> an emboss application from bioperl I get a "Application not found >>>>>> error" >>>>>> >>>>>> >>>>>> my $f = Bio::Factory::EMBOSS->new(); >>>>>> # get an EMBOSS application object from the factory >>>>>> my $fuzznuc = $f->program('fuzznuc'); >>>>>> $fuzznuc->run( >>>>>> { -sequence => $infile, >>>>>> -pattern => $motif, >>>>>> -outfile => $outfile >>>>>> }); >>>>>> gives the following error >>>>>> >>>>>> -------------------- WARNING --------------------- >>>>>> MSG: Application [fuzznuc] is not available! 
>>>>>> --------------------------------------------------- >>>>>> Can't call method "run" on an undefined value at searchPatterns.pl >>>>>> line >>>>>> 102. >>>>>> >>>>>> Can somebody help me fix this ? >>>>>> >>>>>> best regards >>>>>> Rohit >>>>>> >>>>>> -- >>>>>> > > From hlapp at gmx.net Sun Nov 4 12:42:13 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 4 Nov 2007 12:42:13 -0500 Subject: [Bioperl-l] question -- Bio::SeqFeature::Gene::Transcript In-Reply-To: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de> References: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de> Message-ID: <62FB6DE1-3F1D-428C-B108-4CF9EEB67DDD@gmx.net> Hi Stefanie, sorry for taking so long to respond - your email got buried in a pile while I was away on travel. The Bio::SeqFeature::Gene::* modules were written mostly with the motivation to have a model that can represent the results of gene predictors. GenBank AFAIK doesn't annotate introns explicitly, though they should be implicit from cDNA (or mRNA? or gene, as you say) features on genomic sequence. The Bioperl SeqIO parsers won't transform those into a Bio::SeqFeature::Gene-based model, but instead will yield just plain Bio::SeqFeatureI objects in a flat array. It's up to subsequent processing to build these into more hierarchical models. I'm not sure whether someone's done this already for GenBank-type feature tables. There is a Unflattener that at least attempts to build a feature hierarchy from the flat array that's compliant with the Sequence Ontology (or so I recall). I'm copying the list in case others have additional suggestions. 
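One way to do the subsequent processing mentioned above is to treat the gaps between consecutive exons as introns. The following is a minimal pure-Perl sketch, not BioPerl API: introns_from_exons and the coordinates are hypothetical, and it assumes a linear sequence with all exons of one transcript on the same strand. With BioPerl, the [start, end] pairs could presumably be taken from the sub-locations of a split 'CDS' or 'mRNA' location.

```perl
use strict;
use warnings;

# Hypothetical helper: given exons as [start, end] pairs (1-based,
# inclusive), return the gaps between consecutive exons as introns.
sub introns_from_exons {
    my @exons = sort { $a->[0] <=> $b->[0] } @_;
    my @introns;
    for my $i (1 .. $#exons) {
        my $start = $exons[$i - 1][1] + 1;   # first base after the previous exon
        my $end   = $exons[$i][0] - 1;       # last base before the next exon
        push @introns, [$start, $end] if $start <= $end;
    }
    return @introns;
}

# Made-up exon coordinates, for illustration only.
my @introns = introns_from_exons([100, 200], [301, 450], [600, 720]);
print join(', ', map { "$_->[0]..$_->[1]" } @introns), "\n";
# prints: 201..300, 451..599
```

Sorting by start is safe here only because the coordinates do not wrap around the origin of a circular sequence.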
-hilmar On Oct 25, 2007, at 3:40 AM, Stefanie Hartmann wrote: > > > Hello Hilmar, > > I have a question about your bioperl module > Bio::SeqFeature::Gene::Transcript: > > I can't figure out how to generate the $gene object for use in this > line: > @introns = $gene->introns(); > > The data I'm working with is a local file in genbank format, and > I'm interested in extracting intron sequences (and maybe flanking > exons) for certain genes. I have been trying to get the introns via > the sequence features ('CDS' or 'gene'), but this has not been > working. Which approach will I have to take? > I'd be very grateful if you could point me into the right direction! > > Hope things are going well in Durham! And thank you in advance! > > Stefanie > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From downloadondemand at gmail.com Sun Nov 4 13:39:42 2007 From: downloadondemand at gmail.com (download on demand) Date: Sun, 4 Nov 2007 20:39:42 +0200 Subject: [Bioperl-l] Help with Bio::SeqIO Message-ID: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> Hi to all. 
I have a problem with a very simple script:

    use strict;
    use warnings;
    use Bio::SeqIO;

    # get command-line arguments, or die with a usage statement
    my $usage = "x2y.pl infile infileformat outfileformat\n";
    my $infile        = shift or die $usage;
    my $infileformat  = shift or die $usage;
    # my $outfile     = shift or die $usage;
    my $outfileformat = shift or die $usage;

    # create one SeqIO object to read in, and another to write out
    my $seq_in  = Bio::SeqIO->new('-file'   => "<$infile",
                                  '-format' => $infileformat);
    my $seq_out = Bio::SeqIO->new('-fh'     => \*STDOUT,
                                  '-format' => $outfileformat);

    # print product, coordinates and spliced sequence of every CDS
    while (my $inseq = $seq_in->next_seq) {

        # $seq_out->write_seq($inseq); # Whole sequence not needed

        for my $feat_object ($inseq->get_SeqFeatures) {
            if ($feat_object->primary_tag eq "CDS") {
                print $feat_object->get_tag_values('product'), "\n";
                print $feat_object->location->start, "..",
                      $feat_object->location->end, "\n";
                print $feat_object->spliced_seq->seq, "\n\n";
            }
        }
    }

The result seems OK to me, but for the first CDS of NC_005213.gbk from here the output is wrong:

It is:
hypothetical protein
1..490885
TAAATGCGATTGCTATTAGAA..................................Truncated sequence...................................

Should be:
hypothetical protein
879..490883
ATGCGATTGCTATTAGAA...................................Truncated sequence....................................TAA

This CDS has an unusual location string:
CDS complement(join(490883..490885,1..879)),
but shouldn't spliced_seq handle these things?

Please help me!
Best regards, N.
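The difference between the two outputs can be reproduced without BioPerl. This is a minimal sketch, assuming a made-up 20 bp circular genome and a hypothetical helper splice_complement_join: for complement(join(A,B)) the pieces are joined in the listed order and the result is then reverse-complemented. Sorting the pieces by start coordinate first moves the stop codon to the front, which matches the symptom reported above.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: splice the pieces of a complement(join(...)) location.
# Pieces are [start, end] pairs, 1-based and inclusive. If $sort is true the
# pieces are first ordered by start, otherwise the listed order is kept.
sub splice_complement_join {
    my ($genome, $sort, @pieces) = @_;
    @pieces = sort { $a->[0] <=> $b->[0] } @pieces if $sort;
    my $joined = join '',
        map { substr($genome, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @pieces;
    return revcomp($joined);
}

my $genome = 'TTTCAT' . ('G' x 11) . 'TTA';   # toy genome, 20 bp
my @cds    = ([18, 20], [1, 6]);              # complement(join(18..20,1..6))

print splice_complement_join($genome, 0, @cds), "\n";  # listed order: ATGAAATAA
print splice_complement_join($genome, 1, @cds), "\n";  # sorted:       TAAATGAAA
```

With the listed order the toy CDS starts with ATG and ends with TAA; with sorting the stop codon ends up in front of the start codon.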

From cjfields at uiuc.edu Sun Nov 4 19:08:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Nov 2007 18:08:34 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
Message-ID: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>

Pass in (-nosort => 1) to spliced_seq:

    print $feat_object->spliced_seq(-nosort => 1)->seq, "\n\n";

This ensures that no sorting of sublocations occurs, which is what you want for typical GenBank/EMBL 'join' behavior.

To the other devs: shouldn't -nosort be the default behavior when the split location is a 'join'? In other words, should spliced_seq() be modified to take the split location type into account when returning sequence? The GB/EMBL/DDBJ release notes indicate that a 'join' explicitly means the order of the sequences is important when they are joined together; the current behavior is more like that for 'order'.

chris

On Nov 4, 2007, at 12:39 PM, download on demand wrote:

> Hi to all.
> _______________________________________________ > From jean-luc.jany at univ-brest.fr Mon Nov 5 03:26:52 2007 From: jean-luc.jany at univ-brest.fr (Jean-luc Jany) Date: Mon, 05 Nov 2007 09:26:52 +0100 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall Message-ID: <472ED3CC.2050305@univ-brest.fr> Dear Bioperl and Mac users, I am a Mac user and would like to run a script I made using Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate to Bioperl the pathway to Blastall and other executables. I read carefully the following link http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the path to Blast, but I guess the way to proceed is slightly different in Mac and that I should not create .ncbirc and .bashrc files (e.g. should I modify the .profile file instead of .bashrc?) Actually, my blast file is in myname directory and comprises a /bin and a /data file. I have got my blastall and other executables in myname/blast/bin/blastall. Thank you in anticipation for your help. Jean-Luc From Rohit.Ghai at mikrobio.med.uni-giessen.de Mon Nov 5 06:36:16 2007 From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai) Date: Mon, 05 Nov 2007 12:36:16 +0100 Subject: [Bioperl-l] bioperl and emboss on windows Message-ID: <472F0030.7040200@mikrobio.med.uni-giessen.de> Dear all, thanks for all the different inputs on this topic, I was able to run emboss applications on windows (vista), but with the following workaround. Chris suggested to remove EMBOSSwin and get another version. This I did. Scott suggested setting all the variables within the program. This I also tried, but actually these were already available to the program so this was also not the problem. The following line... my $fuzznuc = $f->program('fuzznuc') doesn't return a Bio::Tools::Run::EMBOSSApplication object. but using Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't have any path issues. 
What is also curious is that $f->version returns the correct version of emboss running (no path problems here), and it looks like it runs the command "embossversion -auto" to get this information. If it can get at this command, its a bit peculiar why it cannot get the other programs. Or am I missing something here ? Please take a look at the code, I have commented within this... -Rohit use Bio::Factory::EMBOSS; use Data::Dumper; use Bio::Tools::Run::EMBOSSApplication; my $infile = "test.fasta"; my $motif = "AGGAGG"; my $outfile = "test.out"; my $f = Bio::Factory::EMBOSS->new(); # get an EMBOSS application object from the factory print Dumper $f; print "location=",$f->location,"\n"; #returns local print "version=", $f->version,"\n"; # this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is) print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing print "list=",$f->_program_list,"\n"; #returns nothing # # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work # $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesnt work # the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object. # # # # however, creating a EMBOSSApplication object directly makes it possible to run the program # my $application = Bio::Tools::Run::EMBOSSApplication->new(); $application->name('fuzznuc'); print Dumper $application; $application->run( { -sequence => $infile, -pattern => $motif, -outfile => $outfile }); print "Done\n"; exit; From neetisomaiya at gmail.com Mon Nov 5 07:20:04 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Mon, 5 Nov 2007 17:50:04 +0530 Subject: [Bioperl-l] perl question Message-ID: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> Again a perl question, and maybe a very trivial one. How do I terminate a number like 3.1232010098 to only 3 decimal places in perl? 

--
-Neeti
Even my blood says, B positive

From biology0046 at hotmail.com Mon Nov 5 07:16:13 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Mon, 05 Nov 2007 12:16:13 +0000
Subject: [Bioperl-l] how to extract intron information from gff files.
Message-ID:

Dear all:

I have a poplar genome GFF file like this:

LG_I src exon        2598 3280 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I src CDS         2598 3280 . - 0 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 4
LG_I src start_codon 3278 3280 . - 0 name "fgenesh1_pg.C_LG_I000001"
LG_I src stop_codon  2598 2600 . - 0 name "fgenesh1_pg.C_LG_I000001"
LG_I src exon        3544 3918 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I src CDS         3544 3918 . - 2 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 3
LG_I src exon        4258 4740 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I src CDS         4258 4740 . - 2 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 2
LG_I src exon        5344 6388 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I src CDS         5344 6388 . - 2 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 1
LG_I src exon        8259 8528 . - . name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I src CDS         8259 8528 . - 0 name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 3
LG_I src stop_codon  8259 8261 . - 0 name "fgenesh1_pg.C_LG_I000002"
LG_I src exon        8897 8987 . - . name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I src CDS         8897 8987 . - 0 name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 2
LG_I src exon        9831 9892 . - . name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I src CDS         9831 9892 . - 1 name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 1
LG_I src start_codon 9890 9892 . - 0 name "fgenesh1_pg.C_LG_I000002"

I tried to use Bio::DB::GFF, but this module only applies to the methods given in the GFF file.
What I want to get is "intron, 5utr, 3utr", but this information is not contained in this GFF file. How can I get it through BioPerl? If I consider the gaps between exons as introns, and the non-CDS parts of the first and last exons as UTRs, how can I extract them from this GFF file?

Thanks~~
Wenkai

From spiros at lokku.com Mon Nov 5 07:36:36 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 5 Nov 2007 12:36:36 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID:

Hey,

use the `sprintf` function. More information can be found at http://perldoc.perl.org/functions/sprintf.html. For more control over rounding, you could use the Math::Round module, http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm.

hope this helps,
spiros

On 11/5/07, neeti somaiya wrote:
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
> --
> -Neeti
> Even my blood says, B positive
>

From ak at ebi.ac.uk Mon Nov 5 07:43:06 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 12:43:06 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <20071105124305.GC4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

When displaying:

    printf( "The number is %.3f\n", $number );

When making a string:

    my $string = sprintf( "%.3f", $number );

BTW, this is cutting, not rounding.

Cheers,
Andreas

--
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------

From t.nugent at cs.ucl.ac.uk Mon Nov 5 07:37:15 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 05 Nov 2007 12:37:15 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F0E7B.60303@cs.ucl.ac.uk>

Use Math::Round and nearest_ceil:

http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
>

--
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent

From bix at sendu.me.uk Mon Nov 5 07:47:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 12:47:17 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F10D5.5060006@sendu.me.uk>

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

Please don't use this list to ask general Perl questions.
See these instead:
http://perldoc.perl.org/perlfaq4.html
http://lists.cpan.org/
http://www.perlmonks.org/

    $rounded = sprintf("%.3f", $number);

From Marc.Logghe at DEVGEN.com Mon Nov 5 07:39:36 2007
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Mon, 5 Nov 2007 13:39:36 +0100
Subject: [Bioperl-l] perl question
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <0C528E3670D8CE4B8E013F6749231AA601C3BB80@ANTARESIA.be.devgen.com>

Hi,
Have a look at
http://perldoc.perl.org/functions/sprintf.html#precision%2c-or-maximum-width

In your particular case:

    my $f = 3.1232010098;
    printf "%0.3f", $f;

HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> neeti somaiya
> Sent: Monday, November 05, 2007 1:20 PM
> To: bioperl-l
> Subject: [Bioperl-l] perl question
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3
> decimal places in perl?
>
> --
> -Neeti
> Even my blood says, B positive
>
>

From bix at sendu.me.uk Mon Nov 5 08:24:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 13:24:25 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <20071105124305.GC4491@ebi.ac.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> <20071105124305.GC4491@ebi.ac.uk>
Message-ID: <472F1989.90105@sendu.me.uk>

Andreas Kahari wrote:
> On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
>> Again a perl question, and maybe a very trivial one.
>> How do I terminate a number like 3.1232010098 to only 3 decimal places in
>> perl?
> > When displaying: > > printf( "The number is %.3f\n", $number ); > > When making a string: > > my $string = sprintf( "%.3f", $number ); > > > BTW, this is cutting, not rounding. (s)printf rounds (ie. doesn't simply truncate), though for critical applications you should use your own rounding algorithm. From ak at ebi.ac.uk Mon Nov 5 08:56:24 2007 From: ak at ebi.ac.uk (Andreas Kahari) Date: Mon, 5 Nov 2007 13:56:24 +0000 Subject: [Bioperl-l] perl question In-Reply-To: <472F1989.90105@sendu.me.uk> References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> <20071105124305.GC4491@ebi.ac.uk> <472F1989.90105@sendu.me.uk> Message-ID: <20071105135624.GD4491@ebi.ac.uk> On Mon, Nov 05, 2007 at 01:24:25PM +0000, Sendu Bala wrote: > Andreas Kahari wrote: > > On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote: > >> Again a perl question, and maybe a very trivial one. > >> How do I terminate a number like 3.1232010098 to only 3 decimal places in > >> perl? > > > > When displaying: > > > > printf( "The number is %.3f\n", $number ); > > > > When making a string: > > > > my $string = sprintf( "%.3f", $number ); > > > > > > BTW, this is cutting, not rounding. > > (s)printf rounds (ie. doesn't simply truncate), though for critical > applications you should use your own rounding algorithm. They do indeed. Mea culpa. 
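To see that sprintf rounds rather than cuts, it can be compared with explicit truncation. This is a small sketch; round3 and trunc3 are hypothetical helper names, the truncation shown is only meant for non-negative numbers, and the second input is chosen so that the two results differ.

```perl
use strict;
use warnings;

# Round to 3 decimal places; returns a string.
sub round3 {
    my ($number) = @_;
    return sprintf('%.3f', $number);
}

# Cut off everything after the 3rd decimal place (non-negative input).
sub trunc3 {
    my ($number) = @_;
    return int($number * 1000) / 1000;
}

for my $number (3.1232010098, 3.1236) {
    print "$number: rounded ", round3($number),
          ", truncated ", trunc3($number), "\n";
}
# 3.1232010098: rounded 3.123, truncated 3.123
# 3.1236: rounded 3.124, truncated 3.123
```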
Andreas

--
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------

From jay at jays.net Mon Nov 5 10:14:17 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 10:14:17 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <8CA2A45C-1F82-47A2-841B-1BA92E1F4466@jays.net>

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
> To the other devs: shouldn't -nosort be the default behavior when the
> split location is a 'join'?

I certainly think so.

> In other words, should spliced_seq() be
> modified to take into account the split location type when returning
> sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly
> indicates the order of the sequences is important when joined
> together; the current behavior is more like that for 'order'.

I don't see any value in the sorting algorithm. All tests invoke -nosort => 1 (except a phase test where nosort doesn't matter anyway). In my limited experience the sorting only serves to break real-world splicing.

If there is no valid use then we can remove ~20 lines from SeqFeatureI.pm circa line 505.

If there is a valid use and someone would be so kind as to educate me, I'd be happy to add tests which demonstrate it. :)

P.S. CSHL is neato. I plan on understanding some of this stuff some day.
:) j http://www.bioperl.org/wiki/User:Jhannah From hlapp at duke.edu Mon Nov 5 11:03:16 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 5 Nov 2007 11:03:16 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: I agree that there should be a meaningful default that results in "doing the right thing" in most cases if the user doesn't intervene. I'm not sure I understand all the details, but it sounds sorting or not sorting should depend on the split location type unless the user overrides it by argument. That's what you're suggesting, right? -hilmar On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: > Pass in (-nosort => 1) to spliced_seq: > > print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; > > This ensures no sorting of sublocations occurs, if you want for > instance typical GenBank/EMBL 'join' behavior. > > To the other devs: shouldn't -nosort be the default behavior when > the split location is a 'join'? In other words, should spliced_seq > () be modified to take into account the split location type when > returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' > explicitly indicates the order of the sequences is important when > joined together; the current behavior is more like that for 'order'. > > chris > > On Nov 4, 2007, at 12:39 PM, download on demand wrote: > >> Hi to all. 
>> _______________________________________________ >> > > > From bernd.web at gmail.com Mon Nov 5 11:53:01 2007 From: bernd.web at gmail.com (Bernd Web) Date: Mon, 5 Nov 2007 17:53:01 +0100 Subject: [Bioperl-l] PSI-BLAST Message-ID: <716af09c0711050853l23087ac6j9f7d597580b66c46@mail.gmail.com> Hi, Is it possible with SearchIO to select a specific iteration (Results from round i) part of the PSI-blast report, when parsing this with SearchIO::blast? It seems the parser parses the complete report. If not implemented I could of course extract the specific part of the psi-blast report and then give it too SearchIO (e.g. with IO::String), but maybe I am missing a built-in option? Regards, Bernd From jay at jays.net Mon Nov 5 11:54:13 2007 From: jay at jays.net (Jay Hannah) Date: Mon, 5 Nov 2007 11:54:13 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? If someone knows why spliced_seq() should ever sort then I'm suggesting we add a test demonstrating a useful example of that. If no one has a useful example of when you would want spliced_seq() to sort then I'm suggesting we remove the sorting altogether and nosort goes away. I can provide/add many examples where sorting is bad. I do not know of a case where sorting is good. 
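The default discussed in this thread (keep the listed order for a 'join', sort for other split types, and let an explicit argument override either) could be sketched as follows. should_sort is a hypothetical helper for illustration only, not existing BioPerl code, and treating an undefined split type as 'join' is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether sub-locations should be sorted
# before splicing. $nosort is the caller's explicit choice (may be undef).
sub should_sort {
    my ($split_type, $nosort) = @_;
    return $nosort ? 0 : 1 if defined $nosort;        # explicit choice wins
    $split_type = 'join' unless defined $split_type;  # assumed default type
    return lc($split_type) eq 'join' ? 0 : 1;         # 'join' keeps listed order
}

for my $case (['join'], ['order'], ['join', 0], ['order', 1]) {
    my ($type, $nosort) = @$case;
    printf "%-5s nosort=%-5s => %s\n",
        $type,
        defined $nosort ? $nosort : 'undef',
        should_sort($type, $nosort) ? 'sort' : 'keep listed order';
}
```

Under this policy a plain spliced_seq() call on a 'join' would behave like today's -nosort => 1, while 'order' locations would keep the existing sorted behavior.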
j http://www.bioperl.org/wiki/User:Jhannah From jason at bioperl.org Mon Nov 5 12:07:10 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Nov 2007 12:07:10 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: At one point the location order was not respected/saved I believe. I guess we will just assume the user will build up a SplitLocation in order (i.e. add_SubLocation). I'll try and remember if there were any other particular reasons. -jason On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? > > -hilmar > > On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: > >> Pass in (-nosort => 1) to spliced_seq: >> >> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; >> >> This ensures no sorting of sublocations occurs, if you want for >> instance typical GenBank/EMBL 'join' behavior. >> >> To the other devs: shouldn't -nosort be the default behavior when >> the split location is a 'join'? In other words, should spliced_seq >> () be modified to take into account the split location type when >> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' >> explicitly indicates the order of the sequences is important when >> joined together; the current behavior is more like that for 'order'. >> >> chris >> >> On Nov 4, 2007, at 12:39 PM, download on demand wrote: >> >>> Hi to all. 
>>> _______________________________________________ >>> >> >> >> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Mon Nov 5 12:16:10 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:16:10 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> Yes, we would sort based on the splittype() and default to a particular behavior ('join') if one isn't designated, maybe with a warning indicating the splittype() isn't defined. Using an 'order' or other defined types could also delineate a default sort/nosort behavior (probably the previous as it would replicate prior behavior). chris On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? > > -hilmar From cjfields at uiuc.edu Mon Nov 5 12:20:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:20:35 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <70023491-3549-428D-9E5C-32275A33FF20@uiuc.edu> On Nov 5, 2007, at 10:54 AM, Jay Hannah wrote: > On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. 
>> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? > > If someone knows why spliced_seq() should ever sort then I'm > suggesting we add a test demonstrating a useful example of that. > > If no one has a useful example of when you would want spliced_seq() > to sort then I'm suggesting we remove the sorting altogether and > nosort goes away. > > I can provide/add many examples where sorting is bad. I do not know > of a case where sorting is good. > > j > http://www.bioperl.org/wiki/User:Jhannah The behavior would be based on the current use of 'join', 'order', and 'bond' (the latter in GenPept records). I documented some cases here a while back: http://www.bioperl.org/wiki/BioPerl_Locations#Split chris From hlapp at duke.edu Mon Nov 5 12:32:24 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 5 Nov 2007 12:32:24 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> Message-ID: <13919657-0446-4821-9EE4-FD07C995C734@duke.edu> Sounds good to me. -hilmar On Nov 5, 2007, at 12:16 PM, Chris Fields wrote: > Yes, we would sort based on the splittype() and default to a > particular behavior ('join') if one isn't designated, maybe with a > warning indicating the splittype() isn't defined. Using an 'order' > or other defined types could also delineate a default sort/nosort > behavior (probably the previous as it would replicate prior behavior). > > chris > > On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote: > >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. 
>> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? >> >> -hilmar > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at uiuc.edu Mon Nov 5 12:41:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:41:27 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: It may have something to do with remote locations or setting strand() in sublocations. This may have popped up in relation to a LocationI code audit I proposed a while back on the list which I never got around to. Oh well... I at least managed getting a wiki page started in case we decided to make changes, with the intention of making it a HOWTO at some point: http://www.bioperl.org/wiki/BioPerl_Locations If we go through with the changes to spliced_seq(), should it be implemented for inclusion in v1.6 or wait until v1.7? chris On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote: > > At one point the location order was not respected/saved I believe. > I guess we will just assume the user will build up a SplitLocation > in order (i.e. add_SubLocation). I'll try and remember if there > were any other particular reasons. > > > -jason > On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. >> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? 
>>
>> -hilmar
>>
>> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>>
>>> Pass in (-nosort => 1) to spliced_seq:
>>>
>>> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>>>
>>> This ensures no sorting of sublocations occurs, if you want for
>>> instance typical GenBank/EMBL 'join' behavior.
>>>
>>> To the other devs: shouldn't -nosort be the default behavior when
>>> the split location is a 'join'? In other words, should spliced_seq()
>>> be modified to take into account the split location type when
>>> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join'
>>> explicitly indicates the order of the sequences is important when
>>> joined together; the current behavior is more like that for 'order'.
>>>
>>> chris
>>>
>>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>>
>>>> Hi to all.
>>>>
>>>> I have a problem with a simple script:
>>>>
>>>> use Bio::SeqIO;
>>>>
>>>> # get command-line arguments, or die with a usage statement
>>>> my $usage = "x2y.pl infile infileformat outfileformat\n";
>>>> my $infile = shift or die $usage;
>>>> my $infileformat = shift or die $usage;
>>>> # my $outfile = shift or die $usage;
>>>> my $outfileformat = shift or die $usage;
>>>>
>>>> # create one SeqIO object to read in, and another to write out
>>>> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>>                              '-format' => $infileformat);
>>>> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>>                               '-format' => $outfileformat);
>>>>
>>>> # print product, coordinates and spliced sequence of every CDS
>>>> while (my $inseq = $seq_in->next_seq) {
>>>>
>>>>     # $seq_out->write_seq($inseq); # Whole sequence not needed
>>>>
>>>>     for my $feat_object ($inseq->get_SeqFeatures) {
>>>>         if ($feat_object->primary_tag eq "CDS") {
>>>>             print $feat_object->get_tag_values('product'), "\n";
>>>>             print $feat_object->location->start, "..", $feat_object->location->end, "\n";
>>>>             print $feat_object->spliced_seq->seq, "\n\n";
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> The result seems OK to me, but in the case of the first CDS of
>>>> NC_005213.gbk from here Nanoarchaeum_equitans/> the output is wrong:
>>>>
>>>> It is:
>>>> hypothetical protein
>>>> 1..490885
>>>> TAAATGCGATTGCTATTAGAA..................................Truncated sequence...................................
>>>>
>>>> Should be:
>>>> hypothetical protein
>>>> 879..490883
>>>> ATGCGATTGCTATTAGAA...................................Truncated sequence....................................TAA
>>>>
>>>> This CDS has an unusual location string:
>>>> CDS complement(join(490883..490885,1..879)), but spliced_seq
>>>> should handle these things?
>>>>
>>>> Please help me!
>>>> Best regards, N.
>>>> _______________________________________________
>>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bosborne11 at verizon.net Mon Nov 5 11:05:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 05 Nov 2007 12:05:41 -0400
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall
In-Reply-To: <472ED3CC.2050305@univ-brest.fr>
Message-ID:

Jean-luc,

From what you have written it sounds like you're using bash and not some other
shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
in your home directory, as well as a .ncbirc file. This should work.

I'm no Unix expert but I've always configured tcsh on the Mac in the same
ways I'd configure it on Linux machines. Similarly, if you're using bash
then it will read its .bashrc file, regardless of what flavor of Unix you
use (and the same thing holds true for zsh or csh or ...).

Brian O.

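A quick way to see what Perl itself can find, before involving StandAloneBlast, is to look for blastall in the directories the script actually inherits. This is only a minimal sketch using core modules: the helper name find_executable is made up for illustration, and BLASTDIR is assumed to be the environment variable the StandAloneBlast documentation describes (worth confirming in the HOWTO for your version).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of $name if it is an
# executable file in one of @dirs, or undef otherwise.
sub find_executable {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x $candidate;
    }
    return undef;
}

# Assumption: BLASTDIR (if set) is searched first, then every directory
# on the PATH that this Perl process inherited from the shell.
my @search = grep { defined $_ && length $_ } ($ENV{BLASTDIR}, File::Spec->path);
my $blastall = find_executable('blastall', @search);

if (defined $blastall) {
    print "blastall found at $blastall\n";
}
else {
    print "blastall not found in BLASTDIR or PATH\n";
}
```

If this prints "not found" from a script but blastall runs in Terminal, the shell startup file that sets PATH is probably not the one being read for that session.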
On 11/5/07 4:26 AM, "Jean-luc Jany" wrote: > Dear Bioperl and Mac users, > > I am a Mac user and would like to run a script I made using > Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate > to Bioperl the pathway to Blastall and other executables. > > I read carefully the following link > http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the > path to Blast, but I guess the way to proceed is slightly different in Mac and > that I should not create .ncbirc and .bashrc files (e.g. should I modify the > .profile file instead of .bashrc?) > > Actually, my blast file is in myname directory and comprises a /bin and a > /data file. I have got my blastall and other executables in > myname/blast/bin/blastall. > > Thank you in anticipation for your help. > > Jean-Luc > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Mon Nov 5 13:35:56 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 05 Nov 2007 12:35:56 -0600 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall In-Reply-To: References: Message-ID: <472F628C.2000506@campus.iztacala.unam.mx> If the ~/.bashrc file doesn't work for you, try renaming it to ~/.bash_profile and re-login, that might work best. ~/.bashrc works as an individual per-interactive-shell startup file, whereas ~/.bash_profile is a personal initialization file, executed for login shells. Hope this helps. Regards, Mauricio. Brian Osborne wrote: > Jean-luc, > >>From what you written it sounds like you're using bash and not some other > shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file > in your home directory, as well as a .ncbirc file. This should work. 
> > I'm no Unix expert but I've always configured tcsh on the Mac in the same > ways I'd configure it on Linux machines. Similarly, if you're using bash > then it will read its .bashrc file, regardless of what flavor of Unix you > use (and the same thing holds true for zsh or csh or ...). > > Brian O. > > > On 11/5/07 4:26 AM, "Jean-luc Jany" wrote: > >> Dear Bioperl and Mac users, >> >> I am a Mac user and would like to run a script I made using >> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate >> to Bioperl the pathway to Blastall and other executables. >> >> I read carefully the following link >> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the >> path to Blast, but I guess the way to proceed is slightly different in Mac and >> that I should not create .ncbirc and .bashrc files (e.g. should I modify the >> .profile file instead of .bashrc?) >> >> Actually, my blast file is in myname directory and comprises a /bin and a >> /data file. I have got my blastall and other executables in >> myname/blast/bin/blastall. >> >> Thank you in anticipation for your help. 
>>
>> Jean-Luc
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From hlapp at duke.edu Mon Nov 5 16:04:11 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 16:04:11 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To:
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID:

On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:

> If we go through with the changes to spliced_seq(), should it be
> implemented for inclusion in v1.6 or wait until v1.7?

I would say they should be implemented ASAP because they 1) should
not change behavior for those for which the current default behavior
was already broken (and who therefore pass in --no_sort), and 2) fix
the behavior for those who erroneously assumed that the code was
going to do the right thing by default.

I.e., it sounds mostly like a bugfix to me. Am I overlooking something?

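The difference between keeping the listed 'join' order and sorting by start can be reproduced on a toy sequence. The sketch below is plain Perl, not BioPerl's implementation: splice_pieces and revcomp are hypothetical helpers, and the 20-base "genome" is invented so that a feature complement(join(18..20,1..6)) mimics the origin-spanning CDS reported in this thread.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;          # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper, not BioPerl code: concatenate 1-based, inclusive
# [start, end] pieces of $genome, optionally sorting them by start first,
# then reverse-complement the result if $complement is true.
sub splice_pieces {
    my ($genome, $pieces, $complement, $sort) = @_;
    my @order = $sort ? sort { $a->[0] <=> $b->[0] } @$pieces : @$pieces;
    my $seq = join '',
        map { substr($genome, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @order;
    return $complement ? revcomp($seq) : $seq;
}

# Toy circular genome; the feature is complement(join(18..20,1..6)).
my $genome = 'TTTCAT' . ('G' x 11) . 'TTA';
my @pieces = ([18, 20], [1, 6]);

my $as_listed = splice_pieces($genome, \@pieces, 1, 0);
my $sorted    = splice_pieces($genome, \@pieces, 1, 1);

print "listed order: $as_listed\n";   # ATGAAATAA (start codon first, stop last)
print "sorted:       $sorted\n";      # TAAATGAAA (stop codon moved to the front)
```

The sorted result has the same shape as the "TAAATGCG..." output in the original report, which is why respecting the listed order for 'join' looks like the right default.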
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at uiuc.edu Mon Nov 5 17:12:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 16:12:23 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <980977BB-72C3-401A-848F-AEF2E602E4BE@uiuc.edu> On Nov 5, 2007, at 3:04 PM, Hilmar Lapp wrote: > > On Nov 5, 2007, at 12:41 PM, Chris Fields wrote: > >> If we go through with the changes to spliced_seq(), should it be >> implemented for inclusion in v1.6 or wait until v1.7? > > I would say they should be implemented ASAP because they 1) should > not change behavior for those for which the current default > behavior was already broken (and who therefore pass in --no_sort), > and 2) fix the behavior for those who erroneously assumed that the > code was going to do the right thing by default. > > I.e., it sounds mostly like a bugfix to me. Am I overlooking > something? > > -hilmar > -- Okay; I'll try to get this in soon. chris From jean-luc.jany at univ-brest.fr Tue Nov 6 04:00:07 2007 From: jean-luc.jany at univ-brest.fr (Jean-luc Jany) Date: Tue, 06 Nov 2007 10:00:07 +0100 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall Message-ID: <47302D17.2030500@univ-brest.fr> Thanks Brian. Yes I use bash. I am going to follow your advice as soon as possible (for some reasons I am unable to run bioperl) and come back to you to tell you if it runs. Jean-Luc From jason at bioperl.org Tue Nov 6 16:18:35 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Nov 2007 16:18:35 -0500 Subject: [Bioperl-l] lightweight sequence features Message-ID: I started a branch for implementing and playing with lightweight feature object. 
The branch is called 'lightweight_feature_branch'. Right now it is about 70% faster just in object creation based on parsing features using Bio::Tools::GFF and swapping the types of features that are created. It uses arrays instead of hashes under the hood. So the objects don't have locations under the hood. My hope is if this works okay we could use it for creating objects where we KNOW the underlying features have simple locations so such as parsing in GFF data. -jason -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Tue Nov 6 16:57:17 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 6 Nov 2007 15:57:17 -0600 Subject: [Bioperl-l] lightweight sequence features In-Reply-To: References: Message-ID: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> Bravo! I once benchmarked Location instance creation once and found it contributed quite a bit of overhead so the speedup with that and the use of arrays makes quite a bit of sense to me. You mention only simple locations; I'm guessing this doesn't handle 'fuzzy' ends? If it did I could see layering the feature data from the get-go, so it could be used just about anywhere in the place of SF::Generic. Maybe something to test out in 1.7? chris On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote: > I started a branch for implementing and playing with lightweight > feature object. The branch is called 'lightweight_feature_branch'. > > Right now it is about 70% faster just in object creation based on > parsing features using Bio::Tools::GFF and swapping the types of > features that are created. It uses arrays instead of hashes under > the hood. > > So the objects don't have locations under the hood. My hope is if > this works okay we could use it for creating objects where we KNOW > the underlying features have simple locations so such as parsing in > GFF data. 
> > -jason > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Tue Nov 6 23:14:55 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Nov 2007 23:14:55 -0500 Subject: [Bioperl-l] lightweight sequence features In-Reply-To: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> References: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> Message-ID: Right - only for simple locations. I've got a bunch more tests and fixes to put in. I am hoping this can be fast replacement in the case where we're dealing with this "unflattened" data (i.e. GFF in FeatureIO & Gbrowse). This is sort of a playground until I feel like it can really get it tested a bit more. I'll give an all clear when the dust settles in terms of the design if anyone wants to play/help. -jason On Nov 6, 2007, at 4:57 PM, Chris Fields wrote: > Bravo! I once benchmarked Location instance creation once and > found it contributed quite a bit of overhead so the speedup with > that and the use of arrays makes quite a bit of sense to me. > > You mention only simple locations; I'm guessing this doesn't handle > 'fuzzy' ends? If it did I could see layering the feature data from > the get-go, so it could be used just about anywhere in the place of > SF::Generic. Maybe something to test out in 1.7? > > chris > > On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote: > >> I started a branch for implementing and playing with lightweight >> feature object. The branch is called 'lightweight_feature_branch'. >> >> Right now it is about 70% faster just in object creation based on >> parsing features using Bio::Tools::GFF and swapping the types of >> features that are created. It uses arrays instead of hashes under >> the hood. >> >> So the objects don't have locations under the hood. 
My hope is if
>> this works okay we could use it for creating objects where we KNOW
>> the underlying features have simple locations so such as parsing in
>> GFF data.
>>
>> -jason
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From heikki at sanbi.ac.za Wed Nov 7 05:05:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Nov 2007 12:05:59 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Mdust
Message-ID: <200711071205.59576.heikki@sanbi.ac.za>

Hi Donald,

I started using your Mdust module in bioperl-run and ran into problems
immediately.

* Only Bio::Seq objects are accepted but not Bio::PrimarySeq objects,
  although the docs say otherwise
* Sequences are modified in place. That is really bad, because that means
  that the user has to know to create a copy before running Mdust on it.
* The docs say that you have to set the MDUSTDIR envvar to tell the program
  where to find the binary. That is actually optional if the binary is on
  your path.
* The tests do not cover any of the options to the program

As a quick fix, I suggest that we:

* leave the current way of working for Bio::SeqI objects: the sequence
  string is not masked but seqfeatures to that effect are added
* Modify run() to return the new masked sequence object when the target is
  a Bio::PrimarySeqI.
* fix the documentation

After that it will be possible to simply write:

use Bio::Tools::Run::Mdust;
my $mdust = Bio::Tools::Run::Mdust->new();
my $seq_dusted = $mdust->run($seq); # where $seq->isa('Bio::PrimarySeqI')

Are you happy for me to do this or do you want to do it yourself?

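The "return a masked copy instead of modifying in place" idea can be shown without Mdust at all. This is a minimal sketch on plain strings, assuming the masked regions are already known; masked_copy is a hypothetical helper and is not part of Bio::Tools::Run::Mdust.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::Run::Mdust: return a masked
# COPY of $seq in which every 1-based, inclusive [start, end] range is
# replaced by 'N'. The caller's string is never modified.
sub masked_copy {
    my ($seq, @ranges) = @_;
    my $copy = $seq;                               # work on a private copy
    for my $range (@ranges) {
        my ($start, $end) = @$range;
        my $len = $end - $start + 1;
        substr($copy, $start - 1, $len) = 'N' x $len;
    }
    return $copy;
}

my $original = 'ACGTAAAAAAAAGTCA';
my $dusted   = masked_copy($original, [5, 12]);

print "original: $original\n";   # ACGTAAAAAAAAGTCA (unchanged)
print "masked:   $dusted\n";     # ACGTNNNNNNNNGTCA
```

With this shape the caller keeps both versions and never has to remember to clone the input first.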
Yours, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho _/_/_/_/_/ heikki at_sanbi _ac _za skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From Kevin.M.Brown at asu.edu Wed Nov 7 13:04:50 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 7 Nov 2007 11:04:50 -0700 Subject: [Bioperl-l] Bio::Ext::Align? Message-ID: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu> I installed bioperl-ext from CVS, but can't figure out what else is missing to utilize Bio::Tools::pSW. The error I get from the example script in the wiki is: The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. Please read the install the bioperl-ext package BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128. Compilation failed in require at ./align_test.pl line 3. BEGIN failed--compilation aborted at ./align_test.pl line 3. In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called Align, but no Align.pm file. I followed the directions in the wiki to install 1.5.2_102 (think I had _100 installed previously). Any thoughts on what I'm missing? From jason at bioperl.org Wed Nov 7 14:52:16 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 7 Nov 2007 14:52:16 -0500 Subject: [Bioperl-l] (no subject) Message-ID: The array-based Bio::SeqFeature::Slim is only about 7% faster than Bio::Graphics::Feature so I suspect most of the speedup comes from removing location objects. 

Generic    6.75    --  -37%  -41%
GraphicsF  4.26   58%    --   -7%
Slim       3.98   70%    7%    --

this is using code on the lightweight_feature_branch so
cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r lightweight_feature_branch -d core_lwf bioperl-live

http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
and the GFF3 file I used to parse
http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2

-jason

From lstein at cshl.edu Wed Nov 7 15:04:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Nov 2007 15:04:24 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To:
References:
Message-ID: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>

I wonder if it is worth moving to the array-based version more generally,
then.

How does the array based feature object deal with tags?

Lincoln

On Nov 7, 2007 2:52 PM, Jason Stajich wrote:

> The array-based Bio::SeqFeature::Slim is only about 7% faster than
> Bio::Graphics::Feature so I suspect most of the speedup comes from removing
> location objects.
>
> Generic    6.75    --  -37%  -41%
> GraphicsF  4.26   58%    --   -7%
> Slim       3.98   70%    7%    --
>
> this is using code on the lightweight_feature_branch so
> cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r
> lightweight_feature_branch -d core_lwf bioperl-live
>
> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
> and the GFF3 file I used to parse
> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>
> -jason
>

--
Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From jason at bioperl.org Wed Nov 7 15:09:35 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 7 Nov 2007 15:09:35 -0500 Subject: [Bioperl-l] (no subject) In-Reply-To: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> Message-ID: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> It uses hashes there so technically it is not entirely array based. -jason On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: > I wonder if it is worth moving to the array-based version more > generally, > then. > > How does the array based feature object deal with tags? > > Lincoln > > On Nov 7, 2007 2:52 PM, Jason Stajich wrote: > >> The array-based Bio::SeqFeature::Slim is only about 7% faster than >> Bio::Graphics::Feature so I suspect most of the speedup comes from >> removing >> location objects. >> >> Generic 6.75 -- -37% -41% >> GraphicsF 4.26 58% -- -7% >> Slim 3.98 70% 7% -- >> >> this is using code on the lightweight_feature_branch so >> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >> lightweight_feature_branch -d core_lwf bioperl-live >> >> http://jason.open-bio.org/~jason/bioperl/ >> seqfeature_speed.pl> seqfeature_speed.pl> >> and the GFF3 file I used to parse >> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >> >> -jason >> > > > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed Nov 7 16:12:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 15:12:35 -0600 Subject: [Bioperl-l] (no subject) In-Reply-To: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> Message-ID: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> I can see preferring a lightweight simple SF over SF::Generic in the next BioPerl dev cycle. I guess we would just layer split locations as simple sub-features/segments, typing when necessary? That shouldn't be much more overhead than creating a layered Location::Split. chris On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: > It uses hashes there so technically it is not entirely array based. > > -jason > On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: > >> I wonder if it is worth moving to the array-based version more >> generally, >> then. >> >> How does the array based feature object deal with tags? >> >> Lincoln >> >> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >> >>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>> removing >>> location objects. 
>>> >>> Generic 6.75 -- -37% -41% >>> GraphicsF 4.26 58% -- -7% >>> Slim 3.98 70% 7% -- >>> >>> this is using code on the lightweight_feature_branch so >>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >>> lightweight_feature_branch -d core_lwf bioperl-live >>> >>> http://jason.open-bio.org/~jason/bioperl/ >>> seqfeature_speed.pl>> seqfeature_speed.pl> >>> and the GFF3 file I used to parse >>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>> >>> -jason >>> >> >> >> >> -- >> Lincoln D. Stein >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> FOR URGENT MESSAGES & SCHEDULING, >> PLEASE CONTACT MY ASSISTANT, >> SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Wed Nov 7 18:19:15 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 7 Nov 2007 18:19:15 -0500 Subject: [Bioperl-l] lightweight features In-Reply-To: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> Message-ID: It seems to me that there are applications where you're dealing with a huge number of features (such as GFF) and where therefore a lightweight object makes tremendous sense. But when you parse a genbank file, I'm not sure that's the bottleneck, unless maybe it's a large contig with lots of feature annotations. I guess we'll ultimately want a way to control the type of feature being instantiated by a parser, e..g using a factory. 
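One way to picture "control the type of feature being instantiated by a parser" is a parser that takes a factory callback. This is a rough sketch in core Perl only: HashFeature, ArrayFeature and parse_gff_lines are hypothetical names invented here, the GFF rows are made-up demo data, and none of this is the code on lightweight_feature_branch.

```perl
use strict;
use warnings;

# Hash-based feature (hypothetical): one hash per object.
package HashFeature;
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }
sub type  { $_[0]{type} }

# Array-based feature (hypothetical): fixed slots 0=start, 1=end, 2=type.
package ArrayFeature;
sub new   { my ($class, %args) = @_; return bless [ @args{qw(start end type)} ], $class }
sub start { $_[0][0] }
sub end   { $_[0][1] }
sub type  { $_[0][2] }

package main;

# Hypothetical parser: the caller chooses the feature class through a
# factory callback, so the parsing code never names a concrete class.
sub parse_gff_lines {
    my ($factory, @lines) = @_;
    my @features;
    for my $line (@lines) {
        next if $line =~ /^#/;                     # skip directives/comments
        my @col = split /\t/, $line;
        push @features,
            $factory->(type => $col[2], start => $col[3], end => $col[4]);
    }
    return @features;
}

# Made-up demo rows in GFF column order.
my @gff = (
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t200\t.\t+\t.\tID=gene1",
    "chr1\tdemo\tCDS\t120\t180\t.\t+\t0\tParent=gene1",
);

my @light = parse_gff_lines(sub { ArrayFeature->new(@_) }, @gff);
my @heavy = parse_gff_lines(sub { HashFeature->new(@_) },  @gff);

printf "%-12s %s %d..%d\n", ref($_), $_->type, $_->start, $_->end
    for @light, @heavy;
```

Because both classes answer the same methods, a GFF-heavy application could ask for the light objects while a GenBank parser keeps the full ones, without changing the parsing loop.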
-hilmar On Nov 7, 2007, at 4:12 PM, Chris Fields wrote: > I can see preferring a lightweight simple SF over SF::Generic in the > next BioPerl dev cycle. I guess we would just layer split locations > as simple sub-features/segments, typing when necessary? That > shouldn't be much more overhead than creating a layered > Location::Split. > > chris > > On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: > >> It uses hashes there so technically it is not entirely array based. >> >> -jason >> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: >> >>> I wonder if it is worth moving to the array-based version more >>> generally, >>> then. >>> >>> How does the array based feature object deal with tags? >>> >>> Lincoln >>> >>> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >>> >>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>>> removing >>>> location objects. >>>> >>>> Generic 6.75 -- -37% -41% >>>> GraphicsF 4.26 58% -- -7% >>>> Slim 3.98 70% 7% -- >>>> >>>> this is using code on the lightweight_feature_branch so >>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >>>> lightweight_feature_branch -d core_lwf bioperl-live >>>> >>>> http://jason.open-bio.org/~jason/bioperl/ >>>> seqfeature_speed.pl>>> seqfeature_speed.pl> >>>> and the GFF3 file I used to parse >>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>>> >>>> -jason >>>> >>> >>> >>> >>> -- >>> Lincoln D. 
Stein >>> Cold Spring Harbor Laboratory >>> 1 Bungtown Road >>> Cold Spring Harbor, NY 11724 >>> (516) 367-8380 (voice) >>> (516) 367-8389 (fax) >>> FOR URGENT MESSAGES & SCHEDULING, >>> PLEASE CONTACT MY ASSISTANT, >>> SANDRA MICHELSEN, AT michelse at cshl.edu >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Nov 7 20:04:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 19:04:05 -0600 Subject: [Bioperl-l] lightweight features In-Reply-To: References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> Message-ID: I'm also thinking a factory is a good possibility; maybe something to take the place of FTHelper. chris On Nov 7, 2007, at 5:19 PM, Hilmar Lapp wrote: > It seems to me that there are applications where you're dealing with > a huge number of features (such as GFF) and where therefore a > lightweight object makes tremendous sense. But when you parse a > genbank file, I'm not sure that's the bottleneck, unless maybe it's a > large contig with lots of feature annotations. > > I guess we'll ultimately want a way to control the type of feature > being instantiated by a parser, e..g using a factory. 
> > -hilmar > > On Nov 7, 2007, at 4:12 PM, Chris Fields wrote: > >> I can see preferring a lightweight simple SF over SF::Generic in the >> next BioPerl dev cycle. I guess we would just layer split locations >> as simple sub-features/segments, typing when necessary? That >> shouldn't be much more overhead than creating a layered >> Location::Split. >> >> chris >> >> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: >> >>> It uses hashes there so technically it is not entirely array based. >>> >>> -jason >>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: >>> >>>> I wonder if it is worth moving to the array-based version more >>>> generally, >>>> then. >>>> >>>> How does the array based feature object deal with tags? >>>> >>>> Lincoln >>>> >>>> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >>>> >>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>>>> removing >>>>> location objects. >>>>> >>>>> Generic 6.75 -- -37% -41% >>>>> GraphicsF 4.26 58% -- -7% >>>>> Slim 3.98 70% 7% -- >>>>> >>>>> this is using code on the lightweight_feature_branch so >>>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl >>>>> co -r >>>>> lightweight_feature_branch -d core_lwf bioperl-live >>>>> >>>>> http://jason.open-bio.org/~jason/bioperl/ >>>>> seqfeature_speed.pl>>>> seqfeature_speed.pl> >>>>> and the GFF3 file I used to parse >>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>>>> >>>>> -jason >>>>> >>>> >>>> >>>> >>>> -- >>>> Lincoln D. 
Stein >>>> Cold Spring Harbor Laboratory >>>> 1 Bungtown Road >>>> Cold Spring Harbor, NY 11724 >>>> (516) 367-8380 (voice) >>>> (516) 367-8389 (fax) >>>> FOR URGENT MESSAGES & SCHEDULING, >>>> PLEASE CONTACT MY ASSISTANT, >>>> SANDRA MICHELSEN, AT michelse at cshl.edu >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 7 23:45:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 22:45:26 -0600 Subject: [Bioperl-l] test please ignore Message-ID: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> From cjfields at uiuc.edu Thu Nov 8 10:50:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Nov 2007 09:50:02 -0600 Subject: [Bioperl-l] test please ignore In-Reply-To: <47332534.5090205@bms.com> References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> <47332534.5090205@bms.com> Message-ID: And respond back! Just checking the mail list; the open-bio wiki pages were down last night. 
chris On Nov 8, 2007, at 9:03 AM, Stefan Kirov wrote: > Chris Fields wrote: >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > This is the best way to make everyone open this e-mail ;-) > Stefan From stefan.kirov at bms.com Thu Nov 8 10:03:16 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 08 Nov 2007 10:03:16 -0500 Subject: [Bioperl-l] test please ignore In-Reply-To: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> Message-ID: <47332534.5090205@bms.com> Chris Fields wrote: > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > This is the best way to make everyone open this e-mail ;-) Stefan From Kevin.M.Brown at asu.edu Thu Nov 8 17:30:24 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 8 Nov 2007 15:30:24 -0700 Subject: [Bioperl-l] Bio::Ext::Align? In-Reply-To: <20071108003638.GA5892@eniac.jgi-psf.org> References: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu> <20071108003638.GA5892@eniac.jgi-psf.org> Message-ID: <1A4207F8295607498283FE9E93B775B403F7F9D3@EX02.asurite.ad.asu.edu> OK, found the issue. For whatever reason the Align.pm file is inside the Align folder and so the package name and path don't match up once it is installed. This would cause it to have a name of "Bio::Ext::Align::Align" instead of "Bio::Ext::Align". Not sure why this wasn't caught when I did "perl Makefile.pl && make && make test && make install" > -----Original Message----- > From: Joel Martin [mailto:j_martin at lbl.gov] > Sent: Wednesday, November 07, 2007 5:37 PM > To: Kevin Brown > Subject: Re: [Bioperl-l] Bio::Ext::Align? 
> > Hello, > Might be a side effect of fixing the other bioperl-ext package, > what steps exactly did this entail: > > > I installed bioperl-ext from CVS, > > ? > > you can probably bypass it at the moment by doing this after > unpacking the > bioperl-ext package > > cd Bio/Ext/Align > perl Makefile.PL > make > make test > make install > > and > > cd Bio/Ext/HMM > perl Makefile.PL > make > make test > make install > > Joel > > but can't figure out what else is > > missing to utilize Bio::Tools::pSW. The error I get from > the example > > script in the wiki is: > > > > The C-compiled engine for Smith Waterman alignments > (Bio::Ext::Align) > > has not been installed. > > Please read the install the bioperl-ext package > > > > BEGIN failed--compilation aborted at > > /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128. > > Compilation failed in require at ./align_test.pl line 3. > > BEGIN failed--compilation aborted at ./align_test.pl line 3. > > > > In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called > > Align, but no Align.pm file. > > > > I followed the directions in the wiki to install 1.5.2_102 > (think I had > > _100 installed previously). Any thoughts on what I'm missing? > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From akarger at CGR.Harvard.edu Fri Nov 9 09:53:02 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 9 Nov 2007 09:53:02 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? 
Message-ID:

When I tblastn ENSP00000349467 against the human genome, I get a few
hits on chr10, among which are:

 Score =  192 bits (487), Expect(2) = 5e-64
 Identities = 99/109 (90%), Positives = 99/109 (90%)
 Frame = +2

Query: 40       LGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNG 99
                L QNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE F VFDKDGNG
Sbjct: 71593562 LRQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRETFCVFDKDGNG 71593741

Query: 100      YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 148
                YIS  EL HVMTNLG KLTDEEVD MIREAD DGDGQVNY EFVQMMTA
Sbjct: 71593742 YISGVELHHVMTNLGVKLTDEEVD*MIREADPDGDGQVNY-EFVQMMTA 71593885

 Score = 75.1 bits (183), Expect(2) = 5e-64
 Identities = 36/43 (83%), Positives = 39/43 (90%)
 Frame = +1

Query: 1        MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQN 43
                MADQLTEEQI EFKE FSLFDKDGDGTITTK+LGTVMRS  ++
Sbjct: 71593447 MADQLTEEQIVEFKEVFSLFDKDGDGTITTKKLGTVMRSQAES 71593575

As you can see from the Sbjct lines, these two hits are basically
contiguous. I was surprised to see that the bit scores, identities and
alignment lengths here are totally different but the expectation values
are identical. After a bit of grepping in the BLAST source, I found
references to "sum segments" and "a collection [of] multiple distinct
alignments with asymmetric gaps between the alignments" and decided it
was time to cry for help.

When does BLAST decide that two or more alignments belong "together" and
how does that affect the evalue? Is the evalue really showing how good
those two alignments combined are, despite the frame shift? (It so
happens that that's what I want.)

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)
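
A minimal sketch of one possible workaround for the empty n() values
described above: read the N of "Expect(N)" straight from the text report
with a regular expression. The helper name expect_group_sizes is
hypothetical and is not part of BioPerl. The sketch assumes the score
lines look like the ones quoted in this message, and it treats a missing
"(N)" as N = 1, which is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: scan raw NCBI BLAST text output
# and return one hash per HSP score line, recording the N of "Expect(N)".
sub expect_group_sizes {
    my ($report_text) = @_;
    my @hsps;
    while ($report_text =~ m{
            Score \s* = \s* ([\d.]+) \s+ bits \s+ \( (\d+) \) ,   # bit score, raw score
            \s+ Expect (?: \( (\d+) \) )?                         # optional "(N)"
            \s* = \s* ([^\s,]+)                                   # E-value
        }gx) {
        my ($bits, $raw, $n, $evalue) = ($1, $2, $3, $4);
        push @hsps, {
            bits   => $bits,
            raw    => $raw,
            n      => defined $n ? $n : 1,   # assumption: no "(N)" means a lone HSP
            evalue => $evalue,
        };
    }
    return @hsps;
}

# The two score blocks quoted in this message.
my $sample = <<'END';
 Score =  192 bits (487), Expect(2) = 5e-64
 Identities = 99/109 (90%), Positives = 99/109 (90%)
 Frame = +2

 Score = 75.1 bits (183), Expect(2) = 5e-64
 Identities = 36/43 (83%), Positives = 39/43 (90%)
 Frame = +1
END

# Print bit score, N and E-value for each HSP found.
for my $hsp (expect_group_sizes($sample)) {
    print join("\t", $hsp->{bits}, $hsp->{n}, $hsp->{evalue}), "\n";
}
```

HSPs within one hit that share the same N and the same E-value are
presumably members of the same summed group, but the report text does
not say so explicitly, so that grouping is left to the caller.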
Thanks, - Amir Karger Research Computing Life Sciences Division Harvard University From cjfields at uiuc.edu Fri Nov 9 12:58:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Nov 2007 11:58:16 -0600 Subject: [Bioperl-l] GFF3loader and indexing Message-ID: <77845E27-1327-43DD-BA45-222C071217D7@uiuc.edu> Quick question: shouldn't the new Index attribute be passed on to seqfeatures by DB::SeqFeature::Store::GFF3Loader for round-tripping purposes (for instance, properly reloading dumped gff3 data)? I'm testing out a feature editor using volvox.gff3 data in GBrowse and the mRNA features appear to drop this attribute once loaded: Original data: ctgA example gene 1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase ctgA example mRNA 1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN. 1;Note=Eden splice form 1;Index=1 ctgA example five_prime_UTR 1050 1200 . + . Parent=EDEN.1 partial gff3_string(1) output: ctgA example gene 1050 9000 . + . Name=EDEN;ID=50;Alias=EDEN;Note=protein kinase ctgA example mRNA 1050 9000 . + . Name=EDEN. 1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1 ctgA example five_prime_UTR 1050 1200 . + . Parent=51;ID=52 ... chris From David.Messina at sbc.su.se Sat Nov 10 06:04:25 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 10 Nov 2007 12:04:25 +0100 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: References: Message-ID: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Hi Amir, I don't have my BLAST book handy, and my memory is a little fuzzy, but I think the Expect(2) you're seeing is the E-value based on both HSPs combined. And I think this is why you see the same Expect value for both -- because it is shared between them (which sounds like what you wanted). Again, this is just from memory, but I think this is an option that has to be turned on rather than something which Blast decides to do on its own. I don't know whether BioPerl reports this or not. 
Would you mind e-mailing me a entire BLAST report as a sample? When I have some time I'd like to play around with this a bit. Thanks, Dave From sac at bioperl.org Sat Nov 10 17:59:28 2007 From: sac at bioperl.org (Steve Chervitz) Date: Sat, 10 Nov 2007 14:59:28 -0800 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Message-ID: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> The Bioperl blast parser should extract that value and you can obtain it from an HSP object, via the HSPI::n() method, documented here: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23 Dave's basically correct in his explanation. It's a result of the application of sum statistics by the blast algorithm. You can read all about it in Korf et al's BLAST book. Here's the relevant section: http://books.google.com/books?id=xvcnhDG9fNUC&pg=PA102&lpg=PA102&dq=blast+sum+statistics&source=web&ots=WIudsJGaCk&sig=v66X3wRLEHvpTLUD36AE5DGpPBY#PPA102,M1 Steve On Nov 10, 2007 3:04 AM, Dave Messina wrote: > Hi Amir, > > I don't have my BLAST book handy, and my memory is a little fuzzy, but I > think the Expect(2) you're seeing is the E-value based on both HSPs > combined. And I think this is why you see the same Expect value for both -- > because it is shared between them (which sounds like what you wanted). > > Again, this is just from memory, but I think this is an option that has to > be turned on rather than something which Blast decides to do on its own. > > > I don't know whether BioPerl reports this or not. Would you mind e-mailing > me a entire BLAST report as a sample? When I have some time I'd like to play > around with this a bit. 
> > Thanks, > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bernd.web at gmail.com Tue Nov 13 06:57:04 2007 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 13 Nov 2007 12:57:04 +0100 Subject: [Bioperl-l] Panel link Message-ID: <716af09c0711130357n4ba72901lf2236ddfd853c945@mail.gmail.com> Hi, Is it possible with Panel to provide javascript event handlers? With -link we can provide hrefs as: -link => 'http://www.google.com/search?q=$description' or use a coderef that returns a href. However, I'd like to set-up links as: Is this possible by default with Panel? Regards, Bernd From akarger at CGR.Harvard.edu Tue Nov 13 12:12:32 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 13 Nov 2007 12:12:32 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Message-ID: Thanks for the reply. I'm curious as to how BLAST decides to do this, but not curious enough to buy the BLAST book. If you want to see this, you could just tblastn the ENSP00000349467 sequence vs. the genome: MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG NGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDE EVDEMIREADIDGDGQVNYEEFVQMMTAK against the human genome at NCBI or locally. I've attached the tblastn report for that protein, which includes the results I quoted. (It was done as part of a blast of 150 proteins vs. the genome.) -Amir ________________________________ From: dave at davemessina.com [mailto:dave at davemessina.com] On Behalf Of Dave Messina Sent: Saturday, November 10, 2007 6:04 AM To: Amir Karger Cc: bioperl-l Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result? 
Hi Amir, I don't have my BLAST book handy, and my memory is a little fuzzy, but I think the Expect(2) you're seeing is the E-value based on both HSPs combined. And I think this is why you see the same Expect value for both -- because it is shared between them (which sounds like what you wanted). Again, this is just from memory, but I think this is an option that has to be turned on rather than something which Blast decides to do on its own. I don't know whether BioPerl reports this or not. Would you mind e-mailing me a entire BLAST report as a sample? When I have some time I'd like to play around with this a bit. Thanks, Dave -------------- next part -------------- A non-text attachment was scrubbed... Name: ENSP00000349467_tblastn.txt.gz Type: application/x-gzip Size: 9755 bytes Desc: ENSP00000349467_tblastn.txt.gz Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071113/f8853e76/attachment.gz From akarger at CGR.Harvard.edu Tue Nov 13 12:30:52 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 13 Nov 2007 12:30:52 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> Message-ID: > From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf > Of Steve Chervitz > > The Bioperl blast parser should extract that value and you can obtain > it from an HSP object, via the HSPI::n() method, documented here: > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B io/Search/HSP/HSPI.html#POD23 As I mentioned in my email: And does anyone know off-hand if Bioperl will tell me when situations like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine would help, but I just get a bunch of empty strings for that, whether or not there's a (2) in the Expect string. 
(hsp->n is empty, hsp->{"_n"} is undef.) And the docs for n() actually say, "This value is not defined with NCBI Blast2 with gapping" although they don't say why. Which may explain why, when I ran the following code on the blast result I included in my last email, I got empty values for all of the n's. (Why is n() undefined for gapped blast if I'm getting n's in my results from that blast?) use warnings; use strict; use Bio::SearchIO; my $blast_out = $ARGV[0]; my $in = new Bio::SearchIO(-format => 'blast', -file => $blast_out, -report_type => 'tblastn'); print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N Evalue)), "\n"; while(my $query = $in->next_result) { while(my $subject = $query->next_hit) { while (my $hsp = $subject->next_hsp) { print join("\t", $query->query_name, $hsp->start("query"), $hsp->end("query"), $hsp->strand("hit"), $subject->name, $hsp->start("hit"), $hsp->end("hit"), $subject->frame, $hsp->n, $hsp->evalue, ),"\n"; } } } > Dave's basically correct in his explanation. It's a result of the > application of sum statistics by the blast algorithm. You can read all > about it in Korf et al's BLAST book. Here's the relevant section: [snip] Thanks, -Amir From cjfields at uiuc.edu Tue Nov 13 12:42:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 13 Nov 2007 11:42:07 -0600 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> Message-ID: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu> Amir, Can you file this as a bug? Dave mentioned he would look into it but I think it warrants tracking to make sure it gets fixed: http://www.bioperl.org/wiki/Bugs Attach the example BLAST report from your last post to the report. BTW, I wonder how this appears in XML output? 
chris On Nov 13, 2007, at 11:30 AM, Amir Karger wrote: >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf >> Of Steve Chervitz >> >> The Bioperl blast parser should extract that value and you can obtain >> it from an HSP object, via the HSPI::n() method, documented here: >> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B > io/Search/HSP/HSPI.html#POD23 > > As I mentioned in my email: > > And does anyone know off-hand if Bioperl will tell me when situations > like this happen? I thought the Bio::Search::HSP::BlastHSP::n > subroutine > would help, but I just get a bunch of empty strings for that, > whether or > not there's a (2) in the Expect string. (hsp->n is empty, hsp-> > {"_n"} is > undef.) > > And the docs for n() actually say, "This value is not defined with > NCBI > Blast2 with gapping" although they don't say why. Which may explain > why, > when I ran the following code on the blast result I included in my > last > email, I got empty values for all of the n's. (Why is n() undefined > for > gapped blast if I'm getting n's in my results from that blast?) > > use warnings; > use strict; > use Bio::SearchIO; > > my $blast_out = $ARGV[0]; > my $in = new Bio::SearchIO(-format => 'blast', > -file => $blast_out, > -report_type => 'tblastn'); > > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N > Evalue)), "\n"; > while(my $query = $in->next_result) { > while(my $subject = $query->next_hit) { > while (my $hsp = $subject->next_hsp) { > print join("\t", > $query->query_name, > $hsp->start("query"), > $hsp->end("query"), > $hsp->strand("hit"), > $subject->name, > $hsp->start("hit"), > $hsp->end("hit"), > $subject->frame, > $hsp->n, > $hsp->evalue, > ),"\n"; > } > } > } > >> Dave's basically correct in his explanation. It's a result of the >> application of sum statistics by the blast algorithm. You can read >> all >> about it in Korf et al's BLAST book. 
Here's the relevant section: > > [snip] > > Thanks, > > -Amir > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lskatz at gatech.edu Tue Nov 13 20:27:45 2007 From: lskatz at gatech.edu (Lee Katz) Date: Tue, 13 Nov 2007 20:27:45 -0500 Subject: [Bioperl-l] chromatogram Message-ID: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> Hi, I would like to know how to draw a chromatogram file. Does anyone have any sample code where you read in an scf file and create a jpeg or other image file? For that matter, I want to be able to customize these images with base calls if possible. I really appreciate the help, so thanks! -- Lee Katz From mvrmakam at yahoo.com Wed Nov 14 04:52:13 2007 From: mvrmakam at yahoo.com (Roshan Makam) Date: Wed, 14 Nov 2007 01:52:13 -0800 (PST) Subject: [Bioperl-l] Installing Bioperl on Windows XP Message-ID: <235423.72586.qm@web33703.mail.mud.yahoo.com> Hi, I am encountering a problem while installing Bioperl on Windows XP. I have installed ActivePerl version 5.8.8.822. I am using Perl Package Manager GUI. Also, I am following the instructions outlined for installing Bioperl on Windows. I am getting an error. The error is as follows: Downloading ActiveState Package Repository packlist ... failed 500 Can't connect to ppm4.activestate.com:80 (Bad hostname 'ppm4.activestate.com') I do not know how to overcome this problem. The other issue is when I type bioperl in the search box I do not see any packages of bioperl. I do not know what the problem is. If anyone of you could guide me through the installation process I would appreciate it. Thanks, Roshan ____________________________________________________________________________________ Be a better pen pal. 
Text or chat with friends inside Yahoo! Mail. See how. http://overview.mail.yahoo.com/ From cjfields at uiuc.edu Wed Nov 14 09:02:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Nov 2007 08:02:05 -0600 Subject: [Bioperl-l] Installing Bioperl on Windows XP In-Reply-To: <235423.72586.qm@web33703.mail.mud.yahoo.com> References: <235423.72586.qm@web33703.mail.mud.yahoo.com> Message-ID: <22873767-9CBD-4D38-BC9C-5267F1FFB04D@uiuc.edu> The instructions are pretty specific: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows Note the section on adding new repositories. As for the PPM connection error, it's more than likely an error with the default address but it isn't bioperl-related; maybe answers lie here: http://aspn.activestate.com/ASPN/docs/ActivePerl/5.8/faq/ActivePerl- faq2.html#ppm_repositories chris On Nov 14, 2007, at 3:52 AM, Roshan Makam wrote: > Hi, > > I am encountering a problem while installing Bioperl on Windows > XP. I have installed ActivePerl version 5.8.8.822. I am using > Perl Package Manager GUI. Also, I am following the instructions > outlined for installing Bioperl on Windows. I am getting an > error. The error is as follows: > > Downloading ActiveState Package Repository packlist ... failed 500 > Can't connect to ppm4.activestate.com:80 (Bad hostname > 'ppm4.activestate.com') > > I do not know how to overcome this problem. The other issue is > when I type bioperl in the search box I do not see any packages of > bioperl. I do not know what the problem is. If anyone of you > could guide me through the installation process I would appreciate it. 
> > Thanks, > > Roshan From reshetovdenis at gmail.com Wed Nov 14 12:28:40 2007 From: reshetovdenis at gmail.com (Denis Reshetov) Date: Wed, 14 Nov 2007 20:28:40 +0300 Subject: [Bioperl-l] how to load all genomes Message-ID: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com> Dear BioPerl-db Creators, I`m trying to load all genomes from NCBI ftp site to my BioSql database using common script load_seqdatabase.pl But it seems very slow. Let me know what is the better way to do it? Thank you very much, Denis. From barry.moore at genetics.utah.edu Wed Nov 14 14:18:29 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed, 14 Nov 2007 12:18:29 -0700 Subject: [Bioperl-l] how to load all genomes In-Reply-To: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com> References: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com> Message-ID: <66DEB322-7654-4E5E-9E96-BAE88262E3AC@genetics.utah.edu> Denis, You might be interested in this thread from a couple years ago. I was having a similar problem, that I eventually resolved. Unfortunately the reason for the problem and the solution weren't entirely clear, but you may be able to glean some ideas from it. Also, you may have already done this, but I suggest searching the archives from this list because it seems like this comes up every now and then, so there may be other postings similar to the one I'm sending you that could help you. http://www.bioperl.org/pipermail/bioperl-l/2005-January/018093.html Finally, if you are still having problems, you'll want to include a few more details about your situation. What DB are you using, have you preloaded taxonomy data etc. How fast/slow are your sequences loading? Barry On Nov 14, 2007, at 10:28 AM, Denis Reshetov wrote: > Dear BioPerl-db Creators, > > I`m trying to load all genomes from NCBI ftp site > to my BioSql database using common script load_seqdatabase.pl > > But it seems very slow. Let me know what is the better way to do it? 
>
> Thank you very much,
>
> Denis.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Wed Nov 14 14:57:49 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 08:57:49 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID:

Here's my trace viewer.
Please excuse my dodgy Perl and debugging code as it's still under
development :-)

Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

------------------------------------------------------------------------------------------

#!perl -w
use ABI;

use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Data::Dumper;

use Getopt::Long;

use constant HEIGHT => 300;

GetOptions ('h|height=i' => \$HEIGHT,
            'f|file=s'   => \$FILE,
            'o|out=s'    => \$OUTFILE,
            'l|left=s'   => \$LEFT_SEQ,
            'r|right=s'  => \$RIGHT_SEQ,
            's|size=i'   => \$SIZE,
           ) || die <<USAGE;
Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o test2.png -l actacgtacgta -r atgatcgtacgtac
   or  perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 --out test2.png --left actacgtacgta --right atgatcgtacgtac

 Options:
    --height    Set height of image (${\HEIGHT} pixels default)
    --file      Filename for the ABI trace file
    --out       Filename for the generated .png image
    --left
    --right
    --size

Parse an ABI trace file and render a PNG image.
See http://search.cpan.org/dist/ABI/ABI.pm
or
http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
USAGE

my $height  = $HEIGHT || HEIGHT;
my $file    = $FILE;
my $outfile = $OUTFILE;

my $abi = ABI->new(-file=> $file);

my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"

my @base_calls = $abi->get_base_calls(); # Get the base calls
my $sequence   = $abi->get_sequence();
@bp = split(//, $sequence);

# iterate over array: put each called base at its trace position,
# and a blank everywhere else
$size = $abi->get_trace_length();
for ($i=0,$count = 0; $i<$size; $i++) {
    if(grep(/\b$i\b/, @base_calls)){
        $bases[$i] = $bp[$count];
        $count++;
    }else{
        $bases[$i] = ' ';
    }
}

# create the data. see GD::Graph::Data for details of the format
my @data = (\@bases,
            \@trace_a,
            \@trace_c,
            \@trace_g,
            \@trace_t,
           );

my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);

$graph->set(
    title         => $abi->get_sample_name(),
    # y_max_value => $abi->get_max_trace() + 50,
    x_max_value   => $abi->get_trace_length(),
    t_margin      => 5,
    b_margin      => 5,
    l_margin      => 5,
    r_margin      => 5,
    x_ticks       => 0,
    text_space    => 0,
    line_width    => 1,
    transparent   => 0,
    b_margin      => 30,
    t_margin      => 35,
    x_plot_values => 0,
    interlaced    => 1,
);

# allocate some colors for drawing the bases
#use colors same as Chromas
$graph->set( dclrs => [ qw( green blue black red pink) ] );

#plot the data
my $gd = $graph->plot(\@data);

$black   = $gd->colorAllocate(0,0,0);       # G
$blue    = $gd->colorAllocate(0,0,255);     # C
$red     = $gd->colorAllocate(255,0,0);     # T
$green   = $gd->colorAllocate(0,255,0);     # A
$magenta = $gd->colorAllocate(255,0,255);   # N
$white   = $gd->colorAllocate(255,255,255); # undefined aren't drawn
$gray    = $gd->colorAllocate(210,210,210);

%colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N", $magenta, " ",$white);

#$start_base = index(lc($sequence),lc($LEFT_SEQ));
$start_base = find_match($sequence,$LEFT_SEQ);

#if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
$end_base = find_match($sequence,$RIGHT_SEQ, 1);
if($end_base){
    $end_base += length($RIGHT_SEQ);
}

# get the coords of the features on the image
@coords = $graph->get_hotspot(1);
$size = @coords;

$printed_num = 1;
$basecount   = 0;
$numstoprint = $basecount - $start_base;

# draw the colored bases and scale at top and bottom of image
for ($i=0,$count = 0; $i<$size; $i++) {
    $c = $coords[$i];
    (undef, $xs, undef, undef, undef, undef) = @$c;
    $base = $bases[$i];
    if($base =~ /[ACGTN]/){
        if($start_base - 1 == $basecount){$start_base_coord = $xs;}
        if($end_base - 1 == $basecount){$end_base_coord = $xs;}
        if(defined($SIZE) && $start_base+$SIZE -2 == $basecount){$end_base_coord_by_size = $xs;}
        $basecount++;
        $numstoprint++;
        $printed_num = 0;
    }
    # print the bases top and bottom
    $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
    $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});

    # print scale
    if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
        if($LEFT_SEQ){
            $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
            $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
            $printed_num = 1;
        }else{
            $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
            $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
            $printed_num = 1;
        }
    }
    $top_right_corner = $xs;
}

# only draw the clipped region if the calculated size is + or - 6bp
#if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){

# draw the clipped regions as gray
#if LEFT_SEQ supplied and a match found
if($LEFT_SEQ && $start_base > 0){
    $gd->filledRectangle(38,35,$start_base_coord - 1,$height - 33,$red);
    $clipped = 1;
}
#if RIGHT_SEQ supplied and a match found
if($RIGHT_SEQ && $end_base > 0){
    print join("\t", ($end_base)),"\n";
    $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height - 33,$gray);
    $clipped = 1;
}
#if no RIGHT_SEQ supplied or no match found, use left match + seq length
if(!$RIGHT_SEQ || $end_base < 0){
    $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
    $clipped = 1;
}

# set height based on max trace within clipped region
$graph->set( y_max_value => 3000);#$abi->get_max_trace() + 50);

# need to re-plot the data over the grayed out area
$graph->plot(\@data) if $clipped;
$gd->filledRectangle(0,0,$top_right_corner,33,$white);
#}

#print the graph
open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
binmode OUT;
print OUT $gd->png;
close OUT;


sub find_match{
    my ($sequence,$query,$last) = @_;
    return -1 if length($query) < 6;
    my($odds, $evens, $ones, $twos, $threes, $match_pos);

    # try exact match
    $match_pos = do_regex($query, $sequence,$last);
    return $match_pos if $match_pos > 0;

    # try matching every second base starting from the second base e.g. it will be .C.T.C.G.etc
    map {m/(\w)(\w)/g; $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
    $match_pos = do_regex($odds, $sequence,$last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($evens, $sequence,$last);
    return $match_pos if $match_pos > 0;

    # try matching every third base starting from the first base e.g. it will be C..T..G..T etc
    map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"} ($query =~m/(\w\w\w)/g);
    $match_pos = do_regex($ones, $sequence,$last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($twos, $sequence,$last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($threes, $sequence,$last);
    return $match_pos if $match_pos > 0;

    # not found
    return -1;
}

# no "()" prototype here: the sub is called with three arguments
sub do_regex {
    my ($query,$sequence,$last)= @_;
    #print "trying $query \n";
    my $result = -1;
    $result = pos($sequence)-length($query)+1 if $last && ($sequence =~ m/.*($query)/ig);
    $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*?($query)/ig);
    return $result;
}

------------------------------------------------------------------------------------------

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
>
> Hi,
> I would like to know how to draw a chromatogram file. Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible. I really appreciate the help, so thanks!
>
> --
> Lee Katz
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From mbasu at mail.nih.gov Wed Nov 14 15:47:20 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 15:47:20 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To:
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <473B5ED8.1090201@mail.nih.gov>

I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
not in Bioperl distribution. But to make the record straight, you can
use one step chromatogram drawing in SVG from ABI file using my BioSVG
module, available at:

http://www.bioinformatics.org/~malay/biosvg/

Malay

--
Malay K Basu
www.malaybasu.net

From Russell.Smithies at agresearch.co.nz Wed Nov 14 15:58:19 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 09:58:19 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B5ED8.1090201@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov>
Message-ID:

We try and avoid SVG at all costs as installing plugins and viewers in a
locked down corporate environment can be more trouble than it's worth
whereas generating .png images works for any browser with no extras
required.
We actually call this trace drawing code from Python which then
generates webpages with the embedded image.
It also means we don't need to licence, install and maintain a trace
viewer like Chromas.
:-) Russell > -----Original Message----- > From: Malay [mailto:mbasu at mail.nih.gov] > Sent: Thursday, 15 November 2007 9:47 a.m. > To: Smithies, Russell > Cc: Lee Katz; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] chromatogram > > I guess you need chromatogram from SCF. I can't help in that. ABI.pm is > not in Bioperl distribution. But to make the record straight, you can > use one step chromatogram drawing in SVG from ABI file using my BioSVG > module, available at: > > http://www.bioinformatics.org/~malay/biosvg/ > > Malay > > > > > Smithies, Russell wrote: > > Here's my trace viewer. > > Please excuse my dodgy Perl and debugging code as it's still under > > development :-) > > > > > > Russell Smithies > > > > Bioinformatics Software Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > > > ------------------------------------------------------------------------ > > ------------------ > > > > #!perl -w > > use ABI; > > > > use GD::Graph::lines; > > use GD::Graph::colour; > > use GD::Graph::Data; > > > > use Data::Dumper; > > > > > > use Getopt::Long; > > > > use constant HEIGHT => 300; > > > > GetOptions ('h|height=i' => \$HEIGHT, > > 'f|file=s' => \$FILE, > > 'o|out=s' => \$OUTFILE, > > 'l|left=s' => \$LEFT_SEQ, > > 'r|right=s' => \$RIGHT_SEQ, > > 's|size=i' => \$SIZE, > > ) || die < > Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o > > test2.png -l actacgtacgta -r atgatcgtacgtac > > or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 > > --out test2.png --left actacgtacgta --right atgatcgtacgtac > > > > Options: > > --height Set height of image (${\HEIGHT} pixels default) > > --file Filename for the ABI trace file > > --out Filename for the generated .png image > > --left > > --right > > --size > > > > Parse an ABI trace file and render a 
PNG image. > > See http://search.cpan.org/dist/ABI/ABI.pm > > or > > http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm > > USAGE > > > > my $height = $HEIGHT || HEIGHT; > > my $file = $FILE; > > my $outfile = $OUTFILE; > > > > my $abi = ABI->new(-file=> $file); > > > > my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A" > > my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C" > > my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G" > > my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T" > > > > my @base_calls = $abi->get_base_calls(); # Get the base calls > > my $sequence =$abi->get_sequence(); > > @bp = split(//, $sequence); > > > > > > > > # iterate over array > > $size = $abi->get_trace_length(); > > for ($i=0,$count = 0; $i<$size; $i++) { > > if(grep(/\b$i\b/, @base_calls)){ > > $bases[$i] = $bp[$count]; > > $count++; > > }else{ > > $bases[$i] = ' '; > > } > > } > > > > # create the data. see GD::Graph::Data for details of the format > > my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, ); > > > > my $graph = new GD::Graph::lines($abi->get_trace_length(),$height); > > $graph->set( > > title => $abi->get_sample_name(), > > # y_max_value => $abi->get_max_trace() + 50, > > x_max_value => $abi->get_trace_length(), > > t_margin => 5, > > b_margin => 5, > > l_margin => 5, > > r_margin => 5, > > x_ticks => 0, > > text_space => 0, > > line_width => 1, > > transparent => 0, > > b_margin => 30, > > t_margin => 35, > > x_plot_values => 0, > > interlaced => 1, > > ); > > > > # allocate some colors for drawing the bases > > #use colors same as Chromas > > $graph->set( dclrs => [ qw( green blue black red pink) ] ); > > > > #plot the data > > my $gd = $graph->plot(\@data); > > > > $black = $gd->colorAllocate(0,0,0); # A > > $blue = $gd->colorAllocate(0,0,255); # C > > $red = $gd->colorAllocate(255,0,0); # G > > $green = $gd->colorAllocate(0,255,0); # T > > $magenta =$gd->colorAllocate(255,0,255); # N > > 
$white = $gd->colorAllocate(255,255,255); # undefined aren't drawn > > $gray = $gd->colorAllocate(210,210,210); > > %colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N", > > $magenta, " ",$white); > > > > #$start_base = index(lc($sequence),lc($LEFT_SEQ)); > > $start_base = find_match($sequence,$LEFT_SEQ); > > > > #if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){ > > $end_base = find_match($sequence,$RIGHT_SEQ, 1); > > if($end_base){ > > $end_base += length($RIGHT_SEQ); > > } > > > > > > # get the coords of the features on the image > > @coords = $graph->get_hotspot(1); > > $size = @coords; > > $printed_num = 1; > > $basecount = 0; > > $numstoprint = $basecount - $start_base; > > > > # draw the colored bases and scale at top and bottom of image > > for ($i=0,$count = 0; $i<$size; $i++) { > > $c = $coords[$i]; > > (undef, $xs, undef, undef, undef, undef) = @$c; > > $base = $bases[$i]; > > if($base =~ /[ACGTN]/){ > > if($start_base - 1 == $basecount){$start_base_coord = $xs;} > > if($end_base - 1 == $basecount){$end_base_coord = $xs;} > > if(defined($SIZE) && $start_base+$SIZE -2 == > > $basecount){$end_base_coord_by_size = $xs;} > > $basecount++; > > $numstoprint++; > > $printed_num = 0; > > } > > # print the bases top and bottom > > $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base}); > > $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base}); > > > > # print scale > > if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){ > > if($LEFT_SEQ){ > > $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black); > > $gd->string(GD::Font->Small(),$xs,$height - > > 15,$numstoprint,$black); > > $printed_num = 1; > > }else{ > > $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black); > > $gd->string(GD::Font->Small(),$xs,$height - > > 15,$numstoprint,$black); > > $printed_num = 1; > > } > > } > > $top_right_corner = $xs; > > } > > > > > > > > # only draw the clipped region if the calculated size is + or - 6bp > > #if(($end_base 
- $start_base) - $SIZE <= 6 && ($end_base - $start_base) > > - $SIZE >= -6 ){ > > # draw the clipped regions as gray > > #if LEFT_SEQ supplied and a match found > > if($LEFT_SEQ && $start_base > 0){ > > $gd->filledRectangle(38,35,$start_base_coord - 1,$height - > > 33,$red); > > $clipped = 1; > > } > > #if RIGHT_SEQ supplied and a match found > > if($RIGHT_SEQ && $end_base > 0){ > > print join("\t", ($end_base)),"\n"; > > $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height - > > 33,$gray); > > $clipped = 1; > > } > > #if no RIGHT_SEQ supplied or no match found, use left match + seq > > length > > if(!$RIGHT_SEQ || $end_base < 0){ > > > > $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$heigh > > t - 33,$blue); > > $clipped = 1; > > } > > > > > > > > # set height based on max trace within clipped region > > $graph->set( y_max_value => 3000);#$abi->get_max_trace() + 50); > > > > # need to re-plot the data over the grayed out area > > $graph->plot(\@data) if $clipped; > > $gd->filledRectangle(0,0,$top_right_corner,33,$white); > > > > #} > > > > #print the graph > > open(OUT, ">$outfile") or die "can't open output file: $outfile\n"; > > binmode OUT; > > print OUT $gd->png; > > close OUT; > > > > > > sub find_match{ > > my ($sequence,$query,$last) = @_; > > return -1 if length($query) < 6; > > my($odds, $evens, $ones, $twos, $threes, $match_pos); > > # try exact match > > $match_pos = do_regex($query, $sequence,$last); return $match_pos if > > $match_pos > 0; > > > > # try matching every second base starting from the second base e.g. > > it will be .C.T.C.G.etc > > map {m/(\w)(\w)/g; $odds.="$1."; $evens.=".$2"} > > ($query=~m/(\w\w)/g); > > $match_pos = do_regex($odds, $sequence,$last); return $match_pos > > if $match_pos > 0; > > $match_pos = do_regex($evens, $sequence,$last); return $match_pos > > if $match_pos > 0; > > > > # try matching every third base starting from the first base e.g. 
it > > will be C..T..G..T etc > > map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; > > $threes.="..$3"} ($query =~m/(\w\w\w)/g); > > $match_pos = do_regex($ones, $sequence,$last); return $match_pos > > if $match_pos > 0; > > $match_pos = do_regex($twos, $sequence,$last); return $match_pos > > if $match_pos > 0; > > $match_pos = do_regex($threes, $sequence,$last); return $match_pos > > if $match_pos > 0; > > > > # not found > > return -1; > > } > > > > sub do_regex(){ > > my ($query,$sequence,$last)= @_; > > #print "trying $query \n"; > > my $result = -1; > > $result = pos($sequence)-length($query)+1 if $last && ($sequence > > =~ m/.*($query)/ig); > > $result = pos($sequence)-length($query)+1 if($sequence =~ > > m/.*?($query)/ig); > > return $result; > > } > > > > ------------------------------------------------------------------------ > > ------------------ > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open- > >> bio.org] On Behalf Of Lee Katz > >> Sent: Wednesday, 14 November 2007 2:28 p.m. > >> To: bioperl-l at lists.open-bio.org > >> Subject: [Bioperl-l] chromatogram > >> > >> Hi, > >> I would like to know how to draw a chromatogram file. Does anyone > >> have any sample code where you read in an scf file and create a jpeg > >> or other image file? > >> For that matter, I want to be able to customize these images with base > >> calls if possible. I really appreciate the help, so thanks! 
> >> > >> -- > >> Lee Katz > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > ============================================================= > ========== > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > > ============================================================= > ========== > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Malay K Basu > www.malaybasu.net ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From mbasu at mail.nih.gov Wed Nov 14 16:04:25 2007 From: mbasu at mail.nih.gov (Malay) Date: Wed, 14 Nov 2007 16:04:25 -0500 Subject: [Bioperl-l] chromatogram In-Reply-To: References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> Message-ID: <473B62D9.8010004@mail.nih.gov> You don't need any plugin. Firefox natively can show most of the SVG files. -Malay Smithies, Russell wrote: > We try and avoid SVG at all costs as installing plugins and viewers in a > locked down corporate environment can be more trouble than it's worth > whereas generating .png images works for any browser with no extras > required. > We actually call this trace drawing code from Python which then > generates webpages with the embedded image. > It also means we don't need to licence, install and maintain a trace > viewer like Chromas. > :-) > > Russell > >> -----Original Message----- >> From: Malay [mailto:mbasu at mail.nih.gov] >> Sent: Thursday, 15 November 2007 9:47 a.m. >> To: Smithies, Russell >> Cc: Lee Katz; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] chromatogram >> >> I guess you need chromatogram from SCF. I can't help in that. ABI.pm > is >> not in Bioperl distribution. But to make the record straight, you can >> use one step chromatogram drawing in SVG from ABI file using my BioSVG >> module, available at: >> >> http://www.bioinformatics.org/~malay/biosvg/ >> >> Malay >> >> >> >> >> Smithies, Russell wrote: >>> Here's my trace viewer. 
>>> Please excuse my dodgy Perl and debugging code as it's still under >>> development :-) >>> >>> >>> Russell Smithies >>> >>> Bioinformatics Software Developer >>> T +64 3 489 9085 >>> E russell.smithies at agresearch.co.nz >>> >>> Invermay Research Centre >>> Puddle Alley, >>> Mosgiel, >>> New Zealand >>> T +64 3 489 3809 >>> F +64 3 489 9174 >>> www.agresearch.co.nz >>> >>> >>> > ------------------------------------------------------------------------ >>> ------------------ >>> >>> #!perl -w >>> use ABI; >>> >>> use GD::Graph::lines; >>> use GD::Graph::colour; >>> use GD::Graph::Data; >>> >>> use Data::Dumper; >>> >>> >>> use Getopt::Long; >>> >>> use constant HEIGHT => 300; >>> >>> GetOptions ('h|height=i' => \$HEIGHT, >>> 'f|file=s' => \$FILE, >>> 'o|out=s' => \$OUTFILE, >>> 'l|left=s' => \$LEFT_SEQ, >>> 'r|right=s' => \$RIGHT_SEQ, >>> 's|size=i' => \$SIZE, >>> ) || die <>> Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o >>> test2.png -l actacgtacgta -r atgatcgtacgtac >>> or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 >>> --out test2.png --left actacgtacgta --right atgatcgtacgtac >>> >>> Options: >>> --height Set height of image (${\HEIGHT} pixels default) >>> --file Filename for the ABI trace file >>> --out Filename for the generated .png image >>> --left >>> --right >>> --size >>> >>> Parse an ABI trace file and render a PNG image. 
>>> See http://search.cpan.org/dist/ABI/ABI.pm >>> or >>> http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm >>> USAGE >>> >>> my $height = $HEIGHT || HEIGHT; >>> my $file = $FILE; >>> my $outfile = $OUTFILE; >>> >>> my $abi = ABI->new(-file=> $file); >>> >>> my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A" >>> my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C" >>> my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G" >>> my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T" >>> >>> my @base_calls = $abi->get_base_calls(); # Get the base calls >>> my $sequence =$abi->get_sequence(); >>> @bp = split(//, $sequence); >>> >>> >>> >>> # iterate over array >>> $size = $abi->get_trace_length(); >>> for ($i=0,$count = 0; $i<$size; $i++) { >>> if(grep(/\b$i\b/, @base_calls)){ >>> $bases[$i] = $bp[$count]; >>> $count++; >>> }else{ >>> $bases[$i] = ' '; >>> } >>> } >>> >>> # create the data. see GD::Graph::Data for details of the format >>> my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, ); >>> >>> my $graph = new GD::Graph::lines($abi->get_trace_length(),$height); >>> $graph->set( >>> title => $abi->get_sample_name(), >>> # y_max_value => $abi->get_max_trace() + 50, >>> x_max_value => $abi->get_trace_length(), >>> t_margin => 5, >>> b_margin => 5, >>> l_margin => 5, >>> r_margin => 5, >>> x_ticks => 0, >>> text_space => 0, >>> line_width => 1, >>> transparent => 0, >>> b_margin => 30, >>> t_margin => 35, >>> x_plot_values => 0, >>> interlaced => 1, >>> ); >>> >>> # allocate some colors for drawing the bases >>> #use colors same as Chromas >>> $graph->set( dclrs => [ qw( green blue black red pink) ] ); >>> >>> #plot the data >>> my $gd = $graph->plot(\@data); >>> >>> $black = $gd->colorAllocate(0,0,0); # A >>> $blue = $gd->colorAllocate(0,0,255); # C >>> $red = $gd->colorAllocate(255,0,0); # G >>> $green = $gd->colorAllocate(0,255,0); # T >>> $magenta =$gd->colorAllocate(255,0,255); # N >>> $white = 
$gd->colorAllocate(255,255,255); # undefined aren't drawn >>> $gray = $gd->colorAllocate(210,210,210); >>> %colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N", >>> $magenta, " ",$white); >>> >>> #$start_base = index(lc($sequence),lc($LEFT_SEQ)); >>> $start_base = find_match($sequence,$LEFT_SEQ); >>> >>> #if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){ >>> $end_base = find_match($sequence,$RIGHT_SEQ, 1); >>> if($end_base){ >>> $end_base += length($RIGHT_SEQ); >>> } >>> >>> >>> # get the coords of the features on the image >>> @coords = $graph->get_hotspot(1); >>> $size = @coords; >>> $printed_num = 1; >>> $basecount = 0; >>> $numstoprint = $basecount - $start_base; >>> >>> # draw the colored bases and scale at top and bottom of image >>> for ($i=0,$count = 0; $i<$size; $i++) { >>> $c = $coords[$i]; >>> (undef, $xs, undef, undef, undef, undef) = @$c; >>> $base = $bases[$i]; >>> if($base =~ /[ACGTN]/){ >>> if($start_base - 1 == $basecount){$start_base_coord = $xs;} >>> if($end_base - 1 == $basecount){$end_base_coord = $xs;} >>> if(defined($SIZE) && $start_base+$SIZE -2 == >>> $basecount){$end_base_coord_by_size = $xs;} >>> $basecount++; >>> $numstoprint++; >>> $printed_num = 0; >>> } >>> # print the bases top and bottom >>> $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base}); >>> $gd->string(GD::Font->Small(),$xs,$height - > 30,$base,$colors{$base}); >>> # print scale >>> if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){ >>> if($LEFT_SEQ){ >>> $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black); >>> $gd->string(GD::Font->Small(),$xs,$height - >>> 15,$numstoprint,$black); >>> $printed_num = 1; >>> }else{ >>> $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black); >>> $gd->string(GD::Font->Small(),$xs,$height - >>> 15,$numstoprint,$black); >>> $printed_num = 1; >>> } >>> } >>> $top_right_corner = $xs; >>> } >>> >>> >>> >>> # only draw the clipped region if the calculated size is + or - 6bp >>> #if(($end_base - 
$start_base) - $SIZE <= 6 && ($end_base - > $start_base) >>> - $SIZE >= -6 ){ >>> # draw the clipped regions as gray >>> #if LEFT_SEQ supplied and a match found >>> if($LEFT_SEQ && $start_base > 0){ >>> $gd->filledRectangle(38,35,$start_base_coord - 1,$height - >>> 33,$red); >>> $clipped = 1; >>> } >>> #if RIGHT_SEQ supplied and a match found >>> if($RIGHT_SEQ && $end_base > 0){ >>> print join("\t", ($end_base)),"\n"; >>> $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height > - >>> 33,$gray); >>> $clipped = 1; >>> } >>> #if no RIGHT_SEQ supplied or no match found, use left match + seq >>> length >>> if(!$RIGHT_SEQ || $end_base < 0){ >>> >>> > $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$heigh >>> t - 33,$blue); >>> $clipped = 1; >>> } >>> >>> >>> >>> # set height based on max trace within clipped region >>> $graph->set( y_max_value => 3000);#$abi->get_max_trace() + > 50); >>> # need to re-plot the data over the grayed out area >>> $graph->plot(\@data) if $clipped; >>> $gd->filledRectangle(0,0,$top_right_corner,33,$white); >>> >>> #} >>> >>> #print the graph >>> open(OUT, ">$outfile") or die "can't open output file: $outfile\n"; >>> binmode OUT; >>> print OUT $gd->png; >>> close OUT; >>> >>> >>> sub find_match{ >>> my ($sequence,$query,$last) = @_; >>> return -1 if length($query) < 6; >>> my($odds, $evens, $ones, $twos, $threes, $match_pos); >>> # try exact match >>> $match_pos = do_regex($query, $sequence,$last); return > $match_pos if >>> $match_pos > 0; >>> >>> # try matching every second base starting from the second base > e.g. >>> it will be .C.T.C.G.etc >>> map {m/(\w)(\w)/g; $odds.="$1."; $evens.=".$2"} >>> ($query=~m/(\w\w)/g); >>> $match_pos = do_regex($odds, $sequence,$last); return > $match_pos >>> if $match_pos > 0; >>> $match_pos = do_regex($evens, $sequence,$last); return > $match_pos >>> if $match_pos > 0; >>> >>> # try matching every third base starting from the first base > e.g. 
it >>> will be C..T..G..T etc >>> map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; >>> $threes.="..$3"} ($query =~m/(\w\w\w)/g); >>> $match_pos = do_regex($ones, $sequence,$last); return > $match_pos >>> if $match_pos > 0; >>> $match_pos = do_regex($twos, $sequence,$last); return > $match_pos >>> if $match_pos > 0; >>> $match_pos = do_regex($threes, $sequence,$last); return > $match_pos >>> if $match_pos > 0; >>> >>> # not found >>> return -1; >>> } >>> >>> sub do_regex(){ >>> my ($query,$sequence,$last)= @_; >>> #print "trying $query \n"; >>> my $result = -1; >>> $result = pos($sequence)-length($query)+1 if $last && > ($sequence >>> =~ m/.*($query)/ig); >>> $result = pos($sequence)-length($query)+1 if($sequence =~ >>> m/.*?($query)/ig); >>> return $result; >>> } >>> >>> > ------------------------------------------------------------------------ >>> ------------------ >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open- >>>> bio.org] On Behalf Of Lee Katz >>>> Sent: Wednesday, 14 November 2007 2:28 p.m. >>>> To: bioperl-l at lists.open-bio.org >>>> Subject: [Bioperl-l] chromatogram >>>> >>>> Hi, >>>> I would like to know how to draw a chromatogram file. Does anyone >>>> have any sample code where you read in an scf file and create a > jpeg >>>> or other image file? >>>> For that matter, I want to be able to customize these images with > base >>>> calls if possible. I really appreciate the help, so thanks! 
>>>>
>>>> --
>>>> Lee Katz
>>
>> --
>> Malay K Basu
>> www.malaybasu.net

--
Malay K Basu
www.malaybasu.net

From tomboy at cs.huji.ac.il Wed Nov 14 21:43:43 2007
From: tomboy at cs.huji.ac.il (Tomer Hertz)
Date: Wed, 14 Nov 2007 18:43:43 -0800
Subject: [Bioperl-l] problems in stalling bio perl
Message-ID:

Hi,

When I try to install bioperl I get the following error message:

hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
$ perl Build.PL
Can't find file lib/Module/Build.pm to determine version at
/usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.

Can you please help? I have tried reinstalling the build command and
that does not seem to help either.

Many thanks
--Tomer

--
--------------------------------------------------------------------------------
Tomer Hertz
Postdoctoral Researcher
Machine Learning and Applied Statistics
Microsoft Research
One Microsoft Way, Redmond, WA, 98052, USA
Homepage: www.cs.huji.ac.il/~tomboy
Email: hertz at microsoft dot com
Tel: (425)-421-8313
Fax: (425) 936-7329
--------------------------------------------------------------------------------

From lskatz at gatech.edu Thu Nov 15 08:24:02 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Thu, 15 Nov 2007 08:24:02 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B62D9.8010004@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov>
Message-ID: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>

Thank you all.
Are you all sure that there is no way to go from an scf to an image
though? I do have abi files, but I am relying on Phred output for
base calls for other things and I want to stay consistent. This means
that if I use the fasta files that I get from Phred in another part of
my program, I need to use the scf files it produces.

If this is not possible, do you know if drawing an scf is in the works? Thanks.
--
Lee Katz
http://www.lskatz.com

From cain.cshl at gmail.com Thu Nov 15 09:21:26 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 15 Nov 2007 09:21:26 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov> <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <1195136486.2785.12.camel@localhost.localdomain>

Hi Lee,

Distributed with GBrowse is Bio::Graphics::Glyph::trace, which uses
Bio::SCF to draw trace files onto a Bio::Graphics::Panel. Bio::SCF is
not part of bioperl, so you have to get it from CPAN and it depends on
the Staden io-lib package, so you'll need that too. You can get GBrowse
from http://www.gmod.org/gbrowse , and you can look at the tutorial for
more information on configuring the trace glyph.

Scott

On Thu, 2007-11-15 at 08:24 -0500, Lee Katz wrote:
> Thank you all.
> Are you all sure in that there is no way to go from an scf to an image
> though? I do have abi files, but I am relying on Phred output for
> base calls for other things and I want to stay consistent. This means
> that if I use the fasta files that I get from Phred in another part of
> my program, I need to use the scf files it produces.
>
> If this is not possible, do you know if drawing an scf is in the works? Thanks.
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From bosborne11 at verizon.net Thu Nov 15 09:18:05 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 09:18:05 -0500
Subject: [Bioperl-l] problems in stalling bio perl
In-Reply-To:
Message-ID:

Tomer,

Interesting.
When I used Cygwin I always worked entirely within the C: drive; it
looks like you're executing the script from the E: drive. Is Cygwin
installed in C:/cygwin? You can see what I'm getting at: it's possible
that you need to set $PERL5LIB to something like
/cygdrive/c/cygwin/usr/lib/perl5. What does 'echo $PERL5LIB' say?

Brian O.

On 11/14/07 9:43 PM, "Tomer Hertz" wrote:

> hi
> when I try to install bioperl I get the following error message:
>
> hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
> $ perl Build.PL
> Can't find file lib/Module/Build.pm to determine version at
> /usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.
> can you please help. I have tried reinstalling the build command and that
> does not seem to help as well.
>
> many thanks
> --Tomer

From bernd.web at gmail.com Thu Nov 15 10:26:42 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 16:26:42 +0100
Subject: [Bioperl-l] Graphics::Panel
Message-ID: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>

Hi,

Has someone been able to access '$description' for the production of
imagemaps with Graphics::Panel?
The map below does not print the "title" tag at all; '$description'
seems not to be available, although for the tracks ($panel->add_track)
it is available.

$map = $panel->create_web_map($mapname, $linkrule, '$description');

Replacing '$description' with a coderef for the title tag does work, if
I use the code below:

my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };

I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }

Regards,
Bernd

From luciap at sas.upenn.edu Thu Nov 15 10:44:21 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Thu, 15 Nov 2007 10:44:21 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
Message-ID: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>

Hi,

I was asked this question recently and it occurred to me that I must be
doing things inefficiently.
To produce a gff file I was using SeqIO to parse the required fields and
then, following the conventions, just printing out whatever was required
as tab-delimited text, which is easy.

But if I wanted to generate a GenBank file, extracting features from a
gff file and a plain fasta file, it was more complicated.
Is there support for gff in bioperl now?
Can anyone contribute a smart way to go from/to gff, GenBank and embl?

thanks very much

Lucia Peixoto
Department of Biology, SAS
University of Pennsylvania

From lstein at cshl.edu Thu Nov 15 12:38:04 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Nov 2007 12:38:04 -0500
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
Message-ID: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>

Depending on which Feature object you use, you may have to use a tag named
"note" instead of "description".

Lincoln

On Nov 15, 2007 10:26 AM, Bernd Web wrote:

> Hi,
>
> Has someone been able to access '$description' for the production of
> imagemaps with Graphics::Panel?
> The map below does not print the "title" tag at all, '$description'
> seems not available, although for the tracks ($panel->add_track) it is
> available.
> $map = $panel->create_web_map($mapname, $linkrule, '$description');
>
> Replacing '$description' with a coderef for the titletag does work, if
> I use the code below
> my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
>
>
> I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
>
>
> Regards,
> Bernd
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From bernd.web at gmail.com Thu Nov 15 13:03:19 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 19:03:19 +0100
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com> <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
Message-ID: <716af09c0711151003w6b5965b6g967ae2391a460dcb@mail.gmail.com>

On Nov 15, 2007 6:38 PM, Lincoln Stein wrote:
> Depending on which Feature object you use, you may have to use a tag named
> "note" instead of "description".
>
> Lincoln
>
> On Nov 15, 2007 10:26 AM, Bernd Web < bernd.web at gmail.com> wrote:
> >
> > Hi,
> >
> > Has someone been able to access '$description' for the production of
> > imagemaps with Graphics::Panel?
> > The map below does not print the "title" tag at all, '$description'
> > seems not available, although for the tracks ($panel->add_track) it is
> > available.
> > $map = $panel->create_web_map($mapname, $linkrule, '$description');
> >
> > Replacing '$description' with a coderef for the titletag does work, if
> > I use the code below
> > my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
> >
> >
> > I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
> >
> >
> > Regards,
> > Bernd
> >
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Thu Nov 15 13:43:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Nov 2007 12:43:02 -0600
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
In-Reply-To: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
References: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
Message-ID: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>

There are currently many ways to get what you want, but not all are
consistent (particularly re: GFF3). We are aiming for more
consistent, compliant GFF/GTF output in the next developer series
(1.7) of Bioperl.

You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
scripts directory); these are probably the most common way when
working directly from a seq record. Bio::Tools::GFF is the most
commonly used class, though I'm unsure of its status for GFF3
output. From within a Bio::SeqI you can call write_gff() (currently
not very flexible) or, from the SeqFeature itself, gff_string().
Bio::Graphics::Feature has the additional method gff3_string().
Bio::FeatureIO is also an option, though I would consider it very
experimental (it will likely undergo significant revision in the next
bioperl dev series).

Any others anyone can think of, maybe non-BioPerl related as well?
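
As a rough illustration of the gff_string() route mentioned above, here
is a minimal sketch. It assumes BioPerl is installed; the input file
name 'example.gbk' is hypothetical, and how compliant the GFF3 output is
depends on the BioPerl version, so treat it as a starting point rather
than a finished converter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GFF;

# Hypothetical input file; pass a real GenBank or EMBL file on the command line.
my $file   = shift @ARGV || 'example.gbk';
my $format = 'genbank';    # use 'embl' for EMBL records

my $in = Bio::SeqIO->new(-file => $file, -format => $format);

# Formatter handed to gff_string(); if none is given, the feature
# falls back to its own default formatter.
my $gff3 = Bio::Tools::GFF->new(-gff_version => 3);

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        # One GFF line per feature returned by get_SeqFeatures.
        print $feat->gff_string($gff3), "\n";
    }
}
```

For whole files, the bp_genbank2gff3 script named above is likely the
more robust choice; this loop is mainly useful when you want to filter
or edit features before printing them.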

chris

On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:

> Hi
> I was asked this question recently
> and it occurred to me I must be doing things inefficiently
> To produce gff file I was using SeqIO to parse the required fields,
> then
> according to the conventions just printing out whatever was
> required tab
> delimited, which is easy
>
> but if I wanted to generate a genbank file, extracting features
> from a gff file
> and a plain fasta file it was more complicated
> is there support for gff in bioperl now?
> anyone can contribute with smart way to go from/to gff, genebank
> and embl?
>
> thanks very much
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania

From bosborne11 at verizon.net Thu Nov 15 14:19:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 14:19:41 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
In-Reply-To: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>
Message-ID:

Chris,

There's also a genbank2gff3.PLS script in the GMOD package
(http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
However, it has not been modified for a couple of years, it may not be
the "preferred" script.

See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
on using Bioperl's bp_genbank2gff3 script.

Brian O.

On 11/15/07 1:43 PM, "Chris Fields" wrote:

> There are currently many ways to get what you want, but not all are
> consistent (particularly re: GFF3). We are aiming for more
> consistent, compliant GFF/GTF output in the next developer series
> (1.7) of Bioperl.
>
> You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> scripts directory); these are probably the most common way when
> working directly from a seq record. Bio::Tools::GFF is the most
> commonly used class though I'm unsure of it's status for GFF3
> output. From within a Bio::SeqI you can call write_gff() (currently
> not very flexible) or from the SeqFeature itself gff_string().
> Bio::Graphics::Feature has the additional method gff3_string().
> Bio::FeatureIO is also an option, though I would consider it very
> experimental (it will likely undergo significant revision in the next
> bioperl dev series).
>
> Any others anyone can think of, maybe non-BioPerl related as well?
>
> chris
>
> On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
>
>> Hi
>> I was asked this question recently
>> and it occurred to me I must be doing things inefficiently
>> To produce gff file I was using SeqIO to parse the required fields,
>> then
>> according to the conventions just printing out whatever was
>> required tab
>> delimited, which is easy
>>
>> but if I wanted to generate a genbank file, extracting features
>> from a gff file
>> and a plain fasta file it was more complicated
>> is there support for gff in bioperl now?
>> anyone can contribute with smart way to go from/to gff, genebank
>> and embl?
>>
>> thanks very much
>>
>> Lucia Peixoto
>> Department of Biology,SAS
>> University of Pennsylvania

From Russell.Smithies at agresearch.co.nz Thu Nov 15 17:31:28 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 16 Nov 2007 11:31:28 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID:

Just to add to this, does anyone have any code for reading .sff 'traces'
from 454 sequences?

Thanx,

Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
>
> Hi,
> I would like to know how to draw a chromatogram file. Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible. I really appreciate the help, so thanks!
>
> --
> Lee Katz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From torsten.seemann at infotech.monash.edu.au Thu Nov 15 20:13:22 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 16 Nov 2007 12:13:22 +1100
Subject: [Bioperl-l] chromatogram
In-Reply-To:
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID:

> Just to add to this, does anyone have any code for reading .sff 'traces'
> from 454 sequences?

The .SFF files can be manipulated using the SFF tools which 454
distribute with their result data.

e.g. "sffinfo 454AllContigs.sff" will list all the reads with the
original flowgram values etc.

However, the SFF tools are i386 Linux binaries, so not really a
portable solution.

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From mvrmakam at yahoo.com Thu Nov 15 22:04:55 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Thu, 15 Nov 2007 19:04:55 -0800 (PST)
Subject: [Bioperl-l] Problem with installing bioperl on Windows
Message-ID: <456881.59573.qm@web33712.mail.mud.yahoo.com>

Hi,

I have installed Perl Package Manager ver 5.8.8.822 on Windows XP. I have
included all the repositories outlined in Installing BioPerl for Windows
and have selected all Packages in the View. However, I am not able to see
any packages in the view box. Can anyone help me in this matter?

Roshan

From David.Messina at sbc.su.se Fri Nov 16 03:33:04 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 16 Nov 2007 09:33:04 +0100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov> <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <628aabb70711160033na56be2an5bff905fdf13a0c0@mail.gmail.com>

> If this is not possible, do you know if drawing an scf is in the
> works? Thanks.
>

One non-BioPerl solution is 4peaks:

http://mekentosj.com/4peaks/

Mac only, but really great software. I'm also a fan of their Papers
journal article PDF library program.

Dave

From neetisomaiya at gmail.com Mon Nov 19 01:11:49 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 19 Nov 2007 11:41:49 +0530
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
Message-ID: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>

Hi,

I am using Bio::SeqIO for parsing KEGG gene ent files.

A part of my code is

foreach my $key ( $ac->get_all_annotation_keys() )
{
    if($key eq "dblink")
    {
        my %values = $ac->get_Annotations($key);
        foreach my $value ( keys(%values ))
        {
            print "\n*****VALUE $value*****\n";
        }
    }
}

Here not all dblinks present in the actual file get parsed. For example,
in the data below,

ENTRY       116064            CDS       H.sapiens
NAME        LRRC58
DEFINITION  leucine rich repeat containing 58
POSITION    3q13.33
MOTIF       Pfam: SdiA-regulated LRR_1
            PROSITE: LEU_RICH
DBLINKS     NCBI-GI: 153792305
            NCBI-GeneID: 116064
            HGNC: 26968
            Ensembl: ENSG00000163428
            UniProt: Q96CX6

Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and PROSITE,
but doesn't give me HGNC and UniProt. For other entries it gives me other
combinations of dbs.

Can anyone help me with this? Why is this happening? I have no clue.

Thanks and Regards,
Neeti.
--
-Neeti
Even my blood says, B positive

From johnston at biochem.ucl.ac.uk Mon Nov 19 06:44:59 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 11:44:59 +0000 (GMT)
Subject: [Bioperl-l] blast database names
Message-ID:

Hello,

Is there a list of the possible database names for -data =>
$dbname in RemoteBlast somewhere?

Cheers,
Cass

From cjfields at uiuc.edu Mon Nov 19 08:44:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 07:44:46 -0600
Subject: [Bioperl-l] blast database names
In-Reply-To:
References:
Message-ID:

Here's a recent list (don't know if it's up-to-date):

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Nov 19, 2007, at 5:44 AM, Caroline Johnston wrote:

> Hello,
>
> Is there a list of the possible database names for -data =>
> $dbname in RemoteBlast somwhere?
>
> Cheers,
> Cass

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Nov 19 09:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 08:33:46 -0600
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
In-Reply-To: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
References: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
Message-ID:

It makes sense in light of the fact that you're (erroneously) using a
hash:

my %values = $ac->get_Annotations($key);

This assigns key-value pairs of DBLink => DBLink; you don't see an
error b/c the number of links happens to be even (I get 8), but you
would if the number of links returned were odd (missing value for key
error or something along those lines). So when you call:

foreach my $value (keys(%values)) {....}

you only get half of the DBLinks.
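
To see the effect in isolation, here is a small sketch in plain Perl
(no BioPerl needed). The strings are stand-ins for the DBLink objects
that get_Annotations would return; their order is an assumption made
for illustration, not what the KEGG parser actually produces.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the DBLink objects returned by
# $ac->get_Annotations('dblink'); the order here is assumed.
my @links = ('NCBI-GI', 'NCBI-GeneID', 'HGNC', 'Ensembl', 'UniProt', 'Pfam');

# Assigning a list to a hash consumes it pairwise as key => value,
# so every second element becomes a value and never shows up in keys().
my %as_hash = @links;
my @seen_via_hash = sort keys %as_hash;

# Assigning the same list to an array keeps every element.
my @as_array = @links;

print "hash keys:  @seen_via_hash\n";              # HGNC NCBI-GI UniProt
print "array size: ", scalar(@as_array), "\n";     # 6
```

With an odd number of elements, Perl (under "use warnings") reports
"Odd number of elements in hash assignment", which is the error alluded
to above.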
You should use an array:

my @values = $ac->get_Annotations($key);
foreach my $value (@values) {
    print $value->as_text,"\n";
}

Note the loop change; Bio::Annotation objects are no longer
operator-overloaded, so your print statement wouldn't work in a
bioperl 1.6 world.

chris

On Nov 19, 2007, at 12:11 AM, neeti somaiya wrote:

> Hi,
>
> I am using Bio::SeqIO for parsing KEGG gene ent files.
>
> A part of my code is
>
> foreach my $key ( $ac->get_all_annotation_keys() )
> {
>     if($key eq "dblink")
>     {
>         my %values = $ac->get_Annotations($key);
>         foreach my $value ( keys(%values ))
>         {
>             print "\n*****VALUE $value*****\n";
>         }
>     }
> }
>
> Here not all dblinks present in the actual file get parsed. For eg,
> in the
> data below,
> ENTRY       116064            CDS       H.sapiens
> NAME        LRRC58
> DEFINITION  leucine rich repeat containing 58
> POSITION    3q13.33
> MOTIF       Pfam: SdiA-regulated LRR_1
>             PROSITE: LEU_RICH
> DBLINKS     NCBI-GI: 153792305
>             NCBI-GeneID: 116064
>             HGNC: 26968
>             Ensembl: ENSG00000163428
>             UniProt: Q96CX6
>
> Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and
> PROSITE,
> but doesnt give me HGNC and UniProt. For other entries it gives me
> other
> combinations of dbs.
>
> Can anyone help me with this. Why is this happenning? I have no clue.
>
> Thanks and Regards,
> Neeti.
> --
> -Neeti
> Even my blood says, B positive

From akarger at CGR.Harvard.edu Mon Nov 19 10:38:26 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 19 Nov 2007 10:38:26 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
References: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>
Message-ID:

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, November 13, 2007 12:42 PM
> To: Amir Karger
> Cc: Steve Chervitz; Dave Messina; bioperl-l
> Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result?
>
> Amir,
>
> Can you file this as a bug?

Done. http://bugzilla.open-bio.org/show_bug.cgi?id=2399

> Dave mentioned he would look into it but
> I think it warrants tracking to make sure it gets fixed:
>
> http://www.bioperl.org/wiki/Bugs
>
> Attach the example BLAST report from your last post to the report.
> BTW, I wonder how this appears in XML output?
>
> chris
>
> On Nov 13, 2007, at 11:30 AM, Amir Karger wrote:
>
> >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf
> >> Of Steve Chervitz
> >>
> >> The Bioperl blast parser should extract that value and you can obtain
> >> it from an HSP object, via the HSPI::n() method, documented here:
> >>
> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23
> >
> > As I mentioned in my email:
> >
> > And does anyone know off-hand if Bioperl will tell me when situations
> > like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
> > would help, but I just get a bunch of empty strings for that, whether or
> > not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
> > undef.)
> >
> > And the docs for n() actually say, "This value is not defined with NCBI
> > Blast2 with gapping" although they don't say why. Which may explain why,
> > when I ran the following code on the blast result I included in my last
> > email, I got empty values for all of the n's. (Why is n() undefined for
> > gapped blast if I'm getting n's in my results from that blast?)
> >
> > use warnings;
> > use strict;
> > use Bio::SearchIO;
> >
> > my $blast_out = $ARGV[0];
> > my $in = new Bio::SearchIO(-format => 'blast',
> >                            -file => $blast_out,
> >                            -report_type => 'tblastn');
> >
> > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
> > Evalue)), "\n";
> > while(my $query = $in->next_result) {
> >     while(my $subject = $query->next_hit) {
> >         while (my $hsp = $subject->next_hsp) {
> >             print join("\t",
> >                 $query->query_name,
> >                 $hsp->start("query"),
> >                 $hsp->end("query"),
> >                 $hsp->strand("hit"),
> >                 $subject->name,
> >                 $hsp->start("hit"),
> >                 $hsp->end("hit"),
> >                 $subject->frame,
> >                 $hsp->n,
> >                 $hsp->evalue,
> >             ),"\n";
> >         }
> >     }
> > }
> >
> >> Dave's basically correct in his explanation. It's a result of the
> >> application of sum statistics by the blast algorithm. You can read all
> >> about it in Korf et al's BLAST book. Here's the relevant section:
> >
> > [snip]
> >
> > Thanks,
> >
> > -Amir
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>

From aaron.j.mackey at gsk.com Mon Nov 19 11:50:53 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 19 Nov 2007 11:50:53 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
In-Reply-To:
Message-ID:

While Lucia's subject line asked for genbank2gff, her message actually
asked the reverse (gff + fasta -> genbank). e.g. pretend you had to
prepare a genome annotation for submission to GenBank ... and no, I
don't know of any generalized gff2genbank script out there ...

Lucia, the SeqIO::genbank module will write GenBank format, but you have
to get all the bits and bobs together in the right way, i.e.
construct the various AnnotationCollections and SeqFeatures (with
SplitLocations for exons, CDS, etc.) that a GenBank record expects. One
way to do this is to start with a template GenBank file that you'd like
to mimic, strip it down to only two gene models, use SeqIO::genbank to
read it into memory, and then step through the object with the Perl
debugger to see how it is composed.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 11/15/2007 02:19:41 PM:

> Chris,
>
> There's also a genbank2gff3.PLS script in the GMOD package (
> http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
> However, it has not been modified for a couple of
> years, it may not be the "preferred" script.
>
> See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
> http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
> on using Bioperl's bp_genbank2gff3 script.
>
> Brian O.
>
>
> On 11/15/07 1:43 PM, "Chris Fields" wrote:
>
> > There are currently many ways to get what you want, but not all are
> > consistent (particularly re: GFF3). We are aiming for more
> > consistent, compliant GFF/GTF output in the next developer series
> > (1.7) of Bioperl.
> >
> > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> > scripts directory); these are probably the most common way when
> > working directly from a seq record. Bio::Tools::GFF is the most
> > commonly used class though I'm unsure of it's status for GFF3
> > output. From within a Bio::SeqI you can call write_gff() (currently
> > not very flexible) or from the SeqFeature itself gff_string().
> > Bio::Graphics::Feature has the additional method gff3_string().
> > Bio::FeatureIO is also an option, though I would consider it very
> > experimental (it will likely undergo significant revision in the next
> > bioperl dev series).
> >
> > Any others anyone can think of, maybe non-BioPerl related as well?
> >
> > chris
> >
> > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
> >
> >> Hi
> >> I was asked this question recently
> >> and it occurred to me I must be doing things inefficiently
> >> To produce gff file I was using SeqIO to parse the required fields,
> >> then
> >> according to the conventions just printing out whatever was
> >> required tab
> >> delimited, which is easy
> >>
> >> but if I wanted to generate a genbank file, extracting features
> >> from a gff file
> >> and a plain fasta file it was more complicated
> >> is there support for gff in bioperl now?
> >> anyone can contribute with smart way to go from/to gff, genebank
> >> and embl?
> >>
> >> thanks very much
> >>
> >> Lucia Peixoto
> >> Department of Biology,SAS
> >> University of Pennsylvania
>

From johnston at biochem.ucl.ac.uk Mon Nov 19 09:46:03 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 14:46:03 +0000 (GMT)
Subject: [Bioperl-l] blast database names
In-Reply-To:
References:
Message-ID:

On Mon, 19 Nov 2007, Chris Fields wrote:

> Here's a recent list (don't know if it's up-to-date):
>
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

Thanks. Perhaps I missed something in the docs, but I don't think I've
quite understood how this is supposed to work. I'm trying to blast primer
sequences against the ref genome sequence. Should I be using ref_contig?
How can I limit the blast to a single species?

cheers,
Cass.
From Kevin.M.Brown at asu.edu Mon Nov 19 13:31:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 19 Nov 2007 11:31:38 -0700
Subject: [Bioperl-l] pSW vs dpAlign
Message-ID: <1A4207F8295607498283FE9E93B775B404042E1D@EX02.asurite.ad.asu.edu>

I was able to get the Ext package installed; I just had to copy the Align.pm file up one directory from where it was being put by the installer.

Now I have a technician trying to use pSW (Bio::Tools::pSW). It appears to have been last updated back in '99 and seems to lack certain methods to get things out of the alignment, like the score. The test.pl script that Bio::Ext comes with actually uses Bio::Tools::dpAlign. Is dpAlign the replacement for pSW?

From bernd.web at gmail.com Wed Nov 21 11:42:40 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 17:42:40 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To:
References: <4701AEE6.6070506@web.de> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
Message-ID: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>

Hi Russell,

I came across your question. At first I thought all was well on my system, but indeed I also have these colouring problems. I noted that the score in the bgcolor callback gets a different value! Printing the score during hit parsing ($hit->raw_score) gives the same score as the -description callback (my $score = $feature->score;). However, printing the score in the bgcolor sub gives 2573! All scores in the bgcolor routine are different from and higher than the real scores. Were you able to solve this colouring issue?

Regards,
Bernd

> Hi all,
> I'm using a modified version of Lincoln's tutorial
> (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> and I'm colouring the HSPs by setting the -bgcolor by score with a sub
> to give a similar image to that from NCBI but for some reason, my
> colours are coming out wrong (see attached example)
> They seem to be off by one but I can't see why.
>
> Any ideas?
>
> I can't be certain but I think it's only started doing this since our
> BLAST upgrade to 2.2.17 a few weeks ago.
>
> Here's the colouring code:
> -------------------------------------------------------------------------------
> my $track = $panel->add_track(
>     -glyph      => 'segments',
>     -label      => 1,
>     -connector  => 'dashed',
>     -bgcolor    => sub {
>         my $feature = shift;
>         my $score   = $feature->score;
>         return 'red'     if $score >= 200;
>         return 'fuchsia' if $score >= 80;
>         return 'lime'    if $score >= 50;
>         return 'blue'    if $score >= 40;
>         return 'black';
>     },
>     -font2color => 'gray',
>     -sort_order => 'high_score',
>     -description => sub {
>         my $feature = shift;
>         return unless $feature->has_tag('description');
>         my ($description) = $feature->each_tag_value('description');
>         my $score = $feature->score;
>         "$description, score=$score";
>     },
> );
> ---------------------------------------------------------------------------------
>
>
> Thanx,
>
> Russell Smithies
>
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited.
> If you have received this message in error, please notify the
> sender immediately.
> =======================================================================

From bernd.web at gmail.com Wed Nov 21 12:38:30 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 18:38:30 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
Message-ID: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>

Hi,

I now found that bgcolor is using a $feature->score that is coming directly from the blast report; it is not the bit score.

-bgcolor => sub { my $feature = shift; my $score = $feature->score; print "$score\n"; }

always prints the score, even if the score is not set in the Bio::SeqFeature::Generic object. The -description callbacks are somehow using the score from the SeqFeature object. Does anyone have an idea why?

Further, is it possible to get the raw_score of a hit? $hit->raw_score actually gets the bit score (w/o decimal point).

Bernd

From sac at bioperl.org Wed Nov 21 13:43:54 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 21 Nov 2007 10:43:54 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
Message-ID: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>

On Nov 21, 2007 9:38 AM, Bernd Web wrote:
> [snip]
>
> Further, is it possible to get the raw_score of a hit? $hit->raw_score
> actually gets the bit score (w/o decimal point).

Hmmm. raw_score should not be the same as bit score. So given an example blast hit line such as:

Score = 60.0 bits (30), Expect = 1e-06

$hit->raw_score() should return 30, not 60, as you seem to be getting.

Could you submit a bug report for this?

http://www.bioperl.org/wiki/Bugs

Thanks,
Steve

From binkley at genome.stanford.edu Wed Nov 21 19:35:02 2007
From: binkley at genome.stanford.edu (Jonathan Binkley)
Date: Wed, 21 Nov 2007 16:35:02 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
Message-ID: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>

Hi,

I installed bioperl on a Mac (OS 10.4, Intel) via fink, which put it here:

/sw/lib/perl5/5.8.6/Bio/

It seems to work fine, but I need bioperl-ext for Smith-Waterman alignments. So, into which directory should I download bioperl-ext and run the Makefile?

Thanks.

From dcj at sanger.ac.uk Thu Nov 22 09:47:09 2007
From: dcj at sanger.ac.uk (Daniel Jeffares)
Date: Thu, 22 Nov 2007 14:47:09 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
Message-ID:

Hi all,

Bio::Tools::Run::Phylo::PAML::Baseml from bioperl-run 1.5.2 seems to be a little 'broken', at least in my hands.
First, $bml->set_parameter('runmode', 0); does not work (it sets runmode to -2). Setting runmode to 1 is OK. Also, $bml->no_param_checks(1); doesn't seem to work. The result is that the baseml.ctl file created under /tmp is not runnable by baseml with runmode 0. The phylip file created is run OK by baseml (with another .ctl file).

My script & baseml.ctl are below. Hope it can be fixed,

cheers,
Dan

#!/usr/bin/perl
use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'test.phy');
my $aln = $alignio->next_aln;
my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();

#set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();
system "more $tempdir/baseml.ctl";
while( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

The baseml.ctl file produced:

seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
fix_rho = 1
verbose = 0
noisy = 0
RateAncestor = 1
kappa = 2.5
model = 0
ndata = 5
Small_Diff = 1e-6
runmode = -2
alpha = 0
fix_kappa = 0
rho = 0
nhomo = 0
getSE = 0
cleandata = 1
fix_alpha = 1
clock = 0
Malpha = 0
ncatG = 5
fix_blength = -1
nparK = 0

Regards,
Daniel Jeffares
______________________________
Population and Comparative Genomics
Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Phone: +44(0)1223 834244 x 7297
Fax: +44 (0)1223 494919
www.sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
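[Editorial note] One Perl idiom that would produce exactly the symptom reported above (a requested runmode of 0 is replaced by -2, while 1 is kept) is choosing between a user value and a default with ||, because 0 is false in Perl. Whether bioperl-run 1.5.2 actually does this is an assumption and has not been checked against its source; the %DEFAULT table and both helper subs below are hypothetical and exist only to illustrate the pitfall.

```perl
use strict;
use warnings;

# Hypothetical defaults table; -2 is the runmode value that showed up
# in the generated baseml.ctl quoted in the message above.
my %DEFAULT = ( runmode => -2 );

# Truthiness test: a user-supplied 0 is false, so the default wins.
sub pick_with_or {
    my ($param, $value) = @_;
    return $value || $DEFAULT{$param};
}

# Definedness test: only a missing (undef) value falls back to the default.
sub pick_with_defined {
    my ($param, $value) = @_;
    return defined $value ? $value : $DEFAULT{$param};
}

for my $runmode (0, 1) {
    printf "requested %d: || gives %d, defined gives %d\n",
        $runmode,
        pick_with_or('runmode', $runmode),
        pick_with_defined('runmode', $runmode);
}
```

With the || version a request for 0 comes back as -2 and a request for 1 comes back as 1, which matches the report; the defined-based version returns 0 and 1 respectively.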
From David.Messina at sbc.su.se Thu Nov 22 11:06:16 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 22 Nov 2007 17:06:16 +0100
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To:
References:
Message-ID: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>

Daniel,

I don't have bioperl-run or PAML installed on my system to test it myself, but have you tried the latest version of bioperl-run from CVS? It looks like that code has been worked on since 1.5.2 was released.

If that still doesn't work, could you file this as a bug to make sure it gets followed up?

Dave

You can grab the tarball here:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl

and if necessary file the bug here:
BioPerl Bugzilla tracking system

From arareko at campus.iztacala.unam.mx Thu Nov 22 11:37:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 22 Nov 2007 10:37:24 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref table
In-Reply-To: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com>
Message-ID: <4745B044.5090102@campus.iztacala.unam.mx>

Hi Peter,

In BioPerl, there's no such mapping for db_xref's that I'm aware of. Each parser handles db_xref records on its own. Take a look at the Bio::SeqIO::genbank code, inside the next_seq() method for example:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup

Regards,
Mauricio.

Peter wrote:
> Dear all,
>
> I'm one of the Biopython developers. I've recently got going with
> BioSQL and have been getting to grips with the Biopython BioSQL
> interface. I'm aware that we need to try and be consistent with
> BioPerl and BioJava, so I'd like to pose my first question related to
> that.
>
> When loading GenBank records, many features have db_xref qualifiers,
> e.g. from a random CDS feature in E. coli K12:
>
> /db_xref="ASAP:1309"
> /db_xref="GI:16128366"
> /db_xref="ECOCYC:EG10213"
> /db_xref="GeneID:945313"
>
> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC",
> "GeneID" before recording these entries in the seqfeature_dbxref
> and dbxref tables. For example, "GI" becomes "GeneIndex".
> Biopython's current mapping is as follows:
>
> # Dictionary of database types, keyed by GenBank db_xref abbreviation
> db_dict = {'GeneID': 'Entrez',
>            'GI': 'GeneIndex',
>            'COG': 'COG',
>            'CDD': 'CDD',
>            'DDBJ': 'DNA Databank of Japan',
>            'Entrez': 'Entrez',
>            'GeneIndex': 'GeneIndex',
>            'PUBMED': 'PubMed',
>            'taxon': 'Taxon',
>            'ATCC': 'ATCC',
>            'ISFinder': 'ISFinder',
>            'GOA': 'Gene Ontology Annotation',
>            'ASAP': 'ASAP',
>            'PSEUDO': 'PSEUDO',
>            'InterPro': 'InterPro',
>            'GEO': 'Gene Expression Omnibus',
>            'EMBL': 'EMBL',
>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>            'ECOCYC': 'EcoCyc',
>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>            }
>
> In my testing, I've found several GenBank db_xref abbreviations for
> which we don't have a mapping defined, such as "LocusID", "dbSNP",
> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
>
> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
> similar mapping in their BioSQL code (or GenBank parser), so that
> Biopython can follow your example.
>
> Thank you,
>
> Peter
>
> P.S. See also Biopython bug 2405
> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From avilella at gmail.com Thu Nov 22 16:55:10 2007
From: avilella at gmail.com (Albert Vilella)
Date: Thu, 22 Nov 2007 21:55:10 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
Message-ID: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>

Hi,

Am I right in thinking that the '_symbols' hash in SimpleAlign is only used if one calls the symbol_chars method?

When I comment out this line:

map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if $seq->seq; # line 257

I get a nice speed boost on loading alignments.

Can I comment this line out in the CVS HEAD?

Cheers,

Albert.

[init] 5.96046447753906e-06 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta] 0.0022270679473877 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta] 2.14348912239075 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta] 6.91910791397095 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta] 15.8402290344238 secs...

avilella at magneto:~$ perl /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ancestral_alleles.pl -dir /home/avilella/ensembl/exoseq/test -verbose
[init] 1.21593475341797e-05 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta] 0.00294303894042969 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta] 0.510555982589722 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta] 1.6192569732666 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta] 3.86473417282104 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000203717.chr1.fasta] 6.99602198600769 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000196188.chr1.fasta] 7.26704716682434 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000025800.chr1.fasta] 8.44332504272461 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000117475.chr1.fasta] 12.103296995163 secs...

From cjfields at uiuc.edu Thu Nov 22 19:30:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:30:51 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <99440C6C-74C1-4DCC-8C7D-EAABB7CA6B91@uiuc.edu>

How are tests affected? It might be worth going through the revision history to see if there was a specific reason this was implemented, but if it passes tests I don't see why we need it.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Thu Nov 22 19:42:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:42:12 -0600
Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref table
In-Reply-To: <4745B044.5090102@campus.iztacala.unam.mx>
References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> <4745B044.5090102@campus.iztacala.unam.mx>
Message-ID: <47D0EC6F-C34A-4AA8-97EE-478F2A5ADF62@uiuc.edu>

I think SeqIO checks the name for parsing reasons only, in cases where the format changes based on the source (such as GenPept DBSOURCE data). I don't think we go beyond that in Bioperl, probably b/c modifying or expanding names for data persistence would lead to volatile coding issues (i.e. consistency between parsers, constant updating to cover new crossrefs, etc).

I would definitely suggest retaining the original DB as it appears in the dbxref for consistency/sanity; if needed, return expanded names using a different method if they are designated.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Thu Nov 22 19:49:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Nov 2007 18:49:15 -0600
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>
Message-ID: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>

Albert,

Found it:

http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SimpleAlign.pm.diff?r1=1.36&r2=1.37

If it slows performance that dramatically, maybe we can move this to a separate AlignUtils method instead. Maybe something to ask Jason about?

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Fri Nov 23 07:29:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Nov 2007 12:29:37 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
Message-ID: <4746C7B1.1010002@sendu.me.uk>

Dave Messina wrote:
> Daniel,
>
> I don't have bioperl-run or PAML installed on my system to test it myself,
> but have you tried the latest version of bioperl-run from CVS? It looks like
> that code has been worked on since 1.5.2 was released.

Yes, I fixed it in CVS so it should at least /run/. I don't know about the parsing side of things, though that may also have been fixed recently by someone else.
From avilella at gmail.com Fri Nov 23 08:08:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Nov 2007 13:08:59 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <4746C7B1.1010002@sendu.me.uk>
References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> <4746C7B1.1010002@sendu.me.uk>
Message-ID: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>

Just to mention that the new paml4 has a "basemlg" instead of a "baseml" binary. AFAIK, Jason fixed codeml to make it work both for paml3.xx and paml4, but I am not sure about baseml.

Also, I think if you set runmode 0, you have to provide a tree:

#!/usr/bin/perl
use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;
use Bio::TreeIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
    -file => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.phy');
my $treeio = Bio::TreeIO->new(-format => 'newick',
    -file => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.tree');
my $aln = $alignio->next_aln;
my $tree = $treeio->next_tree;
my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->tree($tree);
$bml->executable("/home/avilella/9_opl/paml/paml3.14/src/baseml");
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();

#set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();
system "more $tempdir/baseml.ctl";
while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    $DB::single=1;1;
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

4 50
Homo_sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
Pan_panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
Gorilla_go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
Pongo_pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

ACAUUUU-CCUUGCAAAG
ACAUCAU-CCUUGCAAAG
ACAUCAUCCCUCGCAGAG
ACAUCAUCCCUUGCAGAG

(((Homo_sapie,Pan_panisc),Gorilla_go),Pongo_pigm);

From cjfields at uiuc.edu Fri Nov 23 11:24:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 10:24:59 -0600
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> <4746C7B1.1010002@sendu.me.uk> <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>
Message-ID: <6D4B909E-4B4E-45D4-B9BA-F99431B0EC65@uiuc.edu>

I have both 'baseml' and 'basemlg' with paml4 on Mac OS X (not just 'basemlg'), so it would need to work with both. Do we want to put a PAML parser/wrapper overhaul on the TODO list for 1.6?

chris

On Nov 23, 2007, at 7:08 AM, Albert Vilella wrote:

> Just to mention that the new paml4 has a "basemlg" instead of a
> "baseml" binary. AFAIK, Jason fixed codeml to make it work both for
> paml3.xx and paml4, but I am not sure about baseml.
...

From arvindvanam at gmail.com Fri Nov 23 16:26:06 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 13:26:06 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
Message-ID: <13918981.post@talk.nabble.com>

How do I run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
my $rnafold = $factory->program('rnafold');
my $job = $rnafold->run(-rnafold => 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');

I installed the Vienna package and then tried using Pise to create an
object for the program, but it gives the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bio::Tools::Run::PiseJob terminated: URL missing
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::PiseJob::terminated /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
STACK: Bio::Tools::Run::PiseApplication::submit /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
STACK: Bio::Tools::Run::PiseApplication::run /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
STACK: evaluate.pl:12

How do I make the program RNAfold run in Perl? Is there any need to
specify where my rnafold program is?

Please reply soon.
--
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13918981
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at uiuc.edu Fri Nov 23 17:49:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 16:49:43 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13918981.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
Message-ID: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>

The Pise wrappers run the programs remotely; see
Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a local
RNAfold wrapper, I had planned on making Bioperl-based Vienna/mfold
wrappers but haven't done so yet. The Vienna tools do have a Perl-based
(non-BioPerl-based) module included which uses libRNA, and is well worth a look.
Try 'perldoc RNA' if you have installed the tools locally, or look here for other Perl-based tools: http://www.tbi.univie.ac.at/~ivo/RNA/utils.html chris On Nov 23, 2007, at 3:26 PM, vanam wrote: > > how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? > > my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); > my $rnafold = $factory->program('rnafold'); > my $job=$rnafold->run(-rnafold => > 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); > > I installed Vienna package and then i tried using Pise to create an > object > for the program but its giving the following error > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bio::Tools::Run::PiseJob terminated: URL missing > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::Tools::Run::PiseJob::terminated > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 > STACK: Bio::Tools::Run::PiseApplication::submit > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 > STACK: Bio::Tools::Run::PiseApplication::run > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 > STACK: evaluate.pl:12 > > > how to make the program RNAfold run in perl... > IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? > > plz reply soon > -- > View this message in context: http://www.nabble.com/run-RNAfold-in- > perl-tf4863835.html#a13918981 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From arvindvanam at gmail.com Sat Nov 24 02:29:11 2007 From: arvindvanam at gmail.com (vanam) Date: Fri, 23 Nov 2007 23:29:11 -0800 (PST) Subject: [Bioperl-l] run RNAfold in perl In-Reply-To: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> Message-ID: <13922740.post@talk.nabble.com> i have seen the documentation for Bio::Tools::Run::AnalysisFactory::Pise and i used it exactly as it was mentioned in it. i just want that instead of running its perl version "RNAfold.pl" I can use the functions associated with RNAfold with a perl program without having to call the program using system() command. if you can just tell me how to use these wrapper modules it would b of gr8 help...like while using clustalw or clustalx we define the environment variable for it ..do we have to do the same for RNAfold or Mfold Chris Fields wrote: > > The Pise wrappers run the programs remotely; see > Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a > local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ > mfold wrappers but haven't done so yet. The Vienna tools do have a > Perl-based (non-BioPerl-based) module included which uses libRNA, and > is well worth a look. Try 'perldoc RNA' if you have installed the > tools locally, or look here for other Perl-based tools: > > http://www.tbi.univie.ac.at/~ivo/RNA/utils.html > > chris > > On Nov 23, 2007, at 3:26 PM, vanam wrote: > >> >> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? 
>> >> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); >> my $rnafold = $factory->program('rnafold'); >> my $job=$rnafold->run(-rnafold => >> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); >> >> I installed Vienna package and then i tried using Pise to create an >> object >> for the program but its giving the following error >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Bio::Tools::Run::PiseJob terminated: URL missing >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 >> STACK: Bio::Tools::Run::PiseJob::terminated >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 >> STACK: Bio::Tools::Run::PiseApplication::submit >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 >> STACK: Bio::Tools::Run::PiseApplication::run >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 >> STACK: evaluate.pl:12 >> >> >> how to make the program RNAfold run in perl... >> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? >> >> plz reply soon >> -- >> View this message in context: http://www.nabble.com/run-RNAfold-in- >> perl-tf4863835.html#a13918981 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13922740 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
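
For calling RNAfold from Perl without system(), the 'perldoc RNA' route mentioned in this thread could look roughly like the sketch below. It assumes the Perl bindings that ship with the Vienna package (the RNA module) are installed and that RNA::fold() returns the structure and the minimum free energy, as described in that module's documentation. The fold_rna() helper is made up for this illustration and is not part of BioPerl or Pise. No path to the rnafold executable is needed, because the module links against libRNA directly.

```perl
use strict;
use warnings;

# Hypothetical helper: fold a sequence with the Vienna RNA Perl bindings.
# Returns (structure, mfe), or an empty list if the RNA module is missing.
sub fold_rna {
    my ($seq) = @_;

    # Load the bindings at run time so the script still compiles without them.
    return unless eval { require RNA; 1 };

    # RNA::fold() computes the minimum free energy structure of the sequence.
    my ($structure, $mfe) = RNA::fold($seq);
    return ($structure, $mfe);
}

my $seq = 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC';

if ( my ($structure, $mfe) = fold_rna($seq) ) {
    # Dot-bracket structure, one character per base, plus the energy.
    printf "%s\n%s (%.2f kcal/mol)\n", $seq, $structure, $mfe;
}
else {
    print "RNA module not found; install the Vienna package Perl bindings first\n";
}
```

If the module is not installed, the script only prints the hint instead of dying, so it is safe to try on any machine.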
From avilella at gmail.com Sun Nov 25 06:50:42 2007 From: avilella at gmail.com (Albert Vilella) Date: Sun, 25 Nov 2007 11:50:42 +0000 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> Message-ID: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> cvs commited now. it is calculated anyway when calling symbol_chars so... On Nov 23, 2007 12:49 AM, Chris Fields wrote: > Albert, > > Found it: > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ > SimpleAlign.pm.diff?r1=1.36&r2=1.37 > > If it slows performance that dramatically, maybe we can move this to > a separate AlignUtils method instead. Maybe something to ask Jason > about? > > chris > > On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > > > > Hi, > > > > Am I right in thinking that the '_symbols' hash in SimpleAlign is only > > used if one calls the symbol_chars method? > > > > When I comment out this line: > > > > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > > $seq->seq; # line 257 > > > > I get a nice speed boost on loading alignments. > > > > Can I comment this line out in the CVS HEAD? > > > > Cheers, > > > > Albert. > > > > [init] 5.96046447753906e-06 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162399.chr1.fasta] > > 0.0022270679473877 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000158022.chr1.fasta] > > 2.14348912239075 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162585.chr1.fasta] > > 6.91910791397095 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000121957.chr1.fasta] > > 15.8402290344238 secs... 
> > > > avilella at magneto:~$ perl > > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > > ancestral_alleles.pl > > -dir /home/avilella/ensembl/exoseq/test -verbose > > [init] 1.21593475341797e-05 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162399.chr1.fasta] > > 0.00294303894042969 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000158022.chr1.fasta] > > 0.510555982589722 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162585.chr1.fasta] > > 1.6192569732666 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000121957.chr1.fasta] > > 3.86473417282104 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000203717.chr1.fasta] > > 6.99602198600769 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000196188.chr1.fasta] > > 7.26704716682434 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000025800.chr1.fasta] > > 8.44332504272461 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000117475.chr1.fasta] > > 12.103296995163 secs... > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From cjfields at uiuc.edu Sun Nov 25 10:05:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 09:05:27 -0600 Subject: [Bioperl-l] run RNAfold in perl In-Reply-To: <13922740.post@talk.nabble.com> References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> Message-ID: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu> Again, these wrappers are for submitting data to a Pise server for the corresponding programs (run on a remote server). 
There are no wrappers for running RNAfold on your computer (i.e.
locally), with or w/o a set env. variable. You can try installing Pise
locally and setting the location() as shown in POD to localhost; however,
I don't know how stable these modules are with newer versions of Pise.
These haven't been updated in a few years, apart from getting tests to
work. Another option is installing EMBOSS along with the EMBASSY version
of RNAFold; this could conceivably be run through Bio::Factory::EMBOSS.

chris

On Nov 24, 2007, at 1:29 AM, vanam wrote:

>
> i have seen the documentation for Bio::Tools::Run::AnalysisFactory::Pise
> and i used it exactly as it was mentioned in it.
>
> i just want that instead of running its perl version "RNAfold.pl" I can
> use the functions associated with RNAfold with a perl program without
> having to call the program using system() command.
>
> if you can just tell me how to use these wrapper modules it would b of
> gr8 help...like while using clustalw or clustalx we define the environment
> variable for it ..do we have to do the same for RNAfold or Mfold
>
>
> Chris Fields wrote:
>>
>> The Pise wrappers run the programs remotely; see
>> Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a
>> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/
>> mfold wrappers but haven't done so yet. The Vienna tools do have a
>> Perl-based (non-BioPerl-based) module included which uses libRNA, and
>> is well worth a look. Try 'perldoc RNA' if you have installed the
>> tools locally, or look here for other Perl-based tools:
>>
>> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
>>
>> chris
>>
>> On Nov 23, 2007, at 3:26 PM, vanam wrote:
>>
>>>
>>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Nov 25 10:38:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 09:38:40 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> Message-ID: Albert, I was getting a single AlignIO.t fail which appeared to be related to this: ... ok 122 - The object isa Bio::Align::AlignI ok 123 - consensus_string on metafasta not ok 124 - symbol_chars() using metafasta # Failed test 'symbol_chars() using metafasta' # in t/AlignIO.t at line 346. # got: '0' # expected: '23' It was b/c the symbol hash was initialized in the constructor (so it was present, just empty). I have changed that in CVS; all tests pass now. chris On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: > cvs commited now. it is calculated anyway when calling symbol_chars > so... > > On Nov 23, 2007 12:49 AM, Chris Fields wrote: >> Albert, >> >> Found it: >> >> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ >> Bio/ >> SimpleAlign.pm.diff?r1=1.36&r2=1.37 >> >> If it slows performance that dramatically, maybe we can move this to >> a separate AlignUtils method instead. Maybe something to ask Jason >> about? >> >> chris >> >> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: >> >> >>> Hi, >>> >>> Am I right in thinking that the '_symbols' hash in SimpleAlign is >>> only >>> used if one calls the symbol_chars method? 
>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bernd.web at gmail.com Sun Nov 25 11:13:44 2007 From: bernd.web at gmail.com (Bernd Web) Date: Sun, 25 Nov 2007 17:13:44 +0100 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> Message-ID: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> Hi, I am not sure if this is related, but I remember SimpleAlign was adapted to cope with more gap symbols that can occur in alignments/FastA sequences, as: . _ - = Previous versions would throw an error on 'illegal' gap characters, Regards, Bernd On Nov 25, 2007 4:38 PM, Chris Fields wrote: > Albert, > > I was getting a single AlignIO.t fail which appeared to be related to > this: > > ... > ok 122 - The object isa Bio::Align::AlignI > ok 123 - consensus_string on metafasta > > not ok 124 - symbol_chars() using metafasta > # Failed test 'symbol_chars() using metafasta' > # in t/AlignIO.t at line 346. > # got: '0' > # expected: '23' > > It was b/c the symbol hash was initialized in the constructor (so it > was present, just empty). I have changed that in CVS; all tests pass > now. > > chris > > > On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: > > > cvs commited now. it is calculated anyway when calling symbol_chars > > so... 
> > Christopher Fields > Postdoctoral Researcher > Lab of Dr.
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Sun Nov 25 11:39:01 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 10:39:01 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> Message-ID: Bernd, That would be when generating Bio::LocatableSeq instances for building a Bio::SimpleAlign object. Judging by test suite results that doesn't appear to be affected. chris On Nov 25, 2007, at 10:13 AM, Bernd Web wrote: > Hi, > > I am not sure if this is related, but I remember SimpleAlign was > adapted to cope with more gap symbols that can occur in > alignments/FastA sequences, as: . _ - = > Previous versions would throw an error on 'illegal' gap characters, > > Regards, > Bernd > > On Nov 25, 2007 4:38 PM, Chris Fields wrote: >> Albert, >> >> I was getting a single AlignIO.t fail which appeared to be related to >> this: >> >> ... >> ok 122 - The object isa Bio::Align::AlignI >> ok 123 - consensus_string on metafasta >> >> not ok 124 - symbol_chars() using metafasta >> # Failed test 'symbol_chars() using metafasta' >> # in t/AlignIO.t at line 346. >> # got: '0' >> # expected: '23' >> >> It was b/c the symbol hash was initialized in the constructor (so it >> was present, just empty). I have changed that in CVS; all tests pass >> now. >> >> chris >> >> >> On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: >> >>> cvs commited now. it is calculated anyway when calling symbol_chars >>> so... 
Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Sun Nov 25 13:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 12:51:42 -0600
Subject: [Bioperl-l] [ANNOUNCE] bioperl-ext updates and bioperl-live
Message-ID: <32B25A3B-0F04-43CB-8A66-1019EFD3BEB0@uiuc.edu>

I have been making some significant changes to Bio::SeqIO::staden::read
over the last few months which incorporate code from Bugzilla (bugs 2074
and 2329, very kindly donated by Chris Bailey and Joel Martin, cheers!).

Significant Changes:

* All Inline code in staden::read is now XS-based
* A new method has been added to Bio::SeqIO::staden::read for optionally
  getting trace data (i.e. for drawing graphs). The method is now
  implemented in Bio::SeqIO::abi, with example code in
  examples/quality/svgtrace.pl.

These changes should allow newer versions of Staden io_lib to be used as
well (the code is tested with io_lib 1.9.2), though they haven't been
tested extensively as I am having problems compiling newer io_lib versions
on my MacBook. It's very likely more changes will need to be made along
the way; some issues were found with XS compilation which appear harmless
but need to be investigated, and trace data from other formats needs to be
evaluated.

It is possible that many of these changes break backward compatibility
with older bioperl releases, though tests passed with bioperl 1.5.2.

Any feedback re: platform issues, test results using newer io_lib
versions, older bioperl versions, etc. would be greatly appreciated. I'm
hoping this will stimulate more interest in getting other bioperl-ext
modules up-to-date with bioperl-live.
chris

From cjfields at uiuc.edu Mon Nov 26 13:59:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 12:59:23 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID:

Steve, Bernd, (and Jason, since you may have some input on this as well),

I am now looking into the bug Bernd submitted and it seems there is a
serious discrepancy in the way the hit raw_score, bits, and significance
are determined for Hit objects. Unless I am mistaken, these should always
come from the best HSP when HSPs are present, falling back to the hit
table data only when no HSP alignments are present. Under the latter
conditions a minimal Hit object is made from data in the hit table, which
reports the rounded bit score, not the raw score, so in those cases the
raw score would be undefined (and you probably should get a nasty warning
indicating there are no HSPs present to get the data from).

What is occurring now, though, is that the raw_score and significance are
explicitly set from the hit table in the BLAST parser for the Hit object
at all times, while the bits are correctly derived from the best HSP (no
fallback to the hit table). Changing to the behavior above results in
several tests failing via SearchIO.t, with each failed test reporting the
expected (read: correct) raw score.

I'll look through the tests just in case, but I am planning on committing
changes to the BLAST parsers, GenericHit, and SearchIO.t (to reflect the
correct expected data) in the next day or two unless there are any
objections.
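The fallback rule described above can be sketched in plain Perl, with
hashes standing in for Hit and HSP objects. This is only an illustration
under stated assumptions: the helper name hit_scores, the field names
(hsps, table_bits, table_evalue) and the choice of "highest bit score" as
the best HSP are hypothetical and are not the BioPerl API. The 60.0 bits /
raw score 30 pair comes from the example hit line quoted further down in
this message; the second HSP is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed hit: a hash with an optional list of
# HSPs plus the rounded bit score and E-value copied from an NCBI hit table.
sub hit_scores {
    my ($hit) = @_;
    my @hsps = @{ $hit->{hsps} || [] };

    if (@hsps) {
        # Assumption: the "best" HSP is the one with the highest bit score.
        my ($best) = sort { $b->{bits} <=> $a->{bits} } @hsps;
        return {
            raw_score    => $best->{raw_score},
            bits         => $best->{bits},
            significance => $best->{evalue},
        };
    }

    # No alignments: fall back to the hit table. An NCBI-style table only
    # carries the rounded bit score, so the raw score stays undefined.
    warn "no HSPs for $hit->{name}; raw score unavailable\n";
    return {
        raw_score    => undef,
        bits         => $hit->{table_bits},
        significance => $hit->{table_evalue},
    };
}

my $with_hsps = hit_scores({
    name         => 'hitA',
    table_bits   => 60,
    table_evalue => 1e-06,
    hsps         => [
        { raw_score => 30, bits => 60.0, evalue => 1e-06 },
        { raw_score => 12, bits => 24.3, evalue => 0.5 },    # made-up HSP
    ],
});
my $table_only = hit_scores({
    name         => 'hitB',
    table_bits   => 60,
    table_evalue => 1e-06,
});

print "hitA raw score: $with_hsps->{raw_score}\n";
print "hitB raw score: ",
    ( defined $table_only->{raw_score} ? $table_only->{raw_score} : 'undef' ),
    "\n";
```

The point of the sketch is only the ordering of the two sources: HSP data
wins whenever it exists, and the table-only case leaves the raw score
undefined rather than reusing the bit score.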
chris On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote: > On Nov 21, 2007 9:38 AM, Bernd Web wrote: >> [snip] >> >> Further is is possible to get the raw_score of a hit. $hit->raw_score >> actually gets the bitscore (w/o decimal point). > > Hmmm. raw_score should not be the same as bit score. So given an > example blast hit line such as: > > Score = 60.0 bits (30), Expect = 1e-06 > > $hit->raw_score() should return 30, not 60, as you seem to be getting. > > Could you submit a bug report for this? http://www.bioperl.org/ > wiki/Bugs > > Thanks, > Steve > >> >> On Nov 21, 2007 5:42 PM, Bernd Web wrote: >>> Hi Russell, >>> >>> I came across your question. At first I thought all was well on my >>> system, but indeed I also have these colouring problems. >>> I noted that scrore in the bgcolor callback gets a different value! >>> Printing score during hit parsing($hit->raw_score) gives the same >>> score as -description >>> my $score = $feature->score; However, printing score in the bgcolor >>> sub gives 2573! >>> All scores in the bgcolor routine all different and higher than the >>> real scores. Were you able to solve this colouring issue? >>> >>> Regards, >>> Bernd >>> >>> >>>> Hi all, >>>> I'm using a modified version of Lincoln's tutorial >>>> (http://www.bioperl.org/wiki/ >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output) >>>> and I'm colouring the HSPs by setting the -bgcolor by score with >>>> a sub >>>> to give a similar image to that from NCBI but for some reason, my >>>> colours are coming out wrong (see attached example) >>>> They seem to be off by one but I can't see why. >>>> >>>> Any ideas? >>>> >>>> I can't be certain but I think it's only started doing this >>>> since our >>>> BLAST upgrade to 2.2.17 a few weeks ago. 
>>>> >>>> Here's the colouring code: >>>> ------------------------------------------------------------------- >>>> ----- >>>> ------- >>>> my $track = $panel->add_track( >>>> -glyph => 'segments', >>>> -label => 1, >>>> -connector => 'dashed', >>>> -bgcolor => sub { >>>> my $feature = shift; >>>> my $score = $feature->score; >>>> return 'red' if $score >= 200; >>>> return 'fuchsia' if $score >>>> >= 80; >>>> return 'lime' if $score >>>> >= 50; >>>> return 'blue' if $score >= 40; >>>> return 'black'; >>>> }, >>>> -font2color => 'gray', >>>> -sort_order => 'high_score', >>>> -description => sub { >>>> my $feature = shift; >>>> return unless >>>> $feature->has_tag('description'); >>>> my ($description) = >>>> $feature->each_tag_value('description'); >>>> my $score = $feature->score; >>>> "$description, score=$score"; >>>> }, >>>> ); >>>> ------------------------------------------------------------------- >>>> ----- >>>> --------- >>>> >>>> >>>> Thanx, >>>> >>>> Russell Smithies >>>> >>>> >>>> >>>> >>>> =================================================================== >>>> ==== >>>> Attention: The information contained in this message and/or >>>> attachments >>>> from AgResearch Limited is intended only for the persons or >>>> entities >>>> to which it is addressed and may contain confidential and/or >>>> privileged >>>> material. Any review, retransmission, dissemination or other use >>>> of, or >>>> taking of any action in reliance upon, this information by >>>> persons or >>>> entities other than the intended recipients is prohibited by >>>> AgResearch >>>> Limited. If you have received this message in error, please >>>> notify the >>>> sender immediately. 
>>>> ===================================================================
>>>> ====
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From arvindvanam at gmail.com Mon Nov 26 14:08:41 2007
From: arvindvanam at gmail.com (vanam)
Date: Mon, 26 Nov 2007 11:08:41 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
Message-ID: <13955209.post@talk.nabble.com>

I searched for the EMBASSY version of RNAfold (I guess it's vrnafold), but
I am unable to find a downloadable version; all there is is a web
interface for it. Can you tell me where to download it from?

Or can you just tell me how to set the location in PiseApplication to
localhost, and what to enter in the $email variable?
--
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13955209
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at uiuc.edu Mon Nov 26 15:08:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 14:08:24 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13955209.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu> <13955209.post@talk.nabble.com>
Message-ID: <8F0B3E56-BC46-4794-9A30-12688A358CAD@uiuc.edu>

On Nov 26, 2007, at 1:08 PM, vanam wrote:

> i searches for the embassy version of RNAFOLD (i guess its
> vrnafold) but i m
> unable to find a downloadable version.all ther is a web interface
> for it.
> can u tell frm wher to fdownload it????

You will need to install EMBOSS as well as the EMBASSY version of VIENNA
(this is documented in the docs that come with the distributions, so I
will not go into detail here):

ftp://emboss.open-bio.org/pub/EMBOSS/

This would be your best bet. Understand that there is no specific class
framework for dealing with RNA secondary structure in BioPerl, so you
will be on your own for now. My suggestion for using Pise had the very
important caveats that (1) it very well may not work, (2) I have no
experience with Pise, let alone setting it up locally, therefore (3) I
haven't tested it (and don't intend to, as I don't have the time).

> or can you just tell me how to set the location in piseapplication to
> localhost n wat to enter in the $email variable????

I have previously pointed out the documentation which comes with the
modules in question. Remember perldoc is your friend; consulting it saves
me (and everyone else) time. From
'perldoc Bio::Tools::Run::AnalysisFactory::Pise':

----------------------------------------------
DESCRIPTION
    Bio::Tools::Run::AnalysisFactory::Pise is a class to create Pise
    application objects, that let you submit jobs on a Pise server.

      my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                             -email => 'me at myhome');

    The email is optional (there is default one). It can be useful,
    though. Your program might enter infinite loops, or just run many
    jobs: the Pise server maintainer needs a contact (s/he could of
    course cancel any requests from your address...). And if you plan to
    run a lot of heavy jobs, or to do a course with many students, please
    ask the maintainer before.

    The location parameter stands for the actual CGI location, except
    when set at the factory creation step, where it is rather the root of
    all CGI. There are default values for most of Pise programs.

    You can either set location at:

    1 factory creation:

        my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                         -location => 'http://somewhere/Pise/cgi-bin',
                         -email    => 'me at myhome');

    2 program creation:

        my $program = $factory->program('water',
                         -location => 'http://somewhere/Pise/cgi-bin/water.pl'
                         );

    3 any time before running:

        $program->location('http://somewhere/Pise/cgi-bin/water.pl');
        $job = $program->run();

    4 when running:

        $job = $program->run(-location => 'http://somewhere/Pise/cgi-bin/water.pl');

    You can also retrieve a previous job results by providing its url:

        $job = $factory->job($url);

    You get the url of a job by:

        $job->jobid;
----------------------------------------------

chris

From sac at bioperl.org Mon Nov 26 20:41:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 17:41:59 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To:
References: <4701AEE6.6070506@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>

Chris,

Good catch.
You're on track here with one exception: WU blast and NCBI blast behave
differently in what they report in the hit table: WU blast puts the raw
score in the table not the bit score as NCBI blast does (see example
below for reference). WU blast also swaps their location in the HSP
header relative to how NCBI reports it. It would be good to verify that
the blast parser isn't befuddled by this.

A quick look at SearchIO::blast and it appears that data from the hit
table is always getting stored as score, not bits for WU blast. Not sure
if the HSP section data are parsed correctly. I'd recommend looking into
these things when you do your fixes.

So in the end, WU blast HSPs that are built from the hit table should
report a value for raw_score and punt on bits, but NCBI HSPs so
constructed should do the opposite. The downside to this arrangement is
that code that works for NCBI blast hits will need modification to work
for WU blast hits, but that is just the nature of the data. It shouldn't
be an issue for the majority of users that stick with one flavor of blast
and don't switch back and forth, or for users that get their HSP scoring
data from HSP sections rather than relying on the hit table.

Ideally, the HSP object would know whether it was NCBI or WU-based and
issue an informative warning when attempting to access data it doesn't
have. One solution might be for the parser to put a 'WU-' in front of the
algorithm name for WU blast reports, so it would then be available for
the contained hit/hsp objects. This could break anything dependent on
algorithm name, so it would need some testing.

Steve

Example WU blast table and HSP header:

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

gb|AAC73113.1| (AE000111) aspartokinase I, homoserine deh...  4141  0.0       1
gb|AAC76922.1| (AE000468) aspartokinase II and homoserine...   844  3.1e-86   1
gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensi...   483  2.8e-47   1
gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia c...    97  0.0010    1

>gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I
            [Escherichia coli]
            Length = 820

 Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0
 Identities = 820/820 (100%), Positives = 820/820 (100%)

Example NCBI blast table and HSP header:

                                                                   Score    E
Sequences producing significant alignments:                        (bits) Value

ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189...   115   8e-26

>ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1
            gene:ENSG00000137397 transcript:ENST00000357569
            Length = 425

 Score = 120 bits (301), Expect = 3e-27
 Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261 (8%)

On Nov 26, 2007 10:59 AM, Chris Fields wrote:
> Steve, Bernd, (and Jason, since you may have some input on this as
> well),
>
> I am now looking into the bug Bernd submitted and it seems there is a
> serious discrepancy with the way the hit raw_score, bits, and
> significance is determined for Hit objects. Unless I am mistaken
> these should always come from the best HSP when they are present,
> falling back to the hit table data only when no HSP alignments are
> present. Under the latter conditions a minimal Hit object is made
> from data in the hit table, which reports the rounded bit score, not
> the raw score, so in those cases the raw score would be undefined
> (and you probably should get a nasty warning indicating there are no
> HSPs present to get the data from).
>
> What is occurring now, though, is the raw_score and significance is
> explicitly set from the hit table in the BLAST parser for the Hit
> object at all times, while the bits are correctly derived from the
> best HSP (no fallback to the hit table).
Changing to the behavior > above results in several tests failing via SearchIO.t, with each > failed test reporting the expected (read:correct) raw score. > > I'll look through the tests just in case, but I am planning on > committing changes to the BLAST parsers, GenericHit, and SearchIO.t > (to reflect the correct expected data) in the next day or two unless > there are any objections. > > chris > > > On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote: > > > On Nov 21, 2007 9:38 AM, Bernd Web wrote: > >> [snip] > >> > >> Further is is possible to get the raw_score of a hit. $hit->raw_score > >> actually gets the bitscore (w/o decimal point). > > > > Hmmm. raw_score should not be the same as bit score. So given an > > example blast hit line such as: > > > > Score = 60.0 bits (30), Expect = 1e-06 > > > > $hit->raw_score() should return 30, not 60, as you seem to be getting. > > > > Could you submit a bug report for this? http://www.bioperl.org/ > > wiki/Bugs > > > > Thanks, > > Steve > > > >> > >> On Nov 21, 2007 5:42 PM, Bernd Web wrote: > >>> Hi Russell, > >>> > >>> I came across your question. At first I thought all was well on my > >>> system, but indeed I also have these colouring problems. > >>> I noted that scrore in the bgcolor callback gets a different value! > >>> Printing score during hit parsing($hit->raw_score) gives the same > >>> score as -description > >>> my $score = $feature->score; However, printing score in the bgcolor > >>> sub gives 2573! > >>> All scores in the bgcolor routine all different and higher than the > >>> real scores. Were you able to solve this colouring issue? 
> >>> > >>> Regards, > >>> Bernd > >>> > >>> > >>>> Hi all, > >>>> I'm using a modified version of Lincoln's tutorial > >>>> (http://www.bioperl.org/wiki/ > >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output) > >>>> and I'm colouring the HSPs by setting the -bgcolor by score with > >>>> a sub > >>>> to give a similar image to that from NCBI but for some reason, my > >>>> colours are coming out wrong (see attached example) > >>>> They seem to be off by one but I can't see why. > >>>> > >>>> Any ideas? > >>>> > >>>> I can't be certain but I think it's only started doing this > >>>> since our > >>>> BLAST upgrade to 2.2.17 a few weeks ago. > >>>> > >>>> Here's the colouring code: > >>>> ------------------------------------------------------------------- > >>>> ----- > >>>> ------- > >>>> my $track = $panel->add_track( > >>>> -glyph => 'segments', > >>>> -label => 1, > >>>> -connector => 'dashed', > >>>> -bgcolor => sub { > >>>> my $feature = shift; > >>>> my $score = $feature->score; > >>>> return 'red' if $score >= 200; > >>>> return 'fuchsia' if $score > >>>> >= 80; > >>>> return 'lime' if $score > >>>> >= 50; > >>>> return 'blue' if $score >= 40; > >>>> return 'black'; > >>>> }, > >>>> -font2color => 'gray', > >>>> -sort_order => 'high_score', > >>>> -description => sub { > >>>> my $feature = shift; > >>>> return unless > >>>> $feature->has_tag('description'); > >>>> my ($description) = > >>>> $feature->each_tag_value('description'); > >>>> my $score = $feature->score; > >>>> "$description, score=$score"; > >>>> }, > >>>> ); > >>>> ------------------------------------------------------------------- > >>>> ----- > >>>> --------- > >>>> > >>>> > >>>> Thanx, > >>>> > >>>> Russell Smithies > >>>> > >>>> > >>>> > >>>> > >>>> =================================================================== > >>>> ==== > >>>> Attention: The information contained in this message and/or > >>>> attachments > >>>> from AgResearch Limited is intended only for the persons or > >>>> entities > 
>>>> to which it is addressed and may contain confidential and/or
> >>>> privileged
> >>>> material. Any review, retransmission, dissemination or other use
> >>>> of, or
> >>>> taking of any action in reliance upon, this information by
> >>>> persons or
> >>>> entities other than the intended recipients is prohibited by
> >>>> AgResearch
> >>>> Limited. If you have received this message in error, please
> >>>> notify the
> >>>> sender immediately.
> >>>> ===================================================================
> >>>> ====
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

From sac at bioperl.org Mon Nov 26 22:27:09 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 19:27:09 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
In-Reply-To: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
References: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
Message-ID: <8f200b4c0711261927h7ed8887ay8ab788f4f70fa197@mail.gmail.com>

Hi Jon,

I'd recommend downloading it into a separate location of your choosing
(~/lib/bioperl-ext for example) and running the installer as specified in
the docs that come with the download. Then you can include the location
you installed it into via a 'use lib "$ENV{HOME}/lib/bioperl-ext";'
statement at the top of your script (Perl does not expand "~" in a
use lib path, so the home directory has to be spelled out).
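
One detail worth spelling out for the "use lib" suggestion: a leading "~"
is a shell convention, and Perl adds the string to @INC as written. A
minimal sketch, assuming the ~/lib/bioperl-ext example location from this
message and that $HOME is set in the environment; where the modules
actually end up depends on how the installer was run, so the directory
may need adjusting.

```perl
use strict;
use warnings;

# "use lib" runs at compile time and takes its argument literally, so
# build the path from $ENV{HOME} rather than writing '~/lib/bioperl-ext'.
# The directory name is only the example location suggested above.
use lib "$ENV{HOME}/lib/bioperl-ext";

# Show what was actually added to the module search path.
print "$_\n" for grep { m{bioperl-ext} } @INC;
```

With the literal '~/lib/bioperl-ext' form, Perl would look for a directory
named "~" relative to the current working directory, which is almost never
what is intended.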
It may be handy to install it as a non-root user so that you don't alter the main perl installation. This way your ext install will stay separate from your main bioperl and perl installations. There are some docs about the ext packages you might want to check out at http://www.bioperl.org/wiki/Ext_package. Steve On Nov 21, 2007 4:35 PM, Jonathan Binkley wrote: > Hi, > > I installed bioperl on a Mac (OS 10.4, Intel) via fink, > which put it here: > > /sw/lib/perl5/5.8.6/Bio/ > > It seems to work fine, but I need bioperl-ext for > Smith-Waterman alignments. > > So, into which directory should I download bioperl-ext and > run the Makefile? > > Thanks. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From a_arya2000 at yahoo.com Tue Nov 27 09:51:41 2007 From: a_arya2000 at yahoo.com (a_arya2000) Date: Tue, 27 Nov 2007 06:51:41 -0800 (PST) Subject: [Bioperl-l] Bioperl-ext test fails Message-ID: <615478.1036.qm@web60113.mail.yahoo.com> Hello, I downloaded latest bioperl-ext from bioperl website, and I have io_lib v1.8.11 installed, and I was trying to install Bio::SeqIO::staden::read (of bioperl-ext). It compiled fine without any error but when I run make test I got following output. 

PERL_DL_NONLAZY=1 perl-5.8.8/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/staden_read....ok 3/94
# Test 7 got: "0" (t/staden_read.t at line 110 *TODO*)
# Expected: (We don't have the ability to write files for abi format)
# t/staden_read.t line 110 is: ok(0, undef, "We don't have the ability to write files for $format format") for 1..7;
# Test 8 got: "0" (t/staden_read.t at line 110 fail #2 *TODO*)
# Expected: (We don't have the ability to write files for abi format)
# Test 9 got: "0" (t/staden_read.t at line 110 fail #3 *TODO*)
# Expected: (We don't have the ability to write files for abi format)
# Test 10 got: "0" (t/staden_read.t at line 110 fail #4 *TODO*)
# Expected: (We don't have the ability to write files for abi format)
# Test 11 got: "0" (t/staden_read.t at line 110 fail #5 *TODO*)
# Expected: (We don't have the ability to write files for abi format)
# Test 12 got: "0" (t/staden_read.t at line 110 fail #6 *TODO*)
# Expected: (We don't have the ability to write files for abi format)
# Test 13 got: "0" (t/staden_read.t at line 110 fail #7 *TODO*)
# Expected: (We don't have the ability to write files for abi format)
# Test 14 got: "0" (t/staden_read.t at line 62 *TODO*)
# Expected: (Still missing test files for alf format)
# t/staden_read.t line 62 is: ok(0, undef, "Still missing test files for $format format") for (1..$formatlooptests);
# Test 15 got: "0" (t/staden_read.t at line 62 fail #2 *TODO*)
# Expected: (Still missing test files for alf format)
# Test 16 got: "0" (t/staden_read.t at line 62 fail #3 *TODO*)
# Expected: (Still missing test files for alf format)
# Test 17 got: "0" (t/staden_read.t at line 62 fail #4 *TODO*)
# Expected: (Still missing test files for alf format)
# Test 18 got: "0" (t/staden_read.t at line 62 fail #5 *TODO*)
# Expected: (Still missing test files for alf format)
# Test 19 got: "0" (t/staden_read.t at line 62 fail #6 *TODO*)
# Expected: (Still missing test files for alf format)
#
Test 20 got: "0" (t/staden_read.t at line 62 fail #7 *TODO*) # Expected: (Still missing test files for alf format) # Test 21 got: "0" (t/staden_read.t at line 62 fail #8 *TODO*) # Expected: (Still missing test files for alf format) # Test 22 got: "0" (t/staden_read.t at line 62 fail #9 *TODO*) # Expected: (Still missing test files for alf format) # Test 23 got: "0" (t/staden_read.t at line 62 fail #10 *TODO*) # Expected: (Still missing test files for alf format) # Test 24 got: "0" (t/staden_read.t at line 62 fail #11 *TODO*) # Expected: (Still missing test files for alf format) # Test 25 got: "0" (t/staden_read.t at line 62 fail #12 *TODO*) # Expected: (Still missing test files for alf format) # Test 31 got: "0" (t/staden_read.t at line 107 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # t/staden_read.t line 107 is: ok(0, undef, "Can't write valid ctf files until we have a trace object") for 1..7; # Test 32 got: "0" (t/staden_read.t at line 107 fail #2 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 33 got: "0" (t/staden_read.t at line 107 fail #3 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 34 got: "0" (t/staden_read.t at line 107 fail #4 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 35 got: "0" (t/staden_read.t at line 107 fail #5 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 36 got: "0" (t/staden_read.t at line 107 fail #6 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 37 got: "0" (t/staden_read.t at line 107 fail #7 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) t/staden_read....ok All tests successful. Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr + 0.15 csys = 1.71 CPU) Anyone has any idea what might be going wrong here? By the way, my OS is Linux. Thank you very much. 
Arya ____________________________________________________________________________________ Be a better pen pal. Text or chat with friends inside Yahoo! Mail. See how. http://overview.mail.yahoo.com/ From bix at sendu.me.uk Tue Nov 27 10:41:38 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 27 Nov 2007 15:41:38 +0000 Subject: [Bioperl-l] Bioperl-ext test fails In-Reply-To: <615478.1036.qm@web60113.mail.yahoo.com> References: <615478.1036.qm@web60113.mail.yahoo.com> Message-ID: <474C3AB2.5050208@sendu.me.uk> a_arya2000 wrote: > Hello, > I downloaded latest bioperl-ext from bioperl website, > and I have io_lib v1.8.11 installed, and I was trying > to install Bio::SeqIO::staden::read (of bioperl-ext). > It compiled fine without any error but when I run make > test I got following output. [...] > All tests successful. > Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr + > 0.15 csys = 1.71 CPU) > > > Anyone has any idea what might be going wrong here? By > the way, my OS is Linux. Thank you very much. Not being familiar with the test script or ext, I can at least say that nothing actually went wrong: 'All tests successful'. Apparently there are some things in the test script that are known by the author to not work quite right, so he marked them as 'todo'. The problems seem harmless in any case, with things returning 0 instead of undef. So, unless you've reason to believe there is something significant going on, all is well. From alison.waller at utoronto.ca Mon Nov 26 16:06:35 2007 From: alison.waller at utoronto.ca (alison waller) Date: Mon, 26 Nov 2007 16:06:35 -0500 Subject: [Bioperl-l] help using SEARCH IO to parse blast results Message-ID: <005a01c83070$3a814580$d81efea9@AWALL> Hello all, It's the usual story, I'm an engineer turned biologist who now needs help with bioinformatics so I can analyze huge amounts of data to finish my thesis. 
I am trying to write a script that will parse large blast files (usually blastx) I also want to be able to specify how many hits I want to report information on. Most of the time I will only want information on the top hit, but I want to have the flexibility to obtain information on say the top5. I am pretty sure I have done this wrong, any advice on how to correct my script to do this, would be great. Thanks so much, Alison #!/usr/local/bin/perl -w # Parsing BLAST reports with BioPerl's Bio::SearchIO module # alison waller November 2007 use strict; use warnings; use Bio::SearchIO; # to run type: blast_parse_aw.pl input.txt #of hits my $infile =shift(@ARGV); my $outfile ="$ARGV[0].parsed"; my $tophit = $ARGV[1]; # I want to specify in the command line how many hits to report for each query open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n"; open (OUT,">$outfile"); $report = new Bio::SearchIO( -file=>"$inFile", -format => "blast"); print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\t Qstrand\tHstrand\n"; # Go through BLAST reports one by one while($result = $report->next_result) { if ($top_hit=$result->next_hit) # this might be wrong - I want to specify how many hits to print results for # Print some tab-delimited data about this hit { print $result->query_name, "\t"; print $hit->description, "\t"; print $hit->significance, "\t"; print $hit->bits,"\t"; print $hsp->evalue, "\t"; print $hsp->percent_identity, "\t"; print $hsp->length('total'),"\t"; print $hsp->num_identical,"\t"; print $hsp->gaps,"\t"; print $hsp->strand('query'),"\t"; print $hsp->strand('hit'), "\n"; } else print "no hits\n"; } ****************************************** Alison S. Waller M.A.Sc. Doctoral Candidate awaller at chem-eng.utoronto.ca 416-978-4222 (lab) Department of Chemical Engineering Wallberg Building 200 College st. 
Toronto, ON M5S 3E5 From bix at sendu.me.uk Tue Nov 27 12:01:36 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 27 Nov 2007 17:01:36 +0000 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL> References: <005a01c83070$3a814580$d81efea9@AWALL> Message-ID: <474C4D70.2010206@sendu.me.uk> alison waller wrote: > I am trying to write a script that will parse large blast files (usually > blastx) I also want to be able to specify how many hits I want to report > information on. > > Most of the time I will only want information on the top hit, but I want to > have the flexibility to obtain information on say the top5. I am pretty > sure I have done this wrong, any advice on how to correct my script to do > this, would be great. [snip] > if ($top_hit=$result->next_hit) # this might be wrong - I want to > specify how many hits to print results for I didn't really pay attention to the rest of your code, but assuming it all works except for only ever giving you info for the top hit, you just need to change this 'if' to a loop of some kind. # ... my $hits = 0; while (my $hit = $result->next_hit) { $hits++; last if $hits > $tophit; # ... } From David.Messina at sbc.su.se Tue Nov 27 12:55:44 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 27 Nov 2007 18:55:44 +0100 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <474C4D70.2010206@sendu.me.uk> References: <005a01c83070$3a814580$d81efea9@AWALL> <474C4D70.2010206@sendu.me.uk> Message-ID: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com> Hi Alison, As Sendu mentioned, the key bit is adding a condition to the hit loop to limit the number of hits that are printed. I didn't test the below extensively, but give it a try... 

Dave


#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <input file> <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1];    # to specify in the command line how many hits
                           # to report for each query

#open( IN, $infile ) || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = Bio::SearchIO->new(
    -file   => $infile,
    -format => 'blast'
);

# Column headers; the second column holds the hit name
print OUT
  "Query\tHitName\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while ( my $result = $report->next_result ) {
    my $i = 0;    # number of hits actually reported for this query
    while ( ( $i < $tophit ) && ( my $hit = $result->next_hit ) ) {

        # Count inside the loop body: incrementing in the condition would
        # bump $i even when next_hit returns nothing, so the "no hits"
        # check below would never fire.
        $i++;

        while ( my $hsp = $hit->next_hsp ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name,    "\t";
            print OUT $hit->name,             "\t";
            print OUT $hit->significance,     "\t";
            print OUT $hit->bits,             "\t";
            print OUT $hsp->evalue,           "\t";
            print OUT $hsp->percent_identity, "\t";
            print OUT $hsp->length('total'),  "\t";
            print OUT $hsp->num_identical,    "\t";
            print OUT $hsp->gaps,             "\t";
            print OUT $hsp->strand('query'),  "\t";
            print OUT $hsp->strand('hit'),    "\n";
        }
    }

    if ( $i == 0 ) { print OUT "no hits\n"; }
}

close(OUT);

From Russell.Smithies at agresearch.co.nz Tue Nov 27 14:31:29 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 28 Nov 2007 08:31:29 +1300
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk> <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID:

Do the hits need to be sorted first or is this done automagically?
I ask this as I know Megablast doesn't provide sorted output for most of
its formats.

Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at uiuc.edu Tue Nov 27 16:09:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:09:43 -0600
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <474C3AB2.5050208@sendu.me.uk>
References: <615478.1036.qm@web60113.mail.yahoo.com> <474C3AB2.5050208@sendu.me.uk>
Message-ID: <3B8DD37B-F856-4365-86F0-038A00E26766@uiuc.edu>

You can always test it within the bioperl suite after it's installed;
several tests (abi.t, ztr.t) use Bio::SeqIO::staden::read. In general,
though, if it's passing tests it should be fine.

chris

On Nov 27, 2007, at 9:41 AM, Sendu Bala wrote:

> a_arya2000 wrote:
>> Hello,
>> I downloaded latest bioperl-ext from bioperl website,
>> and I have io_lib v1.8.11 installed, and I was trying
>> to install Bio::SeqIO::staden::read (of bioperl-ext).
>> It compiled fine without any error but when I run make
>> test I got following output.
> [...]
>> All tests successful.
>> Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr +
>> 0.15 csys = 1.71 CPU)
>>
>>
>> Anyone has any idea what might be going wrong here? By
>> the way, my OS is Linux. Thank you very much.
>
> Not being familiar with the test script or ext, I can at least say that
> nothing actually went wrong: 'All tests successful'. Apparently there
> are some things in the test script that are known by the author to not
> work quite right, so he marked them as 'todo'. The problems seem
> harmless in any case, with things returning 0 instead of undef.
>
> So, unless you've reason to believe there is something significant going
> on, all is well.
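
For readers unfamiliar with 'todo' tests, the behaviour described above can
be sketched with the core Test::More module. This is only an illustration
and is not taken from the bioperl-ext test script: trace_length() is a
made-up stand-in that returns 0 where undef was intended.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical stand-in for a reader method; it returns 0 where
# undef was intended, like the harmless discrepancy described above.
sub trace_length { return 0 }

# An ordinary test: this one passes.
ok( defined trace_length(), 'trace_length returns a defined value' );

# A known shortcoming is wrapped in a TODO block. Its failure is
# printed as "not ok ... # TODO" but does not fail the test script.
TODO: {
    local $TODO = 'trace_length should return undef for an empty read';
    is( trace_length(), undef, 'empty read gives undef' );
}
```

Run under prove or 'make test', a script like this is still reported as
passing, because a failing test inside a TODO block is listed but not
counted as a failure.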
From cjfields at uiuc.edu Tue Nov 27 16:00:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:00:33 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To:
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk> <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>

The hits/HSPs are generally in the order they appear in the report.

If you are looking for best/worst HSP after parsing you can use the
$hit->hsp() method:

# best and worst
my $best  = $hit->hsp('best');   # also 'first'
my $worst = $hit->hsp('worst');  # also 'last'

The SearchIO text BLAST parser also has several options implemented
for finer control:

 -inclusion_threshold => e-value threshold for inclusion in the
                         PSI-BLAST score matrix model (blastpgp)
 -signif      => float or scientific notation number to be used
                 as a P- or Expect value cutoff
 -score       => integer or scientific notation number to be used
                 as a blast score value cutoff
 -bits        => integer or scientific notation number to be used
                 as a bit score value cutoff
 -hit_filter  => reference to a function to be used for
                 filtering hits based on arbitrary criteria.
                 All hits of each BLAST report must satisfy
                 this criteria to be retained.
                 If a hit fails this test, it is ignored.
                 This function should take a
                 Bio::Search::Hit::BlastHit.pm object as its first
                 argument and return true
                 if the hit should be retained.
                 Sample filter function:
                 -hit_filter => sub { my $hit = shift;
                                      $hit->gaps == 0; },
                 (Note: -filt_func is synonymous with -hit_filter)
 -overlap     => integer. The amount of overlap to permit between
                 adjacent HSPs when tiling HSPs. A reasonable value is 2.
                 Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagicly?
> I ask this as I know Megablast doesn't provide sorted output for
> most of it's formats.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From johnston at biochem.ucl.ac.uk Tue Nov 27 20:06:30 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Nov 2007 01:06:30 +0000 (GMT)
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
Message-ID:

Hello,

I was playing around with Primer3, and I hit a problem. Not sure if it's a
bug or if I was doing something I wasn't supposed to, but if it's the
latter, I thought it might save someone else half an hour of banging their
head off a keyboard if I mentioned it:

What I was doing was roughly:

# create a primer3 obj
my $p3 = ...Primer3->new();

# loop through some sequences generating primers for
# each of them using the same primer3 obj
while (@some_bio_seqs){
    my $res = $p3->run;
    ...
}

This worked fine for a while, but broke when I tried to set PRIMER_MIN_GC,
at which point it worked for a few sequences and then I got a "can't place
primer on sequence" error.

After a bit of faffing about, I think the problem occurs when no primers
are found. In that case $p3 still has the primers from the previous run,
which don't come from the current sequence, so can't be placed on it. I
tried calling $p3->cleanup in the loop, but that didn't work either.
Creating a new $p3 every time works fine.

Are you supposed to create a new Primer3 object for every sequence?
(Apologies if I missed the relevant bit of the docs).
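
The suspected cause above (results from one run surviving into the next)
and the workaround (a fresh object per sequence) can be sketched without
Primer3 at all. Everything here is an assumption made for illustration:
ToyDesigner is a hypothetical stand-in, not Bio::Tools::Run::Primer3, and
its "primer search" is just a regular expression.

```perl
use strict;
use warnings;

# ToyDesigner is a hypothetical stand-in, NOT Bio::Tools::Run::Primer3.
# It only models the suspected behaviour: results from an earlier run
# survive when a later run finds nothing.
package ToyDesigner;

sub new { my $class = shift; return bless { primers => [] }, $class }

sub run {
    my ( $self, $seq ) = @_;
    my @found = $seq =~ /(GC[ACGT]{4})/g;    # toy "primer" search
    $self->{primers} = \@found if @found;    # nothing found: old list is kept
    return $self->{primers};
}

package main;

my @seqs = ( 'AAGCTTAGAA', 'ATATATATAT' );   # the second has no GC site

# Reusing one object: the second call hands back the first sequence's primers.
my $shared = ToyDesigner->new;
my @reused = map { scalar @{ $shared->run($_) } } @seqs;

# Fresh object per sequence (the workaround): the second result is empty.
my @fresh = map { scalar @{ ToyDesigner->new->run($_) } } @seqs;

print "reused object: @reused\n";
print "fresh objects: @fresh\n";
```

Whether the real module keeps state in this way is exactly the open
question; the sketch only shows why a new object per sequence would avoid
the symptom if it does.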
Cheers,
Cass xx

From alison.waller at utoronto.ca Tue Nov 27 16:32:07 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Tue, 27 Nov 2007 16:32:07 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>
Message-ID: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>

Thanks Everyone,

Your edits worked Dave, however after looking at the output I realized that
I only want information on the top hsp per query returned. For example, for
some of the queries the top hit has two hsps, so it returned both.

I tried to further edit it, but after 3 attempts they are all failing, I
think due to me using the loops wrong.

I also have another problem: I also want to retrieve the gi, and this
doesn't seem to be as straightforward as it should be. I found another
method, _get_seq_identifiers, but this looks awkward; isn't there an object
for the gi?

I've pasted my non-working script below; if there are any suggestions on
how to get it to print out just the first hsp per hit, that would be great.

Thanks,

#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <input file> <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit = $ARGV[1]; # to specify in the command line how many hits
                       # to report for each query

#open( IN, $infile ) || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = new Bio::SearchIO(
    -file => "$infile",
    -format => "blast"
);

print OUT
"Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tstrand\tHstrand\n";

# Go through BLAST reports one by one
while (my $result = $report->next_result) {
    my $i=0;
    while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
        while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name, "\t";
            print OUT $hit->name, "\t";
            print OUT $hit->significance, "\t";
            print OUT $hit->bits, "\t";
            print OUT $hsp->evalue, "\t";
            print OUT $hsp->percent_identity, "\t";
            print OUT $hsp->length('total'), "\t";
            print OUT $hsp->num_identical, "\t";
            print OUT $hsp->gaps, "\t";
            print OUT $hsp->strand('query'), "\t";
            print OUT $hsp->strand('hit'), "\n";
        }
    }
    if ($i == 0) { print OUT "no hits\n"; }

}

From dennis.prickett at bbsrc.ac.uk Wed Nov 28 05:18:26 2007
From: dennis.prickett at bbsrc.ac.uk (dennis prickett (IAH-C))
Date: Wed, 28 Nov 2007 10:18:26 -0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9504751EF0@iahce2ksrv1.iah.bbsrc.ac.uk>

Dear Alison

Or, if you are absolutely only interested in the top hit you could limit
it to that in the blast command by adding the parameters " -b 1 ". This
will truncate the report to alignments for 1 hit per query (or -b 5 for
5 hits, etc). Your blasts run faster and then you won't have to worry
about how to parse out the top blast hit(s).

However, if there are any caveats for using this parameter that I am not
aware of please let us know.

Dennis Prickett
Institute of Animal Health
Compton, nr Newbury
RG2 9FS
United Kingdom

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of alison waller
Sent: 26 November 2007 21:07
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] help using SEARCH IO to parse blast results

Hello all,

It's the usual story, I'm an engineer turned biologist who now needs help
with bioinformatics so I can analyze huge amounts of data to finish my
thesis.

I am trying to write a script that will parse large blast files (usually
blastx) I also want to be able to specify how many hits I want to report
information on.
Most of the time I will only want information on the top hit, but I want to
have the flexibility to obtain information on say the top5. I am pretty
sure I have done this wrong, any advice on how to correct my script to do
this, would be great.

Thanks so much,

Alison

#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

# to run type: blast_parse_aw.pl input.txt #of hits

my $infile =shift(@ARGV);
my $outfile ="$ARGV[0].parsed";
my $tophit = $ARGV[1]; # I want to specify in the command line how many
                       # hits to report for each query

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";
open (OUT,">$outfile");

$report = new Bio::SearchIO( -file=>"$inFile", -format => "blast");

print
"Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while($result = $report->next_result) {
    if ($top_hit=$result->next_hit) # this might be wrong - I want to
                                    # specify how many hits to print results for
    # Print some tab-delimited data about this hit
    {
        print $result->query_name, "\t";
        print $hit->description, "\t";
        print $hit->significance, "\t";
        print $hit->bits,"\t";
        print $hsp->evalue, "\t";
        print $hsp->percent_identity, "\t";
        print $hsp->length('total'),"\t";
        print $hsp->num_identical,"\t";
        print $hsp->gaps,"\t";
        print $hsp->strand('query'),"\t";
        print $hsp->strand('hit'), "\n";
    }
    else print "no hits\n";
}

******************************************
Alison S. Waller M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON M5S 3E5
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From t.nugent at cs.ucl.ac.uk Wed Nov 28 08:10:41 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Wed, 28 Nov 2007 13:10:41 +0000
Subject: [Bioperl-l] Helical Wheel module
Message-ID: <474D68D1.3080602@cs.ucl.ac.uk>

Hi everyone,

I've been drawing a lot of helical wheels recently so put all my code
into a module. I don't think there's anything in bioperl to do this yet
though there are a few programs written in perl and flash on the web to
do the same thing. I was thinking this could fit into biographics. Has
lots of options to adjust labels, colours, ttf fonts and can output to
png & svg.

Tim

...

Here's the output, converted to jpg from svg:

http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg

Module:

http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz

Works like this:

use DrawHelicalWheel;

my $im = DrawHelicalWheel->new(-title    => $title,
                               -sequence => $sequence,
                               -helices  => \@helices,
                               -ttf_font => $font);

open(OUTPUT, ">$svg");
binmode OUTPUT;
print OUTPUT $im->svg;
close OUTPUT;

--
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent

From tristan.lefebure at gmail.com Wed Nov 28 10:46:11 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 10:46:11 -0500
Subject: [Bioperl-l] Remove sites of an alignment
Message-ID: <200711281046.11146.tnl7@cornell.edu>

Hello!

I was wondering if there was a function to remove sites/columns of an
alignment. Something like: $aln->remove_sites(@sites_to_remove)
I looked around Bio::SimpleAlign but did not find exactly that. There is
remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.
I could recycle the '_remove_col' sub-function of 'remove_columns' to do so (it splits the alignment into sequence objects, removes the sites, and then regenerates an alignment object), but I would be surprised if there was nothing already doing the job... Thanks -Tristan From bix at sendu.me.uk Wed Nov 28 11:19:36 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Nov 2007 16:19:36 +0000 Subject: [Bioperl-l] Remove sites of an alignment In-Reply-To: <200711281046.11146.tnl7@cornell.edu> References: <200711281046.11146.tnl7@cornell.edu> Message-ID: <474D9518.7010201@sendu.me.uk> Tristan Lefebure wrote: > Hello! > > I was wondering if there was a function to remove sites/columns of an > alignment. Something like: $aln->remove_sites(@sites_to_remove) > I looked around Bio::SimpleAlign but did not find exactly that. There is > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. You might want to take a second look at the docs. You can supply column number ranges to remove_columns(), so it does exactly what you want. From tnl7 at cornell.edu Wed Nov 28 10:44:17 2007 From: tnl7 at cornell.edu (Tristan Lefebure) Date: Wed, 28 Nov 2007 10:44:17 -0500 Subject: [Bioperl-l] Remove sites of an alignment Message-ID: <200711281044.17770.tnl7@cornell.edu> Hello! I was wondering if there was a function to remove sites/columns of an alignment. Something like: $aln->remove_sites(@sites_to_remove) I looked around Bio::SimpleAlign but did not find exactly that. There is remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. I could recycle the '_remove_col' sub-function of 'remove_columns' to do so (it splits the alignment into sequence objects, removes the sites, and then regenerates an alignment object), but I would be surprised if there was nothing already doing the job... 
Thanks -Tristan From cjfields at uiuc.edu Wed Nov 28 08:57:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 07:57:27 -0600 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <003f01c8313c$f69b22a0$6f00a8c0@AWALL> References: <003f01c8313c$f69b22a0$6f00a8c0@AWALL> Message-ID: I had some code which does this which I committed yesterday to CVS; it catches the GI for the query and the hits: $result->query_gi; $hit->ncbi_gi; I am in the midst of fixing additional problems with WU-BLAST parsing but you are more than welcome to try it. chris On Nov 27, 2007, at 3:32 PM, alison waller wrote: > Thanks Everyone, > > Your edits worked Dave, however after looking at the output I > realized that > I only want information on the top hsp per query returned. For > example some > of the querys the top hit has two hsps so it returned both. > > I tried to further edit it, but after 3 attempts they are all > failing, I > think due to me using the loops wrong. > > I also have another problem, I also want to retrieve the gi, this > doesn't > seem to be straight forward as it should. I found another method > _get_seq_identifiers, but this looks awkward, isn't there and object > for the > gi? > > I've pasted my non-working script below if there are any suggestions > on how > to get it to print out just the first hsp per hit, that would be > great. > > Thanks, > > > #!/usr/local/bin/perl -w > > > # Parsing BLAST reports with BioPerl's Bio::SearchIO module > # alison waller November 2007 > > > use strict; > use warnings; > use Bio::SearchIO; > > > my $usage = "to run type: blast_parse_aw.pl <# of > hits>\n"; > if (@ARGV != 2) { die $usage; } > > > my $infile = $ARGV[0]; > my $outfile = $infile . '.parsed'; > my $tophit = $ARGV[1]; # to specify in the command line how many hits > # to report for each query > > > #open( IN, $infile ) || die "Can't open inputfile $infile! 
$!\n"; > open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! > \n"; > > > my $report = new Bio::SearchIO( > -file => "$infile", > -format => "blast" > ); > > > print OUT > > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent > \tgaps\t > strand\tHstrand\n"; > > > # Go through BLAST reports one by one > while (my $result = $report->next_result) { > my $i=0; > while( ( $i++<$tophit) && (my $hit = $result->next_hit)){ > while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) { > > > # Print some tab-delimited data about this hit > print OUT $result->query_name, "\t"; > print OUT $hit->name, "\t"; > print OUT $hit->significance, "\t"; > print OUT $hit->bits, "\t"; > print OUT $hsp->evalue, "\t"; > print OUT $hsp->percent_identity, "\t"; > print OUT $hsp->length('total'), "\t"; > print OUT $hsp->num_identical, "\t"; > print OUT $hsp->gaps, "\t"; > print OUT $hsp->strand('query'), "\t"; > print OUT $hsp->strand('hit'), "\n"; > } > } > if ($i == 0) { print OUT "no hits\n"; } > > } > > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, November 27, 2007 4:01 PM > To: Smithies, Russell > Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results > > The hits/HSPs are generally in the order they appear in the report. 
> > If you are looking for best/worst HSP after parsing you can use the > $hit->hsp() method: > > # best and worst > my $best = $hit->hsp('best'); # also 'first' > my $worst = $hit->hsp('worst'); # also last > > The SearchIO text BLAST parser also has several options implemented > for finer control: > > -inclusion_threshold => e-value threshold for inclusion in the > PSI-BLAST score matrix model (blastpgp) > -signif => float or scientific notation number to be used > as a P- or Expect value cutoff > -score => integer or scientific notation number to be used > as a blast score value cutoff > -bits => integer or scientific notation number to be used > as a bit score value cutoff > -hit_filter => reference to a function to be used for > filtering hits based on arbitrary criteria. > All hits of each BLAST report must satisfy > this criteria to be retained. > If a hit fails this test, it is ignored. > This function should take a > Bio::Search::Hit::BlastHit.pm object as its first > argument and return true > if the hit should be retained. > Sample filter function: > -hit_filter => sub { $hit = shift; > $hit->gaps == 0; }, > (Note: -filt_func is synonymous with -hit_filter) > -overlap => integer. The amount of overlap to permit between > adjacent HSPs when tiling HSPs. A reasonable > value is 2. > Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP. > > chris > > On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote: > >> Do the hits need to be sorted first or is this done automagicly? >> I ask this as I know Megablast doesn't provide sorted output for >> most of >> it's formats. >> >> Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open- >>> bio.org] On Behalf Of Dave Messina >>> Sent: Wednesday, 28 November 2007 6:56 a.m. 
>>> To: alison waller
>>> Cc: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>>>
>>> Hi Alison,
>>> As Sendu mentioned, the key bit is adding a condition to the hit loop
>>> to limit the number of hits that are printed. I didn't test the below
>>> extensively, but give it a try...
>>>
>>> Dave
>>>
>>> #!/usr/local/bin/perl -w
>>>
>>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
>>> # alison waller November 2007
>>>
>>> use strict;
>>> use warnings;
>>> use Bio::SearchIO;
>>>
>>> my $usage = "to run type: blast_parse_aw.pl <# of hits>\n";
>>> if (@ARGV != 2) { die $usage; }
>>>
>>> my $infile  = $ARGV[0];
>>> my $outfile = $infile . '.parsed';
>>> my $tophit  = $ARGV[1];    # to specify in the command line how many hits
>>>                            # to report for each query
>>>
>>> #open( IN, $infile ) || die "Can't open inputfile $infile! $!\n";
>>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";
>>>
>>> my $report = Bio::SearchIO->new(
>>>     -file   => $infile,
>>>     -format => "blast"
>>> );
>>>
>>> print OUT
>>>   "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";
>>>
>>> # Go through BLAST reports one by one
>>> while ( my $result = $report->next_result ) {
>>>     my $i = 0;    # number of hits reported so far for this query
>>>     while ( ( $i < $tophit ) && ( my $hit = $result->next_hit ) ) {
>>>         $i++;     # count only hits that were actually returned
>>>         while ( my $hsp = $hit->next_hsp ) {
>>>
>>>             # Print some tab-delimited data about this hit
>>>             print OUT $result->query_name,    "\t";
>>>             print OUT $hit->name,             "\t";
>>>             print OUT $hit->significance,     "\t";
>>>             print OUT $hit->bits,             "\t";
>>>             print OUT $hsp->evalue,           "\t";
>>>             print OUT $hsp->percent_identity, "\t";
>>>             print OUT $hsp->length('total'),  "\t";
>>>             print OUT $hsp->num_identical,    "\t";
>>>             print OUT $hsp->gaps,             "\t";
>>>             print OUT $hsp->strand('query'),  "\t";
>>>             print OUT $hsp->strand('hit'),    "\n";
>>>         }
>>>     }
>>>
>>>     # $i stays 0 only when the query produced no hits at all
>>>     if ( $i == 0 ) { print OUT "no hits\n"; }
>>> }
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> =====================================================================
>> Attention: The information contained in this message and/or attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or privileged
>> material. Any review, retransmission, dissemination or other use of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by AgResearch
>> Limited. If you have received this message in error, please notify the
>> sender immediately.
>> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 28 08:54:39 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 07:54:39 -0600 Subject: [Bioperl-l] Helical Wheel module In-Reply-To: <474D68D1.3080602@cs.ucl.ac.uk> References: <474D68D1.3080602@cs.ucl.ac.uk> Message-ID: <053F7A0E-E0C3-4E86-AF7A-8F6F7A57DA37@uiuc.edu> Looks good! We recently added in your transmembrane module, so we could definitely add this in. chris On Nov 28, 2007, at 7:10 AM, Tim Nugent wrote: > Hi everyone, > > I've been drawing a lot of helical wheels recently so put all my code > into a module. I don't think there's anything in bioperl to do this > yet > though there are a few programs written in perl and flash on the web > to > do the same thing. I was thinking this could fit into biographics. Has > lots of options to adjust labels, colours, ttf fonts and can output to > png & svg. > > Tim > > ... 
>
> Here's the output, converted to jpg from svg:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg
>
> Module:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz
>
> Works like this:
>
> use DrawHelicalWheel;
>
> my $im = DrawHelicalWheel->new(-title    => $title,
>                                -sequence => $sequence,
>                                -helices  => \@helices,
>                                -ttf_font => $font);
> open(OUTPUT, ">$svg") or die "Can't open $svg: $!";
> binmode OUTPUT;
> print OUTPUT $im->svg;
> close OUTPUT;
>
> --
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk
> http://www.cs.ucl.ac.uk/staff/T.Nugent
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Wed Nov 28 13:43:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Nov 2007 12:43:58 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>
Message-ID: <55479E91-59AF-42B2-B15F-C4939531BC4D@uiuc.edu>

On Nov 26, 2007, at 7:41 PM, Steve Chervitz wrote:

> Chris,
>
> Good catch.
> You're on track here with one exception: WU blast and NCBI
> blast behave differently in what they report in the hit table: WU
> blast puts the raw score in the table not the bit score as NCBI blast
> does (see example below for reference). WU blast also swaps their
> location in the HSP header relative to how NCBI reports it. It would
> be good to verify that the blast parser isn't befuddled by this. A
> quick look at SearchIO::blast and it appears that data from the hit
> table is always getting stored as score, not bits for WU blast. Not
> sure if the HSP section data are parsed correctly. I'd recommend
> looking into these things when you do your fixes.

What I have now after commits is:

GenericHit - use the best HSP when possible for bits, score/raw_score,
significance. When there is no HSP, construct a minimal Hit object using
hit table data (WU-BLAST maps the score to raw_score, NCBI BLAST maps to
bits(), both map evalue/pvalue to significance).

HSP mapping seems to be correct.

One issue that has popped up is that GenericHit::significance
preferentially uses the best HSP. However, GenericHSP::significance uses
evalues preferentially over pvalues; both Expect and P appear to be
parsed for WU-BLAST HSPs now (so the evalue is reported); this apparently
wasn't always the case if I read the GenericHit docs correctly. As NCBI
BLAST doesn't report pvalues, we could change that so it preferentially
returns a pvalue if present, falling back to an evalue. This would match
what is found in the hit table more closely and resembles what is
documented for the method (for significance(), WU-BLAST gets pvalues,
NCBI BLAST gets evalues).

> So in the end, WU blast HSPs that are built from the hit table should
> report a value for raw_score and punt on bits, but NCBI HSPs so
> constructed should do the opposite. The downside to this arrangement
> is that code that works for NCBI blast hits will need modification to
> work for WU blast hits, but that is just the nature of the data.
It > shouldn't be an issue for the majority of users that stick with one > flavor of blast and don't switch back and forth, or for users that get > their HSP scoring data from HSP sections rather than relying on the > hit table. In general I get my data from the HSPs, so this shouldn't be a significant issue (bad pun). I did find that changing it so that Hit objects use HSP data pointed out issues with test data; hit table raw/ bit scores were rounded from the HSP score data or vice versa since all data came from the hit table, so tests flunked. I think changing the way minimal hit objects report data (particularly for NCBI BLAST) will lead to a lot of confusion unless we clarify warnings when one or the other is missing (as you also indicated). I'm working on that now. > Ideally, the HSP object would know whether it was NCBI or WU-based and > issue an informative warning when attempting to access data it doesn't > have. One solution might be for the parser to put a 'WU-' in front of > the algorithm name for WU blast reports, so it would then be available > for the contained hit/hsp objects. This could break anything dependent > on algorithm name, so it would need some testing. > > Steve I can probably work around as noted above that unless you think it's warranted to add a 'WU' designation (the version info in the Result object has 'WashU' attached, so one could feasibly use that for distinguishing the two report types). Anyway, I'm committing my first batch of fixes, the significance test will fail for at least a day until I can look into it more. chris From tristan.lefebure at gmail.com Wed Nov 28 14:03:44 2007 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 28 Nov 2007 14:03:44 -0500 Subject: [Bioperl-l] Remove sites of an alignment In-Reply-To: <474D9518.7010201@sendu.me.uk> References: <200711281046.11146.tnl7@cornell.edu> <474D9518.7010201@sendu.me.uk> Message-ID: Hoops. I was reading the BioPerl 1.4 documentation. 
Actually, http://bioperl.org/wiki/Module:Bio::SimpleAlign sends you to
http://search.cpan.org/perldoc?Bio::SimpleAlign, which turns out to be
the 1.4 documentation...

Thank you, it works great.

On Nov 28, 2007 11:19 AM, Sendu Bala wrote:

> Tristan Lefebure wrote:
> > Hello!
> >
> > I was wondering if there was a function to remove sites/columns of an
> > alignment. Something like: $aln->remove_sites(@sites_to_remove)
> > I looked around Bio::SimpleAlign but did not find exactly that. There is
> > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch'
> > criteria.
>
> You might want to take a second look at the docs. You can supply column
> number ranges to remove_columns(), so it does exactly what you want.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From Russell.Smithies at agresearch.co.nz Wed Nov 28 16:57:14 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 29 Nov 2007 10:57:14 +1300
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To:
References: <200711281046.11146.tnl7@cornell.edu><474D9518.7010201@sendu.me.uk>
Message-ID:

Has anyone got a good example of parsing ASN.1 with
Bio::SeqIO::entrezgene?
I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
and my Perl isn't that good :-(

Russell
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From stefan.kirov at bms.com Wed Nov 28 17:16:18 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Nov 2007 17:16:18 -0500 (Eastern Standard Time)
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To:
References: <200711281046.11146.tnl7@cornell.edu> <474D9518.7010201@sendu.me.uk>
Message-ID:

Here is an example for GO, will send the one for KEGG later:

my $eio = Bio::SeqIO->new(-file           => $file,
                          -format         => 'entrezgene',
                          -service_record => 'yes');  #, -locuslink=>'convert');
while (my $seq = $eio->next_seq) {
    my $gid = $seq->accession_number;
    # the record's annotation collection (assumed to be what $ann refers to)
    my $ann = $seq->annotation;
    foreach my $ot ($ann->get_Annotations('OntologyTerm')) {
        next if ($ot->term->authority eq 'STS marker');  # Do not need STS markers
        my $evid = $ot->comment;
        $evid =~ s/evidence: //i;
        my @ref = $ot->term->get_references;  # Really there should be just one?
        my $id  = $ot->identifier;
        my $fid = 'GO:' . sprintf("%07u", $id);
        print join("\t", $gid, $ot->ontology->name, $ot->name, $evid, $fid,
                   @ref ? $ref[0]->medline : ''), "\n";
    }
}

Please note there is a bug in the parser that makes it suck a lot of RAM.
I am fixing this and will probably commit by the week's end; you will
have to update at that point. If you work with few records this should
not matter.

Stefan

On Thu, 29 Nov 2007, Smithies, Russell wrote:

> Has anyone got a good example of parsing ASN.1 with
> Bio::SeqIO::entrezgene?
> I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
> and my Perl isn't that good :-(
>
> Russell
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material.
Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Nov 29 18:06:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 17:06:42 -0600 Subject: [Bioperl-l] PSIBLAST parsing added to SearchIO::blastxml Message-ID: <159ABF90-080B-4F98-BF63-7FCEE5D05F10@uiuc.edu> For anyone using PSI-BLAST: I have implemented experimental PSI-BLAST parsing in Bio::SearchIO::blastxml (though it appears to be pretty stable!). Since there isn't any easy way to distinguish between normal BLASTS and PSI-BLAST reports due to recent changes at NCBI to BLAST, you have to indicate how the report is to be parsed by passing in a '-blasttype' parameter: $searchio = Bio::SearchIO->new('-tempfile' => 1, '-format' => 'blastxml', '-file' => 'psiblast.xml', '-blasttype' => 'psiblast'); Otherwise it chunks the individual iterations out as separate BLAST reports and parses them as separate reports. Tests have also been added to SearchIO.t. I will update the HOWTO and blastxml docs soon. chris From cjfields at uiuc.edu Thu Nov 29 21:41:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 20:41:49 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Primer3 In-Reply-To: References: Message-ID: <866C501B-EBFD-4E55-939E-AA97182C9EC4@uiuc.edu> It's probably safer to create a new instance each time but it really shouldn't be necessary for a wrapper module; this sounds like a bug to me. Could you file it in Bugzilla? 
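In the meantime, the "new instance each time" workaround could be sketched roughly as below. This is only an illustration under stated assumptions, not BioPerl code: design_per_sequence and the $make_runner callback are hypothetical names, and the runner is assumed to be any object with a run() method (the $p3->run call quoted below has that shape). How the real Primer3 wrapper is constructed is deliberately left to the callback.

```perl
use strict;
use warnings;

# Run a primer-design step once per sequence, building a brand-new runner
# object for every sequence so that results from an earlier run cannot
# leak into a later one.
#
# $seqs        - array ref of sequences (plain strings or sequence objects)
# $make_runner - code ref (hypothetical helper): takes one sequence and
#                returns an object that has a run() method
sub design_per_sequence {
    my ($seqs, $make_runner) = @_;
    my @results;
    for my $seq (@$seqs) {
        my $runner = $make_runner->($seq);   # fresh object, no stale state
        my $res    = eval { $runner->run };  # trap a failure for this sequence only
        my $err    = $@;
        push @results, {
            seq    => $seq,
            result => $res,
            error  => ( $err ? $err : undef ),
        };
    }
    return \@results;
}
```

Each entry in the returned list keeps the sequence, the run result (undef if the run died) and the error text, so a sequence for which no primers can be placed does not stop the loop or affect the next sequence.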
On Nov 27, 2007, at 7:06 PM, Caroline Johnston wrote: > Hello, > > I was playing around with Primer3, and I hit a problem. Not sure if > it's a > bug or if I was doing something I wasn't supposed to, but if it's the > latter, I thought it might save someone else half an hour of banging > their > head of a keyboard if I mentioned it: > > What I was doing was roughly: > > # create a primer3 obj > my $p3 = ...Primer3->new(); > > # loop through some sequences generating primers for > # each of them using the same primer3 obj > while (@some_bio_seqs){ > my $res = $p3->run; > ... > } > > This worked fine for a while, but broke when I tried to set > PRIMER_MIN_GC, > at which point it worked for a few sequences then I got a "can't place > primer on sequence" error. > > After a bit of faffing about, I think the problem occurs when no > primers > are found. In which case $p3 still has the primers from the previous > run, > which don't come from the current sequence, so can't be placed on > it. I > tried calling $p3->cleanup in the loop, but that didn't work either. > Creating a new $p3 every time works fine. > > Are you supposed to create a new Primer3 object for every sequence? > (Apologies if I missed the relevant bit of the docs). > > Cheers, > Cass xx > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From paulhengen at coh.org Wed Nov 28 20:20:42 2007 From: paulhengen at coh.org (Paul N. Hengen) Date: Wed, 28 Nov 2007 17:20:42 -0800 (PST) Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs Message-ID: <14017289.post@talk.nabble.com> Hi. I have a number of gene IDs from Entrez and I want to find the start and end locations in the human genome. 
This seemed simple enough, so I started working through some of the
examples for using the EntrezGene module at www.bioperl.org
Of course this did not work because the core installation does not
include this module. So, I think I have two choices: (1) install the
module (how?), or (2) find an easier way to get the locations in the
human genome. I want to use the locations to grab sequences out of the
genome. Can anyone offer advice on this?

Thanks.
-Paul.

--
Paul N. Hengen, Ph.D.
Hematopoietic Stem Cell and Leukemia Research
City of Hope National Medical Center
1500 E. Duarte Road, Duarte, CA 91010 USA
mailto:paulhengen at coh.org
--
View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From Viktor.Martyanov at Dartmouth.EDU Thu Nov 29 15:20:19 2007
From: Viktor.Martyanov at Dartmouth.EDU (Viktor Martyanov)
Date: 29 Nov 2007 15:20:19 -0500
Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases
Message-ID: <193573097@newdonner.Dartmouth.EDU>

A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 444 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071129/a6380324/attachment-0001.bin

From alison.waller at utoronto.ca Thu Nov 29 11:20:59 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Thu, 29 Nov 2007 11:20:59 -0500
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS)
Message-ID: <002501c832a3$d3e09cf0$d81efea9@AWALL>

Hi all,

I would like to install the CVS version of bioperl as I know of some code
changes that will be useful to me. However, I am having problems
installing it.

I am trying to install bioperl in my home directory on a linux cluster.

I used

    cd bioperl-live
    perl Build.PL -install /home/awaller

However after the build command I got a lot of errors.
Do I have to also have perl installed in my home directory?? There is perl installed on the cluster in /usr/bin. Do I need to point to this or does Build.PL automatically look there? I noticed a few errors about not having permission and a few about not being able to connect. I've copied a portion of the messages after my Build.pl command. Any help would be appreciated, alison Issuing "/usr/bin/ftp -n" ftp: mirror.isurf.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz. Please check, if the URLs I found in your configuration file (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/, ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are valid. The urllist can be edited. E.g. with 'o conf urllist push ftp://myurl/' Could not fetch modules/02packages.details.txt.gz Trying to get away with old file: 3604718 584 -rw-r--r-- 1 0 0 592967 Nov 12 22:53 /root/.cpan/sources/modules/02packages.details.txt.gz Going to read /root/.cpan/sources/modules/02packages.details.txt.gz Database was generated on Sat, 10 Nov 2007 22:36:34 GMT There's a new CPAN.pm version (v1.9204) available! [Current version is v1.7601] You might want to try install Bundle::CPAN reload cpan without quitting the current session. It should be a seamless upgrade while we are running... Warning: You are not allowed to write into directory "/root/.cpan/sources/modules". I'll continue, but if you encounter problems, they may be due to insufficient permissions. 
Fetching with LWP: ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[Cannot write to '/root/.cpan/sources/modules/03modlist.data.gz-25787': Permission denied] Fetching with Net::FTP: ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2265 Couldn't fetch 03modlist.data.gz from ftp.nrc.ca Fetching with LWP: ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[FTP close response: 500 Network seems to have barfed - Let's all phone our ISP and go postal! Unknown command. ] Fetching with Net::FTP: ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2265 Couldn't fetch 03modlist.data.gz from cpan.sunsite.ualberta.ca Fetching with LWP: ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname 'cpan.mirror.cygnal.ca'] Fetching with Net::FTP: ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz Fetching with LWP: ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname 'mirror.isurf.ca'] Fetching with Net::FTP: ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz Trying with "/usr/bin/lynx -source" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle 
file URLs. System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Issuing "/usr/bin/ftp -n" Local directory now /root/.cpan/sources/modules local: 03modlist.data.gz: Permission denied Bad luck... Still failed! 
Can't access URL ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz. Issuing "/usr/bin/ftp -n" Local directory now /root/.cpan/sources/modules local: 03modlist.data.gz: Permission denied Bad luck... Still failed! Can't access URL ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz. Issuing "/usr/bin/ftp -n" ftp: cpan.mirror.cygnal.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz. Issuing "/usr/bin/ftp -n" ftp: mirror.isurf.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz. Please check, if the URLs I found in your configuration file (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/, ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are valid. The urllist can be edited. E.g. with 'o conf urllist push ftp://myurl/' Could not fetch modules/03modlist.data.gz Trying to get away with old file: 3604719 144 -rw-r--r-- 1 0 0 141973 Nov 12 22:53 /root/.cpan/sources/modules/03modlist.data.gz Going to read /root/.cpan/sources/modules/03modlist.data.gz Going to write /root/.cpan/Metadata can't create /root/.cpan/Metadata: Permission denied at /usr/share/perl/5.8/CPAN.pm line 3432 Running install for module Test::Harness Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2342 ****************************************** Alison S. Waller M.A.Sc. Doctoral Candidate awaller at chem-eng.utoronto.ca 416-978-4222 (lab) Department of Chemical Engineering Wallberg Building 200 College st. 
Toronto, ON M5S 3E5 From cjfields at uiuc.edu Thu Nov 29 23:53:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 22:53:09 -0600 Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS) In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL> References: <002501c832a3$d3e09cf0$d81efea9@AWALL> Message-ID: Alison, There are directions on how to do this here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA (TinyURL link) http://tinyurl.com/3263dd Note the additional configuration for CPAN in that section; you'll need to set up CPAN so it installs everything locally. chris On Nov 29, 2007, at 10:20 AM, alison waller wrote: > Hi all, > > > > I would like to install the CVS version of bioperl as I know of > some code > changes that will be useful to me. However, I am having problems > installing > it. > > I am trying to install bioperl in my home directly on a linux cluster. > > > > I used > > > >> cd bioperl-live > > * perl Build.PL -install /home/awaller > > > > However after the build command I got a lot of errors. Do I have to > also > have perl installed in my home directory?? There is perl installed > on the > cluster in /usr/bin. Do I need to point to this or does Build.PL > automatically look there? I noticed a few errors about not having > permission and a few about not being able to connect. I've copied a > portion > of the messages after my Build.pl command. > > > > Any help would be appreciated, > > > > alison > > > > > > Issuing "/usr/bin/ftp -n" > > ftp: mirror.isurf.ca: Unknown host > > Not connected. > > Local directory now /root/.cpan/sources/modules > > Not connected. > > Not connected. > > Not connected. > > Not connected. > > Not connected. > > Not connected. > > Bad luck... Still failed! > > Can't access URL > ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz. 
> [...]
>
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'
>
> Could not fetch modules/03modlist.data.gz
> Trying to get away with old file:
> 3604719 144 -rw-r--r-- 1 0 0 141973 Nov 12 22:53 /root/.cpan/sources/modules/03modlist.data.gz
> Going to read /root/.cpan/sources/modules/03modlist.data.gz
> Going to write /root/.cpan/Metadata
> can't create /root/.cpan/Metadata: Permission denied at
> /usr/share/perl/5.8/CPAN.pm line 3432
> Running install for module Test::Harness
> Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz
> mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at
> /usr/share/perl/5.8/CPAN.pm line 2342
>
> ******************************************
> Alison S. Waller M.A.Sc.
> Doctoral Candidate
> awaller at chem-eng.utoronto.ca
> 416-978-4222 (lab)
> Department of Chemical Engineering
> Wallberg Building
> 200 College st.
> Toronto, ON
> M5S 3E5
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Thu Nov 29 23:57:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:57:36 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID:

Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-core
(I think they were added prior to the 1.5.1 release, but I'm not
positive). If possible you should try installing bioperl 1.5.2 or the
latest code from CVS.

For directions on installing Bioperl for most OS's go here:

http://www.bioperl.org/wiki/Installing_BioPerl

From CVS:

http://www.bioperl.org/wiki/Using_CVS

chris

On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:

>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org
>
> --
> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Fri Nov 30 03:45:57 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Nov 2007 08:45:57 +0000
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <474FCDC5.5020100@sendu.me.uk>

alison waller wrote:
> I would like to install the CVS version of bioperl as I know of some code
> changes that will be useful to me. However, I am having problems installing
> it.
>
> I am trying to install bioperl in my home directly on a linux cluster.
[...]
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'

Either these URLs are invalid as suggested (try setting the urllist to
nothing), or your Linux cluster doesn't have internet access. You can't
do a 'proper' install of BioPerl and its dependencies without internet
access. However, for most purposes simply downloading the BioPerl
modules (i.e. from a different machine with internet access) and
pointing your PERL5LIB to their location is sufficient. You can download
CVS modules from the BioPerl website individually, or as a tarball of
everything.

From MEC at stowers-institute.org Fri Nov 30 09:12:09 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 30 Nov 2007 08:12:09 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID:

How many, how often?

Use ensembl biomart!

First time interactively.

Then if you want to pipeline it, take the Perl code it generates for you
and run it - of course you'll have to install the Ensembl Perl API....

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Paul N. Hengen
> Sent: Wednesday, November 28, 2007 7:21 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
>
>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find
> the start and end locations in the human genome. This seemed
> simple enough, so I started working through some of the
> examples for using the EntrezGene module at www.bioperl.org
> Of course this did not work because the core installation
> does not include this module. So, I think I have two choices
> (1) install the module (how?), or (2) find an easier way to
> get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research City of Hope
> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010
> USA mailto:paulhengen at coh.org
>
> --
> View this message in context:
> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From bosborne11 at verizon.net Fri Nov 30 09:38:58 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 30 Nov 2007 09:38:58 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
Message-ID:

Paul,

Have you taken a look at this page?

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

There's code there that looks similar to what you're proposing.

Brian O.

On 11/28/07 8:20 PM, "Paul N. Hengen" wrote:

>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org

From cjfields at uiuc.edu Fri Nov 30 10:47:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 09:47:32 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47502C75.60809@bms.com>
References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com>
Message-ID: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>

My bad. I always forget about Bio::ASN1::Entrezgene.
We should ask Mingyi Liu if he would like to include this parser with
BioPerl (since it requires it, makes sense to me, and it avoids the
circular dependency that has plagued these modules).

chris

On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:

> Chris Fields wrote:
> Chris,
> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
> low-level parser and is not part of bioperl. There is a circular
> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
> Paul, you can get it from CPAN and this should make
> Bio::SeqIO::entrezgene functional for you.
> Stefan
>
> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Fri Nov 30 11:12:22 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 11:12:22 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com> <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
Message-ID: <47503666.8090004@bms.com>

Chris Fields wrote:
> My bad. I always forget about Bio::ASN1::Entrezgene. We should ask
> Mingyi Liu if he would like to include this parser with BioPerl (since
> it requires it, makes sense to me, and it avoids the circular
> dependency that has plagued these modules).
>
Yes, I think this would be a good step.
Stefan
> chris
>
> [...]

From stefan.kirov at bms.com Fri Nov 30 10:29:57 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 10:29:57 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To:
References: <14017289.post@talk.nabble.com>
Message-ID: <47502C75.60809@bms.com>

Chris Fields wrote:
Chris,
Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
low-level parser and is not part of bioperl. There is a circular
dependency: Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).
Paul, you can get it from CPAN and this should make
Bio::SeqIO::entrezgene functional for you.
Stefan
> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-core
> (I think they were added prior to the 1.5.1 release, but I'm not
> positive). If possible you should try installing bioperl 1.5.2 or the
> latest code from CVS.
>
> For directions on installing Bioperl for most OS's go here:
>
> http://www.bioperl.org/wiki/Installing_BioPerl
>
> From CVS:
>
> http://www.bioperl.org/wiki/Using_CVS
>
> chris
>
> [...]

From arareko at campus.iztacala.unam.mx Fri Nov 30 12:01:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 30 Nov 2007 11:01:29 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47503666.8090004@bms.com>
References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com> <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu> <47503666.8090004@bms.com>
Message-ID: <475041E9.8050909@campus.iztacala.unam.mx>

I'm Cc'ing Mingyi Liu on this so he knows about your proposal (in the
past, he mentioned he doesn't track the list closely).

Mauricio.

Stefan Kirov wrote:
> Chris Fields wrote:
>> My bad. I always forget about Bio::ASN1::Entrezgene. We should ask
>> Mingyi Liu if he would like to include this parser with BioPerl (since
>> it requires it, makes sense to me, and it avoids the circular
>> dependency that has plagued these modules).
>>
> Yes, I think this would be a good step.
> Stefan
>
> [...]

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From jason at bioperl.org Fri Nov 30 15:21:13 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Nov 2007 12:21:13 -0800
Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases
In-Reply-To: <193573097@newdonner.Dartmouth.EDU>
References: <193573097@newdonner.Dartmouth.EDU>
Message-ID: <631A0D08-4135-4A26-962A-4D1DB31F7F05@bioperl.org>

Viktor -

Bio::SearchIO helps you parse BLAST reports, but don't underestimate
the power of going as low-tech as possible and outputting scores with
the -m 8 option in NCBI-BLAST or -mformat 3, both of which give you a
tabular format that can be parsed with the 'split' function in Perl.

See the wiki http://bioperl.org/wiki for HOWTOs and examples of using
the parsers.

You might also consider already-written tools like OrthoMCL,
InParanoid, and others that help you define relationships like
orthologs and paralogs among species. There also exist a few published
web resources that have pre-computed homologs for you; it might be
worth taking a look around first unless the point of the project is to
learn how to run these kinds of searches.

For general Perl help consider Perlmonks.org and some of the
introductory books that are available.

-jason
--
Jason Stajich
jason at bioperl.org

On Nov 29, 2007, at 12:20 PM, Viktor Martyanov wrote:

> Hello,
>
> My name is Viktor Martyanov and I am a Ph.D. student in biology at
> Dartmouth.
>
> I need to be able to use a set of genes or FASTA sequences from S.
> cerevisiae and retrieve a set of corresponding homologs from other
> fungal species via BLASTP searches.
>
> I would like to find out if there are Perl scripts that approach
> this problem. By the way, is there a Perl community or forum where
> I could post this question?
>
> Thanks very much.

From barry.moore at genetics.utah.edu Fri Nov 30 17:03:23 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 30 Nov 2007 15:03:23 -0700
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To:
References: <14017289.post@talk.nabble.com>
Message-ID:

Paul,

One other alternative is to use the UCSC table browser
(http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select your
organism, upload your ID list, and select your output options. You can
download the coordinates or the FASTA directly. You have options for
including or excluding various parts of the gene, and
upstream/downstream sequences. This is similar to the solution that
Malcolm suggested, except that the Ensembl option can be run repeatedly
as Perl code, as he pointed out. UCSC allows remote connections to
their MySQL server, so you could set up a repeated task and more
complex queries that way with the UCSC method.

Barry

On Nov 30, 2007, at 7:12 AM, Cook, Malcolm wrote:

> How many, how often?
>
> Use ensembl biomart!
>
> First time interactively.
>
> Then if you to pipeline it, take the perl code it generates for you and
> run it - of course you'll have to install the Ensembl Perl API....
>
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>> [...]
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Fri Nov 30 23:37:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 22:37:50 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS)
In-Reply-To: <000901c833bf$33d53500$0a02a8c0@AWALL>
References: <000901c833bf$33d53500$0a02a8c0@AWALL>
Message-ID: <75FF7E93-1633-4D43-9BC0-8BE2A6A7711D@uiuc.edu>

Make sure to keep this on the list.

ncbi_gi() is only in bioperl-live (CVS); my guess is that you either somehow got 1.5.2 instead, or the bioperl-live version is not being found in your lib path. It's very likely the latter, as perl is picking up whatever other copy is present (which appears to be an older version of bioperl). That should give you a hint that the problem may be with your lib path.

Try changing the line

  use lib '/home/awaller/bioperl-live/Bio';

to:

  use lib '/home/awaller/bioperl-live';

(that is, point it at the directory that contains Bio/, not at Bio/ itself).

chris

On Nov 30, 2007, at 8:09 PM, alison waller wrote:

> Okay so Now I'm really confused.
> I edited
> #!usr/bin/perl
>> Use lib '/home/awaller/bioperl-live/Bio.
> I ran the script below with the *special hit->ncbi from Chris. It
> worked,
> it was great, I got the gi! No errors, no bugs that I saw in
> checking the
> output. Then I went back in, edited the script to retrieve further
> info
> (specifically the strand). Saved it, now when I try to run it I get
> the
> same error message that I was previously getting.
>
> barrett ~ $ perl blast_parse_awcf.pl OldMoBlastxGiTest.txt 1
> Can't locate object method "ncbi_gi" via package
> "Bio::Search::Hit::BlastHit" at blast_parse_awcf.pl line 50,
> line
> 189.
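
One way to check Chris's diagnosis (that perl is loading a different copy of BioPerl than the one intended) is to ask perl which file it actually loaded and whether the class provides the method. The following is a minimal sketch: module_origin and has_method are helper names made up here, and the 'use lib' path is simply the one mentioned in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 'use lib' must name the directory that CONTAINS Bio/, not Bio/ itself.
# This path is the one from the thread; adjust it to your own checkout.
use lib '/home/awaller/bioperl-live';

# Load a module by name and return the file perl picked up (from %INC),
# or undef if the module cannot be loaded at all.
sub module_origin {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::SearchIO -> Bio/SearchIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Report whether a class provides a method, without calling it.
sub has_method {
    my ($class, $method) = @_;
    return $class->can($method) ? 1 : 0;
}

my $module = 'Bio::Search::Hit::BlastHit';
my $path   = module_origin($module);
if (defined $path) {
    printf "%s loaded from %s (ncbi_gi %s)\n", $module, $path,
        has_method($module, 'ncbi_gi') ? 'available' : 'missing';
}
else {
    print "$module could not be loaded; directories searched:\n";
    print "  $_\n" for @INC;
}
```

If the reported path is not under the bioperl-live checkout, an older installed copy is shadowing it and the lib path (or PERL5LIB) is what needs fixing.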
> > Thanks soo much, > > > #!usr/bin/perl > > use strict; > use warnings; > use lib "/home/awaller/bioperl-live/Bio"; > use Bio::Perl; > use Bio::SearchIO; > > my $usage = "to run type: blast_parse_aw.pl <# of > hits per > query> \n"; if (@ARGV != 2) { die $usage; } > > my $infile = $ARGV[0]; > my $outfile = $infile . '.parsed'; > my $tophit = $ARGV[1]; # to specify in the command line how many hits > # to report for each query > > open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! > \n"; > > my $report = Bio::SearchIO->new( > -file => $infile, > -format => "blast" > ); > > print OUT join("\t",qw( > Query > HitDesc > HitAccess > HitGi > HitBits > Evalue > %id > AlignLen > NumIdent > NumPos > gaps > Qframe > Qstrand > Hframe > Hstrand))."\n"; > > # Go through BLAST reports one by one > while ( my $result = $report->next_result ) { > my $ct = 0; > my @tophits = grep {$ct++ < $tophit } $result->hits; > if (scalar(@tophits) == 0) { > print OUT "no hits\n"; > } > for my $hit (@tophits) { > my $tophsp=$hit->hsp('best'); > # Print some tab-delimited data about this hit > print OUT join("\t", > $result->query_name, > $hit->description, > $hit->accession, > $hit->ncbi_gi, > $hit->bits, > $tophsp->evalue, > $tophsp->percent_identity, > $tophsp->length('total'), > $tophsp->num_identical, > $tophsp->num_conserved, > $tophsp->gaps, > $tophsp->query->frame, > $tophsp->strand('query'), > $tophsp->hit->frame, > $tophsp->strand('hit'), > )."\n"; > } > } > > > > > -----Original Message----- > From: Sendu Bala [mailto:bix at sendu.me.uk] > Sent: Friday, November 30, 2007 6:24 PM > To: alison waller > Subject: Re: [Bioperl-l] Problems installing bioperl (bioperl-live > tarball > from CVS) > > alison waller wrote: >> Thank you Sendu, >> >> So I'm trying the second option. I have downloaded the bioperl-live > tarball >> from the CVS on my windows laptop, and then moved it to my home >> directory > in >> the linux cluster where I unzipped and tared it. 
So I now have a > directory >> /home/awaller/bioperl-live. >> >> I edited my .bashrc file as below: >> Export PERL5LIB='/home/awaller/bioperl-live' >> >> I also edited a sample script to include: >> #!usr/bin/perl >> Use lib '/home/awaller/bioperl-live' > > Does this directory contain a 'Bio' directory with all the BioPerl > modules inside it? > > >> But it still isn't working. >> At the prompt I typed$ perl script.pl >> It gave me the warning - can't locate object method ncbi_gi which >> is why > I'm >> trying to download the CVS version as Chris Fields added code to >> make the >> ncbi-gi object. > > You'll have to give me the complete, unedited error message and > ideally > the script itself before I can help you further. > > >> Don't I have to do something similar to what the Build.PL file does? > > Probably not. It doesn't matter where your perl executable is, btw, as > long as the system knows how to run perl, which it obviously does. > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign
From jason at bioperl.org Thu Nov 1 10:22:14 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 10:22:14 -0400 Subject: [Bioperl-l] PAML/Codeml parsing Message-ID: PAML4 breaks our PAML parser right now because the order of things in the result file has changed. Now sequences precede the information about the version or the program run. This means that $result- >get_seqs() fails because we don't parse the sequences. We'll see what we can do, but as usual with supporting 3rd party programs it is brittle when file formats change. Th -jason -- Jason Stajich jason at bioperl.org From jason at bioperl.org Thu Nov 1 10:32:06 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 10:32:06 -0400 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> Message-ID: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> Presumably the PATH is not getting set properly - you should play around printing the $ENV{PATH} variable in a perl script to see if actually contains the directory where the emboss programs are installed. Bioperl can only guess so much as to where to find an application. It is also possible that we aren't creating the proper path to the executable - you can print the executable path with print $fuzznuc->executable I believe unless it is throwing an error at the program() line. It looks like the code in the Factory object is a little fragile assuming that the programs HAVE to be in your $PATH.
I don't know if windows+perl is special in any way that it run things so I can't really tell if there is specific things you have to do here. You may have to run this through cygwin in case PATH and such are just not available properly to windowsPerl. -jason On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > Dear all, > > I have emboss installed on a windows machine. (Embosswin). I can run > this from the dos command line and the path is present. However, > when I > try to call > an emboss application from bioperl I get a "Application not found > error" > > > my $f = Bio::Factory::EMBOSS->new(); > # get an EMBOSS application object from the factory > my $fuzznuc = $f->program('fuzznuc'); > $fuzznuc->run( > { -sequence => $infile, > -pattern => $motif, > -outfile => $outfile > }); > gives the following error > > -------------------- WARNING --------------------- > MSG: Application [fuzznuc] is not available! > --------------------------------------------------- > Can't call method "run" on an undefined value at searchPatterns.pl > line > 102. > > Can somebody help me fix this ? > > best regards > Rohit > > -- > > Dr. 
Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Thu Nov 1 10:54:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Nov 2007 09:54:09 -0500 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> Message-ID: <325E8599-793F-49DC-8680-9823F9389D4C@uiuc.edu> This worked for me previously when I tested with WinXP on my old machine using EMBOSS v5: ftp://emboss.open-bio.org/pub/EMBOSS/windows I haven't tried it with EMBOSSWin (latest is v 2.7); it's probably better to use the latest EMBOSS version anyway so I suggest trying the version in the above link. I'll test it again today and let you know what I find. chris On Nov 1, 2007, at 9:32 AM, Jason Stajich wrote: > Presumably the PATH is not getting set properly - you should play > around printing the $ENV{PATH} variable in a perl script to see if > actually contains the directory where the emboss programs are > installed. Bioperl can only guess so much as to where to find an > application. It is also possible that we aren't creating the proper > path to the executable - you can print the executable path with > print $fuzznuc->executable > I believe unless it is throwing an error at the program() line. > > It looks like the code in the Factory object is a little fragile > assuming that the programs HAVE to be in your $PATH. 
I don't know if > windows+perl is special in any way that it run things so I can't > really tell if there is specific things you have to do here. You may > have to run this through cygwin in case PATH and such are just not > available properly to windowsPerl. > > -jason > On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > >> Dear all, >> >> I have emboss installed on a windows machine. (Embosswin). I can run >> this from the dos command line and the path is present. However, >> when I >> try to call >> an emboss application from bioperl I get a "Application not found >> error" >> >> >> my $f = Bio::Factory::EMBOSS->new(); >> # get an EMBOSS application object from the factory >> my $fuzznuc = $f->program('fuzznuc'); >> $fuzznuc->run( >> { -sequence => $infile, >> -pattern => $motif, >> -outfile => $outfile >> }); >> gives the following error >> >> -------------------- WARNING --------------------- >> MSG: Application [fuzznuc] is not available! >> --------------------------------------------------- >> Can't call method "run" on an undefined value at searchPatterns.pl >> line >> 102. >> >> Can somebody help me fix this ? >> >> best regards >> Rohit >> >> -- >> >> Dr. Rohit Ghai >> Institute of Medical Microbiology >> Faculty of Medicine >> Justus-Liebig University >> Frankfurter Strasse 107 >> 35392 - Giessen >> GERMANY >> >> Tel : 0049 (0)641-9946413 >> Fax : 0049 (0)641-9946409 >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Thu Nov 1 11:31:40 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 11:31:40 -0400 Subject: [Bioperl-l] PAML3 vs 4 Message-ID: <23575228-2FA3-4F07-BED4-4A2309A36D71@bioperl.org> Small tweaks were needed to parse PAML4 results. Pairwise Ka, Ks parsing (runmode -2) should be working more smoothly now on both PAML 3 and 4. You'll need to get the latest code from CVS in order to see the changes to Bio/Tools/Phylo/PAML.pm I've added tests for PAML4 in the parser and the run code. If you have scripts that use codeml please give it a try. I have not attempted to play with BASEML or AAML results at this point so if you also have codes that use those programs, please try it out and provide bugreports if we need to fix things. -jason -- Jason Stajich jason at bioperl.org From Kevin.M.Brown at asu.edu Thu Nov 1 13:25:30 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 1 Nov 2007 10:25:30 -0700 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl onwindows In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> Message-ID: <1A4207F8295607498283FE9E93B775B403EA7E06@EX02.asurite.ad.asu.edu> Sounds like a path issue. Try to tell bioperl the full path to the executable rather than just the executable name. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai > Sent: Thursday, November 01, 2007 2:46 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] bioperl: cannot run emboss programs > using bioperl onwindows > > Dear all, > > I have emboss installed on a windows machine. (Embosswin). I can run > this from the dos command line and the path is present. 
> However, when I
> try to call
> an emboss application from bioperl I get a "Application not
> found error"
>
>
> my $f = Bio::Factory::EMBOSS->new();
> # get an EMBOSS application object from the factory
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->run(
> { -sequence => $infile,
> -pattern => $motif,
> -outfile => $outfile
> });
> gives the following error
>
> -------------------- WARNING ---------------------
> MSG: Application [fuzznuc] is not available!
> ---------------------------------------------------
> Can't call method "run" on an undefined value at
> searchPatterns.pl line
> 102.
>
> Can somebody help me fix this ?
>
> best regards
> Rohit
>
> --
>
> Dr. Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
>
> Tel : 0049 (0)641-9946413
> Fax : 0049 (0)641-9946409
> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 14:06:48 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:06:48 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <472A15B8.7040502@mikrobio.med.uni-giessen.de>

Thanks for all the suggestions... but I unfortunately still cannot run emboss. I am running the latest version of embosswin (2.10.0-Win-0.8), and the path is set correctly. I printed $ENV{$PATH} and this contains C:\EMBOSSwin which is the correct location.
I also tried setting the path directly but I'm not sure how to do this, so I tried this...

my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');

this also did not work.

Also tried printing...
$fuzznuc->executable()

gave the following error again
-------------------- WARNING ---------------------
MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
---------------------------------------------------

Any more ideas ?

thanks !
Rohit


here's the code...

use strict;
use Bio::Factory::EMBOSS;
use Data::Dumper;

#
# print "PATH=$ENV{PATH}\n";
# path contains C:\EMBOSSwin which is the correct location
# embossversion is 2.10.0-Win-0.8

my $f = Bio::Factory::EMBOSS->new();
# get an EMBOSS application object from the factory
print Dumper ($f);
my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe as well,
print Dump ($fuzznuc);

#dump of fuzznuc
#$VAR1 = bless( {
# '_programgroup' => {},
# '_programs' => {},
# '_groups' => {}
# }, 'Bio::Factory::EMBOSS' );

#print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work

my $infile = "temp.fasta";
my $motif = "ATGTCGATC";
my $outfile = "test.out";


$fuzznuc->run(
{ -sequence => $infile,
-pattern => $motif,
-outfile => $outfile
});

Here's the error again....

#-------------------- WARNING ---------------------
#MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
#---------------------------------------------------




Jason Stajich wrote:
> Presumably the PATH is not getting set properly - you should play
> around printing the $ENV{PATH} variable in a perl script to see if
> actually contains the directory where the emboss programs are
> installed. Bioperl can only guess so much as to where to find an
> application. It is also possible that we aren't creating the proper
> path to the executable - you can print the executable path with
> print $fuzznuc->executable
> I believe unless it is throwing an error at the program() line.
>
> It looks like the code in the Factory object is a little fragile
> assuming that the programs HAVE to be in your $PATH.
I don't know if > windows+perl is special in any way that it run things so I can't > really tell if there is specific things you have to do here. You may > have to run this through cygwin in case PATH and such are just not > available properly to windowsPerl. > > -jason > On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > >> Dear all, >> >> I have emboss installed on a windows machine. (Embosswin). I can run >> this from the dos command line and the path is present. However, when I >> try to call >> an emboss application from bioperl I get a "Application not found error" >> >> >> my $f = Bio::Factory::EMBOSS->new(); >> # get an EMBOSS application object from the factory >> my $fuzznuc = $f->program('fuzznuc'); >> $fuzznuc->run( >> { -sequence => $infile, >> -pattern => $motif, >> -outfile => $outfile >> }); >> gives the following error >> >> -------------------- WARNING --------------------- >> MSG: Application [fuzznuc] is not available! >> --------------------------------------------------- >> Can't call method "run" on an undefined value at searchPatterns.pl line >> 102. >> >> Can somebody help me fix this ? >> >> best regards >> Rohit >> >> -- >> >> Dr. Rohit Ghai >> Institute of Medical Microbiology >> Faculty of Medicine >> Justus-Liebig University >> Frankfurter Strasse 107 >> 35392 - Giessen >> GERMANY >> >> Tel : 0049 (0)641-9946413 >> Fax : 0049 (0)641-9946409 >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > -- Dr. 
Rohit Ghai Institute of Medical Microbiology Faculty of Medicine Justus-Liebig University Frankfurter Strasse 107 35392 - Giessen GERMANY Tel : 0049 (0)641-9946413 Fax : 0049 (0)641-9946409 Email: Rohit.Ghai at mikrobio.med.uni-giessen.de From jason at bioperl.org Thu Nov 1 14:37:24 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 14:37:24 -0400 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows In-Reply-To: <472A15B8.7040502@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> Message-ID: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> You could try this - can't test it though so not sure. my $fuzznuc = $f->program('fuzznuc'); $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); -jason On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: > > > Thanks for all the suggestions... but I unfortunately still cannot run > emboss. I am running the latest version of embosswin (2.10.0- > Win-0.8), > and the > path is set correctly. I printed $ENV{$PATH} and this contains > C:\EMBOSSwin which is the correct location. > I also tried setting the path directly but I'm not sure how to do > this, > so I tried this... > > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); > > this also did not work. > > Also tried printing... > $fuzznuc->executable() > > gave the following error again > -------------------- WARNING --------------------- > MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > --------------------------------------------------- > > Any more ideas ? > > thanks ! > Rohit > > > here's the code... 
> > use strict; > use Bio::Factory::EMBOSS; > use Data::Dumper; > > # > # print "PATH=$ENV{PATH}\n"; > # path contains C:\EMBOSSwin which is the correct location > # embossversion is 2.10.0-Win-0.8 > > my $f = Bio::Factory::EMBOSS->new(); > # get an EMBOSS application object from the factory > print Dumper ($f); > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried > fuzznuc.exe > as well, > print Dump ($fuzznuc); > > #dump of fuzznuc > #$VAR1 = bless( { > # '_programgroup' => {}, > # '_programs' => {}, > # '_groups' => {} > # }, 'Bio::Factory::EMBOSS' ); > > #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work > > my $infile = "temp.fasta"; > my $motif = "ATGTCGATC"; > my $outfile = "test.out"; > > > $fuzznuc->run( > { -sequence => $infile, > -pattern => $motif, > -outfile => $outfile > }); > > Here's the error again.... > > #-------------------- WARNING --------------------- > #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > #--------------------------------------------------- > > > > > Jason Stajich wrote: >> Presumably the PATH is not getting set properly - you should play >> around printing the $ENV{PATH} variable in a perl script to see if >> actually contains the directory where the emboss programs are >> installed. Bioperl can only guess so much as to where to find an >> application. It is also possible that we aren't creating the proper >> path to the executable - you can print the executable path with >> print $fuzznuc->executable >> I believe unless it is throwing an error at the program() line. >> >> It looks like the code in the Factory object is a little fragile >> assuming that the programs HAVE to be in your $PATH. I don't know if >> windows+perl is special in any way that it run things so I can't >> really tell if there is specific things you have to do here. You may >> have to run this through cygwin in case PATH and such are just not >> available properly to windowsPerl. 
>> >> -jason >> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: >> >>> Dear all, >>> >>> I have emboss installed on a windows machine. (Embosswin). I can run >>> this from the dos command line and the path is present. However, >>> when I >>> try to call >>> an emboss application from bioperl I get a "Application not found >>> error" >>> >>> >>> my $f = Bio::Factory::EMBOSS->new(); >>> # get an EMBOSS application object from the factory >>> my $fuzznuc = $f->program('fuzznuc'); >>> $fuzznuc->run( >>> { -sequence => $infile, >>> -pattern => $motif, >>> -outfile => $outfile >>> }); >>> gives the following error >>> >>> -------------------- WARNING --------------------- >>> MSG: Application [fuzznuc] is not available! >>> --------------------------------------------------- >>> Can't call method "run" on an undefined value at >>> searchPatterns.pl line >>> 102. >>> >>> Can somebody help me fix this ? >>> >>> best regards >>> Rohit >>> >>> -- >>> >>> Dr. Rohit Ghai >>> Institute of Medical Microbiology >>> Faculty of Medicine >>> Justus-Liebig University >>> Frankfurter Strasse 107 >>> 35392 - Giessen >>> GERMANY >>> >>> Tel : 0049 (0)641-9946413 >>> Fax : 0049 (0)641-9946409 >>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason at bioperl.org >> > > -- > > Dr. 
Rohit Ghai
> Institute of Medical Microbiology
> Faculty of Medicine
> Justus-Liebig University
> Frankfurter Strasse 107
> 35392 - Giessen
> GERMANY
>
> Tel : 0049 (0)641-9946413
> Fax : 0049 (0)641-9946409
> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org

From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 14:41:41 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:41:41 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlonwindows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID: <472A1DE5.30207@mikrobio.med.uni-giessen.de>

Hi Jason

I tried this as well. This also gives the same error message.

-Rohit

Jason Stajich wrote:
> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>
> -jason
> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote:
>
>>
>>
>> Thanks for all the suggestions... but I unfortunately still cannot run
>> emboss. I am running the latest version of embosswin (2.10.0-Win-0.8),
>> and the path is set correctly. I printed $ENV{$PATH} and this contains
>> C:\EMBOSSwin which is the correct location.
>> I also tried setting the path directly but I'm not sure how to do this,
>> so I tried this...
>>
>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');
>>
>> this also did not work.
>>
>> Also tried printing...
>> $fuzznuc->executable()
>>
>> gave the following error again
>> -------------------- WARNING ---------------------
>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>> ---------------------------------------------------
>>
>> Any more ideas ?
>>
>> thanks !
>> Rohit
>>
>>
>> here's the code...
>>
>> use strict;
>> use Bio::Factory::EMBOSS;
>> use Data::Dumper;
>>
>> #
>> # print "PATH=$ENV{PATH}\n";
>> # path contains C:\EMBOSSwin which is the correct location
>> # embossversion is 2.10.0-Win-0.8
>>
>> my $f = Bio::Factory::EMBOSS->new();
>> # get an EMBOSS application object from the factory
>> print Dumper ($f);
>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe as well,
>> print Dump ($fuzznuc);
>>
>> #dump of fuzznuc
>> #$VAR1 = bless( {
>> #                 '_programgroup' => {},
>> #                 '_programs' => {},
>> #                 '_groups' => {}
>> #               }, 'Bio::Factory::EMBOSS' );
>>
>> #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work
>>
>> my $infile = "temp.fasta";
>> my $motif = "ATGTCGATC";
>> my $outfile = "test.out";
>>
>>
>> $fuzznuc->run(
>>     { -sequence => $infile,
>>       -pattern => $motif,
>>       -outfile => $outfile
>>     });
>>
>> Here's the error again....
>>
>> #-------------------- WARNING ---------------------
>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
>> #---------------------------------------------------
>>
>>
>>
>>
>> Jason Stajich wrote:
>>> Presumably the PATH is not getting set properly - you should play
>>> around printing the $ENV{PATH} variable in a perl script to see if
>>> actually contains the directory where the emboss programs are
>>> installed. Bioperl can only guess so much as to where to find an
>>> application. It is also possible that we aren't creating the proper
>>> path to the executable - you can print the executable path with
>>> print $fuzznuc->executable
>>> I believe unless it is throwing an error at the program() line.
>>>
>>> It looks like the code in the Factory object is a little fragile
>>> assuming that the programs HAVE to be in your $PATH. I don't know if
>>> windows+perl is special in any way that it run things so I can't
>>> really tell if there is specific things you have to do here. You may
>>> have to run this through cygwin in case PATH and such are just not
>>> available properly to windowsPerl.
>>>
>>> -jason
>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have emboss installed on a windows machine. (Embosswin). I can run
>>>> this from the dos command line and the path is present. However,
>>>> when I try to call
>>>> an emboss application from bioperl I get a "Application not found
>>>> error"
>>>>
>>>>
>>>> my $f = Bio::Factory::EMBOSS->new();
>>>> # get an EMBOSS application object from the factory
>>>> my $fuzznuc = $f->program('fuzznuc');
>>>> $fuzznuc->run(
>>>>     { -sequence => $infile,
>>>>       -pattern => $motif,
>>>>       -outfile => $outfile
>>>>     });
>>>> gives the following error
>>>>
>>>> -------------------- WARNING ---------------------
>>>> MSG: Application [fuzznuc] is not available!
>>>> ---------------------------------------------------
>>>> Can't call method "run" on an undefined value at searchPatterns.pl
>>>> line 102.
>>>>
>>>> Can somebody help me fix this ?
>>>>
>>>> best regards
>>>> Rohit
>>>>
>>>> --
>>>>
>>>> Dr. Rohit Ghai
>>>> Institute of Medical Microbiology
>>>> Faculty of Medicine
>>>> Justus-Liebig University
>>>> Frankfurter Strasse 107
>>>> 35392 - Giessen
>>>> GERMANY
>>>>
>>>> Tel : 0049 (0)641-9946413
>>>> Fax : 0049 (0)641-9946409
>>>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> Jason Stajich
>>> jason at bioperl.org
>>>
>>
>> --
>>
>> Dr.
Rohit Ghai
>> Institute of Medical Microbiology
>> Faculty of Medicine
>> Justus-Liebig University
>> Frankfurter Strasse 107
>> 35392 - Giessen
>> GERMANY
>>
>> Tel : 0049 (0)641-9946413
>> Fax : 0049 (0)641-9946409
>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>

--

Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY

Tel : 0049 (0)641-9946413
Fax : 0049 (0)641-9946409
Email: Rohit.Ghai at mikrobio.med.uni-giessen.de

From MEC at stowers-institute.org Thu Nov 1 14:57:33 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 1 Nov 2007 13:57:33 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs usingbioperlonwindows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID:

In the code at

http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6

there is a call to `wossname` (cf.
http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html ).

Is wossname in your path?

Maybe it needs to be wossname.exe under Windows?

Malcolm Cook

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai
> Sent: Thursday, November 01, 2007 1:42 PM
> To: Jason Stajich
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs
> usingbioperlonwindows
>
> Hi Jason
>
> I tried this as well. This also gives the same error message.
>
> -Rohit
>
> Jason Stajich wrote:
> > You could try this - can't test it though so not sure.
> > my $fuzznuc = $f->program('fuzznuc');
> > $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
> >
> > -jason

From arareko at campus.iztacala.unam.mx Thu Nov 1 15:51:41 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Nov 2007 13:51:41 -0600
Subject: [Bioperl-l] bioperl: cannot run emboss programs usingbioperlonwindows
In-Reply-To:
References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <472A2E4D.8080903@campus.iztacala.unam.mx>

Don't the EMBOSS binaries live under 'bin'? Perhaps try setting
$ENV{PATH} to 'C:\EMBOSSwin\bin', or using this:

my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\bin\fuzznuc');

Adding .exe might be worth trying as well.

Mauricio.

Cook, Malcolm wrote:
> in the code
> http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6
>
> there is a call to `wossname` (c.f.
> http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html
> )
>
> is wossname in your path?
>
> Maybe it needs to be wossname.exe under windows?
>
>
> Malcolm Cook

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Thu Nov 1 16:07:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 15:07:39 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlonwindows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>

I did a little investigating using my old PC and was able to get fuzznuc to run using BioPerl
and EMBOSS v5. I had to jump through a hoop or two but I managed to get
it working.

First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows.
You need to remove EMBOSSWin and install the one I linked to
previously (this is an actual EMBOSS beta release). It's possible
older EMBOSSWin can be configured, but I don't plan on checking it out
myself.

Next, you need to ensure the binaries are in your PATH env. variable
(test by running 'wossname' on the command line), then set EMBOSS_DATA
to point at the EMBOSS data directory using a UNIX-like path (i.e.
'C:/mEMBOSS/data'); regular Win32 paths didn't work for me and WinXP
recognizes the UNIX'y form as a valid path. If you don't know how to
set env. variables go here:

http://vlaurie.com/computers2/Articles/environment.htm

Once that is set up you should be able to run the script using the
latest (greatest?) EMBOSS.

chris

On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote:

> Hi Jason
>
> I tried this as well. This also gives the same error message.
>
> -Rohit
>
> Jason Stajich wrote:
>> You could try this - can't test it though so not sure.
>> my $fuzznuc = $f->program('fuzznuc');
>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>
>> -jason

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From neetisomaiya at gmail.com Fri Nov 2 00:20:27 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 2 Nov 2007 09:50:27 +0530
Subject: [Bioperl-l] need help
Message-ID: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>

Hi,

This is a perl question, not bioperl.
Can anyone point me to a perl program/code/function which can calculate the
number of days between any two given dates?
Any help will be deeply appreciated.
Thanks.

--
-Neeti
Even my blood says, B positive

From whs at ebi.ac.uk Fri Nov 2 01:01:20 2007
From: whs at ebi.ac.uk (Will Spooner)
Date: Fri, 2 Nov 2007 05:01:20 +0000 (GMT)
Subject: [Bioperl-l] need help
In-Reply-To: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
References: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
Message-ID:

Hi Neeti,

A non-bioperl answer to your perl question: Date::Calc should do the trick.
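
A minimal sketch of that suggestion, assuming the Date::Calc module is installed from CPAN (it is not part of core Perl). The helper name days_between and the example dates are made up for illustration; Delta_Days is the Date::Calc function that does the work:

```perl
use strict;
use warnings;
use Date::Calc qw(Delta_Days);

# Hypothetical helper: number of days from the first date to the second.
# Each date is an array ref [year, month, day]; the result is negative
# when the second date is earlier than the first.
sub days_between {
    my ($from, $to) = @_;
    return Delta_Days(@$from, @$to);
}

# Example: 1 Nov 2007 to 3 Nov 2007
my $days = days_between([2007, 11, 1], [2007, 11, 3]);
print "$days\n";    # prints 2
```

Dates arriving as strings would need to be split into year, month and day first, for example with a regular expression, before being passed in.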
Will

On Fri, 2 Nov 2007, neeti somaiya wrote:

> Hi,
>
> This is a perl question, not bioperl.
> Can anyone point me to a perl program/code/function which can calculate the
> number of days between any two given dates.
> Any help will be deeply appreciated.
> Thanks.
>
>

From smarkel at accelrys.com Sat Nov 3 02:01:38 2007
From: smarkel at accelrys.com (Scott Markel)
Date: Fri, 2 Nov 2007 23:01:38 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID:

I set multiple environment variables in my code.

$ENV{EMBOSS_ROOT} = $embossPath;
$ENV{EMBOSS_ACDROOT} = File::Spec->catdir($embossPath, "acd");
$ENV{EMBOSS_DB_DIR} = File::Spec->catdir($embossPath, "test");
$ENV{EMBOSS_DATA} = File::Spec->catdir($embossPath, "data");
$ENV{PATH} = $embossPath;

I found it necessary to set both PATH and EMBOSS_ROOT.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect   email: smarkel at accelrys.com
Accelrys (SciTegic R&D)              mobile: +1 858 205 3653
10188 Telesis Court, Suite 100       voice: +1 858 799 5603
San Diego, CA 92121                  fax: +1 858 799 5222
USA                                  web: http://www.accelrys.com

bioperl-l-bounces at lists.open-bio.org wrote on 01.11.2007 11:37:24:

> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>
> -jason

From Rohit.Ghai at mikrobio.med.uni-giessen.de Sat Nov 3 10:07:52 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Sat, 03 Nov 2007 15:07:52 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows
In-Reply-To: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de> <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
Message-ID: <472C80B8.9050601@mikrobio.med.uni-giessen.de>

Dear all,

thanks for all the different inputs on this topic, I was able to run
emboss applications on windows (vista), but with the following workaround. Chris suggested to remove EMBOSSwin and get another version. This I did. Scott suggested setting all the variables within the program. This I also tried, but actually these were already available to the program so this was also not the problem. The following line... my $fuzznuc = $f->program('fuzznuc') doesn't return a Bio::Tools::Run::EMBOSSApplication object. but using Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't have any path issues. What is also curious is that $f->version returns the correct version of emboss running (no path problems here), and it looks like it runs the command "embossversion -auto" to get this information. If it can get at this command, its a bit peculiar why it cannot get the other programs. Or am I missing something here ? Please take a look at the code, I have commented within this... -Rohit use Bio::Factory::EMBOSS; use Data::Dumper; use Bio::Tools::Run::EMBOSSApplication; my $infile = "test.fasta"; my $motif = "AGGAGG"; my $outfile = "test.out"; my $f = Bio::Factory::EMBOSS->new(); # get an EMBOSS application object from the factory print Dumper $f; print "location=",$f->location,"\n"; #returns local print "version=", $f->version,"\n"; # this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is) print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing print "list=",$f->_program_list,"\n"; #returns nothing #however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work #$fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesnt work # the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object. 
#however, creating a EMBOSSApplication object directly makes it possible to run the program # my $application = Bio::Tools::Run::EMBOSSApplication->new(); $application->name('fuzznuc'); print Dumper $application; $application->run( { -sequence => $infile, -pattern => $motif, -outfile => $outfile }); print "Done\n"; exit; Chris Fields wrote: > I did a little investigating using my old PC and was able to get > fuzznuc to run using BioPerl and EMBOSS v5. I had to jump through a > hoop or two but I managed to get it working. > > First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows. > You need to remove EMBOSSWin and install the one I linked to > previously (this is an actual EMBOSS beta release). It's possible > older EMBOSSWin can be configured, but I don't plan on checking it out > myself. > > Next, you need to ensure the binaries are in your PATH env. variable > (test by running 'wossname' on the command line), then set EMBOSS_DATA > to point at the EMBOSS data directory using a UNIX-like path (i.e. > 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me and WinXP > recognizes the UNIX'y form as a valid path. If you don't know how to > set env. variables go here: > > http://vlaurie.com/computers2/Articles/environment.htm > > Once that is set up you should be able to run the script using the > latest (greatest?) EMBOSS. > > chris > > On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote: > >> Hi Jason >> >> I tried this as well. This also gives the same error message. >> >> -Rohit >> >> Jason Stajich wrote: >>> You could try this - can't test it though so not sure. >>> my $fuzznuc = $f->program('fuzznuc'); >>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); >>> >>> -jason >>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: >>> >>>> >>>> >>>> Thanks for all the suggestions... but I unfortunately still cannot run >>>> emboss. I am running the latest version of embosswin >>>> (2.10.0-Win-0.8), >>>> and the >>>> path is set correctly. 
I printed $ENV{$PATH} and this contains >>>> C:\EMBOSSwin which is the correct location. >>>> I also tried setting the path directly but I'm not sure how to do >>>> this, >>>> so I tried this... >>>> >>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); >>>> >>>> this also did not work. >>>> >>>> Also tried printing... >>>> $fuzznuc->executable() >>>> >>>> gave the following error again >>>> -------------------- WARNING --------------------- >>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! >>>> --------------------------------------------------- >>>> >>>> Any more ideas ? >>>> >>>> thanks ! >>>> Rohit >>>> >>>> >>>> here's the code... >>>> >>>> use strict; >>>> use Bio::Factory::EMBOSS; >>>> use Data::Dumper; >>>> >>>> # >>>> # print "PATH=$ENV{PATH}\n"; >>>> # path contains C:\EMBOSSwin which is the correct location >>>> # embossversion is 2.10.0-Win-0.8 >>>> >>>> my $f = Bio::Factory::EMBOSS->new(); >>>> # get an EMBOSS application object from the factory >>>> print Dumper ($f); >>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried >>>> fuzznuc.exe >>>> as well, >>>> print Dump ($fuzznuc); >>>> >>>> #dump of fuzznuc >>>> #$VAR1 = bless( { >>>> # '_programgroup' => {}, >>>> # '_programs' => {}, >>>> # '_groups' => {} >>>> # }, 'Bio::Factory::EMBOSS' ); >>>> >>>> #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work >>>> >>>> my $infile = "temp.fasta"; >>>> my $motif = "ATGTCGATC"; >>>> my $outfile = "test.out"; >>>> >>>> >>>> $fuzznuc->run( >>>> { -sequence => $infile, >>>> -pattern => $motif, >>>> -outfile => $outfile >>>> }); >>>> >>>> Here's the error again.... >>>> >>>> #-------------------- WARNING --------------------- >>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! 
>>>> #--------------------------------------------------- >>>> >>>> >>>> >>>> >>>> Jason Stajich wrote: >>>>> Presumably the PATH is not getting set properly - you should play >>>>> around printing the $ENV{PATH} variable in a perl script to see if >>>>> actually contains the directory where the emboss programs are >>>>> installed. Bioperl can only guess so much as to where to find an >>>>> application. It is also possible that we aren't creating the proper >>>>> path to the executable - you can print the executable path with >>>>> print $fuzznuc->executable >>>>> I believe unless it is throwing an error at the program() line. >>>>> >>>>> It looks like the code in the Factory object is a little fragile >>>>> assuming that the programs HAVE to be in your $PATH. I don't know if >>>>> windows+perl is special in any way that it run things so I can't >>>>> really tell if there is specific things you have to do here. You may >>>>> have to run this through cygwin in case PATH and such are just not >>>>> available properly to windowsPerl. >>>>> >>>>> -jason >>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: >>>>> >>>>>> Dear all, >>>>>> >>>>>> I have emboss installed on a windows machine. (Embosswin). I can run >>>>>> this from the dos command line and the path is present. However, >>>>>> when I >>>>>> try to call >>>>>> an emboss application from bioperl I get a "Application not found >>>>>> error" >>>>>> >>>>>> >>>>>> my $f = Bio::Factory::EMBOSS->new(); >>>>>> # get an EMBOSS application object from the factory >>>>>> my $fuzznuc = $f->program('fuzznuc'); >>>>>> $fuzznuc->run( >>>>>> { -sequence => $infile, >>>>>> -pattern => $motif, >>>>>> -outfile => $outfile >>>>>> }); >>>>>> gives the following error >>>>>> >>>>>> -------------------- WARNING --------------------- >>>>>> MSG: Application [fuzznuc] is not available! 
>>>>>> --------------------------------------------------- >>>>>> Can't call method "run" on an undefined value at searchPatterns.pl >>>>>> line >>>>>> 102. >>>>>> >>>>>> Can somebody help me fix this ? >>>>>> >>>>>> best regards >>>>>> Rohit >>>>>> >>>>>> -- >>>>>> > > From hlapp at gmx.net Sun Nov 4 12:42:13 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 4 Nov 2007 12:42:13 -0500 Subject: [Bioperl-l] question -- Bio::SeqFeature::Gene::Transcript In-Reply-To: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de> References: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de> Message-ID: <62FB6DE1-3F1D-428C-B108-4CF9EEB67DDD@gmx.net> Hi Stefanie, sorry for taking so long to respond - your email got buried in a pile while I was away on travel. The Bio::SeqFeature::Gene::* modules were written mostly with the motivation to have a model that can represent the results of gene predictors. GenBank AFAIK doesn't annotate introns explicitly, though they should be implicit from cDNA (or mRNA? or gene, as you say) features on genomic sequence. The Bioperl SeqIO parsers won't transform those into a Bio::SeqFeature::Gene-based model, but instead will yield just plain Bio::SeqFeatureI objects in a flat array. It's up to subsequent processing to build these into more hierarchical models. I'm not sure whether someone's done this already for GenBank-type feature tables. There is a Unflattener that at least attempts to build a feature hierarchy from the flat array that's compliant with the Sequence Ontology (or so I recall). I'm copying the list in case others have additional suggestions. 
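
A minimal sketch of the "subsequent processing" step, not taken from the original thread: once the exon (or CDS sub-location) coordinates of one gene have been collected from the flat feature list, introns can be derived as the gaps between consecutive exons. The coordinates and the helper name `introns_from_exons` are hypothetical, and no BioPerl calls are used, so collecting the coordinates from real features is left out.

```perl
use strict;
use warnings;

# Hypothetical exon coordinates (1-based, inclusive) for a single transcript.
my @exons = ( [ 100, 250 ], [ 400, 520 ], [ 700, 910 ] );

# Return the gaps between consecutive exons as [start, end] pairs.
sub introns_from_exons {
    my @sorted = sort { $a->[0] <=> $b->[0] } @_;
    my @introns;
    for my $i ( 0 .. $#sorted - 1 ) {
        my $start = $sorted[$i][1] + 1;         # first base after this exon
        my $end   = $sorted[ $i + 1 ][0] - 1;   # last base before the next exon
        push @introns, [ $start, $end ] if $start <= $end;
    }
    return @introns;
}

my @introns = introns_from_exons(@exons);
print join( "..", @$_ ), "\n" for @introns;
```

Sorting by start is reasonable here only because the gaps are reported in genomic coordinates; for a minus-strand gene the transcript order of the introns is the reverse of what is printed.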
-hilmar On Oct 25, 2007, at 3:40 AM, Stefanie Hartmann wrote: > > > Hello Hilmar, > > I have a question about your bioperl module > Bio::SeqFeature::Gene::Transcript: > > I can't figure out how to generate the $gene object for use in this > line: > @introns = $gene->introns(); > > The data I'm working with is a local file in genbank format, and > I'm interested in extracting intron sequences (and maybe flanking > exons) for certain genes. I have been trying to get the introns via > the sequence features ('CDS' or 'gene'), but this has not been > working. Which approach will I have to take? > I'd be very grateful if you could point me into the right direction! > > Hope things are going well in Durham! And thank you in advance! > > Stefanie > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From downloadondemand at gmail.com Sun Nov 4 13:39:42 2007 From: downloadondemand at gmail.com (download on demand) Date: Sun, 4 Nov 2007 20:39:42 +0200 Subject: [Bioperl-l] Help with Bio::SeqIO Message-ID: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> Hi to all. 
I have a problem with a very simple script:

    use Bio::SeqIO;

    # get command-line arguments, or die with a usage statement
    my $usage = "x2y.pl infile infileformat outfile outfileformat\n";
    my $infile = shift or die $usage;
    my $infileformat = shift or die $usage;
    # my $outfile = shift or die $usage;
    my $outfileformat = shift or die $usage;

    # create one SeqIO object to read in, and another to write out
    my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                                 '-format' => $infileformat);
    my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
                                  '-format' => $outfileformat);

    # write each entry in the input file to the output file
    while (my $inseq = $seq_in->next_seq) {

        # $seq_out->write_seq($inseq); # Whole sequence not needed

        for my $feat_object ($inseq->get_SeqFeatures)
        {
            if ($feat_object->primary_tag eq "CDS")
            {
                print $feat_object->get_tag_values('product'),"\n";
                print $feat_object->location->start,"..",$feat_object->location->end,"\n";
                print $feat_object->spliced_seq->seq,"\n\n";
            }
        }
    }   # closes the while loop (this brace was missing from the posted script)

The result seems OK to me, but in the case of the first CDS of NC_005213.gbk from here the output is wrong:

It is:
hypothetical protein
1..490885
TAAATGCGATTGCTATTAGAA..................................Truncated sequence...................................

Should be:
hypothetical protein
879..490883
ATGCGATTGCTATTAGAA...................................Truncated sequence....................................TAA

This CDS has an unusual location string:
CDS complement(join(490883..490885,1..879)), but shouldn't spliced_seq handle these things?

Please help me!
Best regards, N.
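
A small sketch, not from the original thread, of why the order of the pieces matters for a location such as complement(join(490883..490885,1..879)). The feature-table reading is: join the pieces in the order written, then reverse-complement the whole. The 12-base genome below is hypothetical toy data (not NC_005213), and the "piecewise" variant is only one way an output shaped like the one reported above could arise; it is not a description of what spliced_seq does internally.

```perl
use strict;
use warnings;

# Toy 12-base "circular genome" (hypothetical data).
my $genome = 'CGGCATGGGTTA';

# Sub-locations in the order written in complement(join(10..12,1..6)).
my @join = ( [ 10, 12 ], [ 1, 6 ] );

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Extract one 1-based, inclusive segment.
sub segment {
    my ( $seq, $start, $end ) = @_;
    return substr( $seq, $start - 1, $end - $start + 1 );
}

# Feature-table reading: join in the written order, then complement the whole.
my $joined  = join '', map { segment( $genome, @$_ ) } @join;
my $correct = revcomp($joined);

# Piecewise reading: complement each piece, keep the pieces in the written order.
my $piecewise = join '', map { revcomp( segment( $genome, @$_ ) ) } @join;

print "join then complement: $correct\n";
print "piecewise:            $piecewise\n";
```

In this toy case the first reading starts with ATG and ends with TAA, while the piecewise reading puts the stop codon at the front, the same symptom as in the report.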
From cjfields at uiuc.edu Sun Nov 4 19:08:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Nov 2007 18:08:34 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
Message-ID: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>

Pass in (-nosort => 1) to spliced_seq:

    print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";

This ensures no sorting of sublocations occurs, if you want, for instance, typical GenBank/EMBL 'join' behavior.

To the other devs: shouldn't -nosort be the default behavior when the split location is a 'join'? In other words, should spliced_seq() be modified to take into account the split location type when returning sequence? The GB/EMBL/DDBJ release notes indicate that a 'join' explicitly means the order of the sequences is important when they are joined together; the current behavior is more like that for 'order'.

chris

On Nov 4, 2007, at 12:39 PM, download on demand wrote:

> Hi to all.
> > I have a problem with a simplest script: > > > > use Bio::SeqIO; > # get command-line arguments, or die with a usage statement > my $usage = "x2y.pl infile infileformat outfile > outfileformat\n"; > my $infile = shift or die $usage; > my $infileformat = shift or die $usage; > # my $outfile = shift or die $usage; > my $outfileformat = shift or die $usage; > > # create one SeqIO object to read in,and another to write out > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, > '-format' => $outfileformat); > > # write each entry in the input file to the output file > while (my $inseq = $seq_in->next_seq) { > > # $seq_out->write_seq($inseq); # Whole sequence not needed > > for my $feat_object ($inseq->get_SeqFeatures) > { > if ($feat_object->primary_tag eq "CDS") > { > print $feat_object->get_tag_values('product'),"\n"; > print > $feat_object->location->start,"..",$feat_object->location->end,"\n"; > print $feat_object->spliced_seq->seq,"\n\n"; > } > } > > > > The result seems OK to me, but in case of first CDS of > NC_005213.gbk from > here > the > output is wrong: > > It is: > hypothetical protein > 1..490885 > TAAATGCGATTGCTATTAGAA..................................Truncated > sequence................................... > > Should be: > hypothetical protein > 879..490883 > ATGCGATTGCTATTAGAA...................................Truncated > sequence....................................TAA > > > > This CDS have an unnatural location string: > CDS complement(join(490883..490885,1..879)), but > spliced_seq > should handle these things? > > Please help me! > Best regards, N. 
> _______________________________________________ > From jean-luc.jany at univ-brest.fr Mon Nov 5 03:26:52 2007 From: jean-luc.jany at univ-brest.fr (Jean-luc Jany) Date: Mon, 05 Nov 2007 09:26:52 +0100 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall Message-ID: <472ED3CC.2050305@univ-brest.fr> Dear Bioperl and Mac users, I am a Mac user and would like to run a script I made using Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate to Bioperl the pathway to Blastall and other executables. I read carefully the following link http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the path to Blast, but I guess the way to proceed is slightly different in Mac and that I should not create .ncbirc and .bashrc files (e.g. should I modify the .profile file instead of .bashrc?) Actually, my blast file is in myname directory and comprises a /bin and a /data file. I have got my blastall and other executables in myname/blast/bin/blastall. Thank you in anticipation for your help. Jean-Luc From Rohit.Ghai at mikrobio.med.uni-giessen.de Mon Nov 5 06:36:16 2007 From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai) Date: Mon, 05 Nov 2007 12:36:16 +0100 Subject: [Bioperl-l] bioperl and emboss on windows Message-ID: <472F0030.7040200@mikrobio.med.uni-giessen.de> Dear all, thanks for all the different inputs on this topic, I was able to run emboss applications on windows (vista), but with the following workaround. Chris suggested to remove EMBOSSwin and get another version. This I did. Scott suggested setting all the variables within the program. This I also tried, but actually these were already available to the program so this was also not the problem. The following line... my $fuzznuc = $f->program('fuzznuc') doesn't return a Bio::Tools::Run::EMBOSSApplication object. but using Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't have any path issues. 
What is also curious is that $f->version returns the correct version of emboss running (no path problems here), and it looks like it runs the command "embossversion -auto" to get this information. If it can get at this command, its a bit peculiar why it cannot get the other programs. Or am I missing something here ? Please take a look at the code, I have commented within this... -Rohit use Bio::Factory::EMBOSS; use Data::Dumper; use Bio::Tools::Run::EMBOSSApplication; my $infile = "test.fasta"; my $motif = "AGGAGG"; my $outfile = "test.out"; my $f = Bio::Factory::EMBOSS->new(); # get an EMBOSS application object from the factory print Dumper $f; print "location=",$f->location,"\n"; #returns local print "version=", $f->version,"\n"; # this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is) print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing print "list=",$f->_program_list,"\n"; #returns nothing # # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work # $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesnt work # the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object. # # # # however, creating a EMBOSSApplication object directly makes it possible to run the program # my $application = Bio::Tools::Run::EMBOSSApplication->new(); $application->name('fuzznuc'); print Dumper $application; $application->run( { -sequence => $infile, -pattern => $motif, -outfile => $outfile }); print "Done\n"; exit; From neetisomaiya at gmail.com Mon Nov 5 07:20:04 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Mon, 5 Nov 2007 17:50:04 +0530 Subject: [Bioperl-l] perl question Message-ID: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> Again a perl question, and maybe a very trivial one. How do I terminate a number like 3.1232010098 to only 3 decimal places in perl? 
--
-Neeti
Even my blood says, B positive

From biology0046 at hotmail.com Mon Nov 5 07:16:13 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Mon, 05 Nov 2007 12:16:13 +0000
Subject: [Bioperl-l] how to extract intron information from gff files.
Message-ID:

Dear all:

I have a poplar genome GFF file like this:

LG_I  src  exon         2598  3280  .  -  .  name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I  src  CDS          2598  3280  .  -  0  name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 4
LG_I  src  start_codon  3278  3280  .  -  0  name "fgenesh1_pg.C_LG_I000001"
LG_I  src  stop_codon   2598  2600  .  -  0  name "fgenesh1_pg.C_LG_I000001"
LG_I  src  exon         3544  3918  .  -  .  name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I  src  CDS          3544  3918  .  -  2  name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 3
LG_I  src  exon         4258  4740  .  -  .  name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I  src  CDS          4258  4740  .  -  2  name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 2
LG_I  src  exon         5344  6388  .  -  .  name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I  src  CDS          5344  6388  .  -  2  name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 1
LG_I  src  exon         8259  8528  .  -  .  name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I  src  CDS          8259  8528  .  -  0  name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 3
LG_I  src  stop_codon   8259  8261  .  -  0  name "fgenesh1_pg.C_LG_I000002"
LG_I  src  exon         8897  8987  .  -  .  name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I  src  CDS          8897  8987  .  -  0  name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 2
LG_I  src  exon         9831  9892  .  -  .  name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I  src  CDS          9831  9892  .  -  1  name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 1
LG_I  src  start_codon  9890  9892  .  -  0  name "fgenesh1_pg.C_LG_I000002"

I tried to use Bio::DB::GFF, but this module only applies to methods given in the GFF file.
What I want to get is "intron, 5utr, 3utr", but this information is not contained in this GFF file. How can I get this information through BioPerl? The file does not contain intron information. If I consider the gaps between exons as introns, and the non-CDS parts of the first and last exons as UTRs, how can I extract them from this GFF file?

Thanks~~
Wenkai

From spiros at lokku.com Mon Nov 5 07:36:36 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 5 Nov 2007 12:36:36 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID:

Hey,

use the `sprintf` function. More information can be found at http://perldoc.perl.org/functions/sprintf.html. For more proper rounding, you could use the Math::Round module, http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm.

hope this helps,
spiros

On 11/5/07, neeti somaiya wrote:
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
> --
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From ak at ebi.ac.uk Mon Nov 5 07:43:06 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 12:43:06 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <20071105124305.GC4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

When displaying:

    printf( "The number is %.3f\n", $number );

When making a string:

    my $string = sprintf( "%.3f", $number );

BTW, this is cutting, not rounding.

Cheers,
Andreas

--
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------

From t.nugent at cs.ucl.ac.uk Mon Nov 5 07:37:15 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 05 Nov 2007 12:37:15 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F0E7B.60303@cs.ucl.ac.uk>

Use Math::Round and nearest_ceil:

http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
>

--
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent

From bix at sendu.me.uk Mon Nov 5 07:47:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 12:47:17 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F10D5.5060006@sendu.me.uk>

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

Please don't use this list to ask general Perl questions.
See these instead:

http://perldoc.perl.org/perlfaq4.html
http://lists.cpan.org/
http://www.perlmonks.org/

    $rounded = sprintf("%.3f", $number);

From Marc.Logghe at DEVGEN.com Mon Nov 5 07:39:36 2007
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Mon, 5 Nov 2007 13:39:36 +0100
Subject: [Bioperl-l] perl question
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <0C528E3670D8CE4B8E013F6749231AA601C3BB80@ANTARESIA.be.devgen.com>

Hi,

Have a look at
http://perldoc.perl.org/functions/sprintf.html#precision%2c-or-maximum-width

In your particular case:

    my $f = 3.1232010098;
    printf "%0.3f", $f;

HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> neeti somaiya
> Sent: Monday, November 05, 2007 1:20 PM
> To: bioperl-l
> Subject: [Bioperl-l] perl question
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3
> decimal places in perl?
>
> --
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bix at sendu.me.uk Mon Nov 5 08:24:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 13:24:25 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <20071105124305.GC4491@ebi.ac.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> <20071105124305.GC4491@ebi.ac.uk>
Message-ID: <472F1989.90105@sendu.me.uk>

Andreas Kahari wrote:
> On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
>> Again a perl question, and maybe a very trivial one.
>> How do I terminate a number like 3.1232010098 to only 3 decimal places in
>> perl?
> > When displaying: > > printf( "The number is %.3f\n", $number ); > > When making a string: > > my $string = sprintf( "%.3f", $number ); > > > BTW, this is cutting, not rounding. (s)printf rounds (ie. doesn't simply truncate), though for critical applications you should use your own rounding algorithm. From ak at ebi.ac.uk Mon Nov 5 08:56:24 2007 From: ak at ebi.ac.uk (Andreas Kahari) Date: Mon, 5 Nov 2007 13:56:24 +0000 Subject: [Bioperl-l] perl question In-Reply-To: <472F1989.90105@sendu.me.uk> References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> <20071105124305.GC4491@ebi.ac.uk> <472F1989.90105@sendu.me.uk> Message-ID: <20071105135624.GD4491@ebi.ac.uk> On Mon, Nov 05, 2007 at 01:24:25PM +0000, Sendu Bala wrote: > Andreas Kahari wrote: > > On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote: > >> Again a perl question, and maybe a very trivial one. > >> How do I terminate a number like 3.1232010098 to only 3 decimal places in > >> perl? > > > > When displaying: > > > > printf( "The number is %.3f\n", $number ); > > > > When making a string: > > > > my $string = sprintf( "%.3f", $number ); > > > > > > BTW, this is cutting, not rounding. > > (s)printf rounds (ie. doesn't simply truncate), though for critical > applications you should use your own rounding algorithm. They do indeed. Mea culpa. 
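
A minimal sketch, not from the original messages, showing the difference the thread settles on: sprintf("%.3f", ...) rounds to the nearest value, whereas int() on a scaled number truncates. The value below is hypothetical and chosen so that the two results differ; for the number in the question, 3.1232010098, both give 3.123.

```perl
use strict;
use warnings;

# Hypothetical value whose fourth decimal digit is >= 5.
my $number = 3.1236010098;

# sprintf rounds to the requested number of decimal places.
my $rounded = sprintf( "%.3f", $number );

# Scaling, taking the integer part, and scaling back drops the extra digits.
my $truncated = int( $number * 1000 ) / 1000;

print "rounded:   $rounded\n";
print "truncated: $truncated\n";
```

The int() approach is subject to the usual binary floating-point caveats for values that sit very close to a three-decimal boundary, which is one reason the thread suggests a dedicated rounding routine for critical applications.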
Andreas

--
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------

From jay at jays.net Mon Nov 5 10:14:17 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 10:14:17 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <8CA2A45C-1F82-47A2-841B-1BA92E1F4466@jays.net>

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
> To the other devs: shouldn't -nosort be the default behavior when the
> split location is a 'join'?

I certainly think so.

> In other words, should spliced_seq() be
> modified to take into account the split location type when returning
> sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly
> indicates the order of the sequences is important when joined
> together; the current behavior is more like that for 'order'.

I don't see any value to the sorting algorithm. All tests invoke -nosort => 1 (except a phase test where nosort doesn't matter anyway). In my limited experience the sorting only serves to break real-world splicing.

If there is no valid use then we can remove ~20 lines from SeqFeatureI.pm circa line 505.

If there is a valid use and someone would be so kind as to educate me, I'd be happy to add tests which demonstrate it. :)

P.S. CSHL is neato. I plan on understanding some of this stuff some day.
:) j http://www.bioperl.org/wiki/User:Jhannah From hlapp at duke.edu Mon Nov 5 11:03:16 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 5 Nov 2007 11:03:16 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: I agree that there should be a meaningful default that results in "doing the right thing" in most cases if the user doesn't intervene. I'm not sure I understand all the details, but it sounds sorting or not sorting should depend on the split location type unless the user overrides it by argument. That's what you're suggesting, right? -hilmar On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: > Pass in (-nosort => 1) to spliced_seq: > > print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; > > This ensures no sorting of sublocations occurs, if you want for > instance typical GenBank/EMBL 'join' behavior. > > To the other devs: shouldn't -nosort be the default behavior when > the split location is a 'join'? In other words, should spliced_seq > () be modified to take into account the split location type when > returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' > explicitly indicates the order of the sequences is important when > joined together; the current behavior is more like that for 'order'. > > chris > > On Nov 4, 2007, at 12:39 PM, download on demand wrote: > >> Hi to all. 
>> >> I have a problem with a simplest script: >> >> >> >> use Bio::SeqIO; >> # get command-line arguments, or die with a usage statement >> my $usage = "x2y.pl infile infileformat outfile >> outfileformat\n"; >> my $infile = shift or die $usage; >> my $infileformat = shift or die $usage; >> # my $outfile = shift or die $usage; >> my $outfileformat = shift or die $usage; >> >> # create one SeqIO object to read in,and another to write >> out >> my $seq_in = Bio::SeqIO->new('-file' => "<$infile", >> '-format' => $infileformat); >> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, >> '-format' => $outfileformat); >> >> # write each entry in the input file to the output file >> while (my $inseq = $seq_in->next_seq) { >> >> # $seq_out->write_seq($inseq); # Whole sequence not needed >> >> for my $feat_object ($inseq->get_SeqFeatures) >> { >> if ($feat_object->primary_tag eq "CDS") >> { >> print $feat_object->get_tag_values('product'),"\n"; >> print >> $feat_object->location->start,"..",$feat_object->location->end,"\n"; >> print $feat_object->spliced_seq->seq,"\n\n"; >> } >> } >> >> >> >> The result seems OK to me, but in case of first CDS of >> NC_005213.gbk from >> here > Nanoarchaeum_equitans/> the >> output is wrong: >> >> It is: >> hypothetical protein >> 1..490885 >> TAAATGCGATTGCTATTAGAA..................................Truncated >> sequence................................... >> >> Should be: >> hypothetical protein >> 879..490883 >> ATGCGATTGCTATTAGAA...................................Truncated >> sequence....................................TAA >> >> >> >> This CDS have an unnatural location string: >> CDS complement(join(490883..490885,1..879)), but >> spliced_seq >> should handle these things? >> >> Please help me! >> Best regards, N. 
>> _______________________________________________ >> > > > From bernd.web at gmail.com Mon Nov 5 11:53:01 2007 From: bernd.web at gmail.com (Bernd Web) Date: Mon, 5 Nov 2007 17:53:01 +0100 Subject: [Bioperl-l] PSI-BLAST Message-ID: <716af09c0711050853l23087ac6j9f7d597580b66c46@mail.gmail.com> Hi, Is it possible with SearchIO to select a specific iteration (Results from round i) part of the PSI-blast report, when parsing this with SearchIO::blast? It seems the parser parses the complete report. If not implemented I could of course extract the specific part of the psi-blast report and then give it too SearchIO (e.g. with IO::String), but maybe I am missing a built-in option? Regards, Bernd From jay at jays.net Mon Nov 5 11:54:13 2007 From: jay at jays.net (Jay Hannah) Date: Mon, 5 Nov 2007 11:54:13 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? If someone knows why spliced_seq() should ever sort then I'm suggesting we add a test demonstrating a useful example of that. If no one has a useful example of when you would want spliced_seq() to sort then I'm suggesting we remove the sorting altogether and nosort goes away. I can provide/add many examples where sorting is bad. I do not know of a case where sorting is good. 
j http://www.bioperl.org/wiki/User:Jhannah From jason at bioperl.org Mon Nov 5 12:07:10 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Nov 2007 12:07:10 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: At one point the location order was not respected/saved I believe. I guess we will just assume the user will build up a SplitLocation in order (i.e. add_SubLocation). I'll try and remember if there were any other particular reasons. -jason On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? > > -hilmar > > On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: > >> Pass in (-nosort => 1) to spliced_seq: >> >> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; >> >> This ensures no sorting of sublocations occurs, if you want for >> instance typical GenBank/EMBL 'join' behavior. >> >> To the other devs: shouldn't -nosort be the default behavior when >> the split location is a 'join'? In other words, should spliced_seq >> () be modified to take into account the split location type when >> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' >> explicitly indicates the order of the sequences is important when >> joined together; the current behavior is more like that for 'order'. >> >> chris >> >> On Nov 4, 2007, at 12:39 PM, download on demand wrote: >> >>> Hi to all. 
>>> _______________________________________________ >>> >> >> >> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Mon Nov 5 12:16:10 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:16:10 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> Yes, we would sort based on the splittype() and default to a particular behavior ('join') if one isn't designated, maybe with a warning indicating the splittype() isn't defined. Using an 'order' or other defined types could also delineate a default sort/nosort behavior (probably the previous as it would replicate prior behavior). chris On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? > > -hilmar From cjfields at uiuc.edu Mon Nov 5 12:20:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:20:35 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <70023491-3549-428D-9E5C-32275A33FF20@uiuc.edu> On Nov 5, 2007, at 10:54 AM, Jay Hannah wrote: > On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. 
>> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? > > If someone knows why spliced_seq() should ever sort then I'm > suggesting we add a test demonstrating a useful example of that. > > If no one has a useful example of when you would want spliced_seq() > to sort then I'm suggesting we remove the sorting altogether and > nosort goes away. > > I can provide/add many examples where sorting is bad. I do not know > of a case where sorting is good. > > j > http://www.bioperl.org/wiki/User:Jhannah The behavior would be based on the current use of 'join', 'order', and 'bond' (the latter in GenPept records). I documented some cases here a while back: http://www.bioperl.org/wiki/BioPerl_Locations#Split chris From hlapp at duke.edu Mon Nov 5 12:32:24 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 5 Nov 2007 12:32:24 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> Message-ID: <13919657-0446-4821-9EE4-FD07C995C734@duke.edu> Sounds good to me. -hilmar On Nov 5, 2007, at 12:16 PM, Chris Fields wrote: > Yes, we would sort based on the splittype() and default to a > particular behavior ('join') if one isn't designated, maybe with a > warning indicating the splittype() isn't defined. Using an 'order' > or other defined types could also delineate a default sort/nosort > behavior (probably the previous as it would replicate prior behavior). > > chris > > On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote: > >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. 
>> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? >> >> -hilmar > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at uiuc.edu Mon Nov 5 12:41:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:41:27 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: It may have something to do with remote locations or setting strand() in sublocations. This may have popped up in relation to a LocationI code audit I proposed a while back on the list which I never got around to. Oh well... I at least managed getting a wiki page started in case we decided to make changes, with the intention of making it a HOWTO at some point: http://www.bioperl.org/wiki/BioPerl_Locations If we go through with the changes to spliced_seq(), should it be implemented for inclusion in v1.6 or wait until v1.7? chris On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote: > > At one point the location order was not respected/saved I believe. > I guess we will just assume the user will build up a SplitLocation > in order (i.e. add_SubLocation). I'll try and remember if there > were any other particular reasons. > > > -jason > On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. >> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? 
>> >> -hilmar >> >> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: >> >>> Pass in (-nosort => 1) to spliced_seq: >>> >>> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; >>> >>> This ensures no sorting of sublocations occurs, if you want for >>> instance typical GenBank/EMBL 'join' behavior. >>> >>> To the other devs: shouldn't -nosort be the default behavior when >>> the split location is a 'join'? In other words, should spliced_seq >>> () be modified to take into account the split location type when >>> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' >>> explicitly indicates the order of the sequences is important when >>> joined together; the current behavior is more like that for 'order'. >>> >>> chris >>> >>> On Nov 4, 2007, at 12:39 PM, download on demand wrote: >>> >>>> Hi to all. >>>> >>>> I have a problem with a simplest script: >>>> >>>> >>>> >>>> use Bio::SeqIO; >>>> # get command-line arguments, or die with a usage >>>> statement >>>> my $usage = "x2y.pl infile infileformat outfile >>>> outfileformat\n"; >>>> my $infile = shift or die $usage; >>>> my $infileformat = shift or die $usage; >>>> # my $outfile = shift or die $usage; >>>> my $outfileformat = shift or die $usage; >>>> >>>> # create one SeqIO object to read in,and another to write >>>> out >>>> my $seq_in = Bio::SeqIO->new('-file' => "<$infile", >>>> '-format' => $infileformat); >>>> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, >>>> '-format' => >>>> $outfileformat); >>>> >>>> # write each entry in the input file to the output file >>>> while (my $inseq = $seq_in->next_seq) { >>>> >>>> # $seq_out->write_seq($inseq); # Whole sequence not >>>> needed >>>> >>>> for my $feat_object ($inseq->get_SeqFeatures) >>>> { >>>> if ($feat_object->primary_tag eq "CDS") >>>> { >>>> print $feat_object->get_tag_values('product'),"\n"; >>>> print >>>> $feat_object->location->start,"..",$feat_object->location- >>>> >end,"\n"; >>>> print $feat_object->spliced_seq->seq,"\n\n"; >>>> } 
>>>> }
>>>>
>>>> The result seems OK to me, but in case of first CDS of
>>>> NC_005213.gbk from
>>>> here >>> Nanoarchaeum_equitans/> the
>>>> output is wrong:
>>>>
>>>> It is:
>>>> hypothetical protein
>>>> 1..490885
>>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>>> sequence...................................
>>>>
>>>> Should be:
>>>> hypothetical protein
>>>> 879..490883
>>>> ATGCGATTGCTATTAGAA...................................Truncated
>>>> sequence....................................TAA
>>>>
>>>> This CDS have an unnatural location string:
>>>> CDS complement(join(490883..490885,1..879)), but
>>>> spliced_seq
>>>> should handle these things?
>>>>
>>>> Please help me!
>>>> Best regards, N.
>
> --
> Jason Stajich
> jason at bioperl.org
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bosborne11 at verizon.net Mon Nov 5 11:05:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 05 Nov 2007 12:05:41 -0400
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall
In-Reply-To: <472ED3CC.2050305@univ-brest.fr>
Message-ID:

Jean-luc,

From what you've written it sounds like you're using bash and not some
other shell (e.g. tcsh, csh), right? If that's the case then create a
.bashrc file in your home directory, as well as a .ncbirc file. This
should work.

I'm no Unix expert but I've always configured tcsh on the Mac in the
same way I'd configure it on Linux machines. Similarly, if you're using
bash then it will read its .bashrc file, regardless of what flavor of
Unix you use (and the same thing holds true for zsh or csh or ...).

Brian O.
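
For checking from the Perl side whether blastall is actually reachable, here is a small sketch using only core modules. It assumes the executables live in a blast/bin directory under the home directory, as in the layout described in this thread, and that putting that directory on PATH is enough for the wrapper to find them (an assumption, not something confirmed here). find_on_path is a hypothetical helper written for this example.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $program if an executable file of that name
# exists in one of the PATH directories, otherwise return nothing.
sub find_on_path {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Assumed layout: $HOME/blast/bin holds blastall and the other executables.
my $home      = defined $ENV{HOME} ? $ENV{HOME} : '.';
my $blast_bin = File::Spec->catdir($home, 'blast', 'bin');

# Prepend the directory to PATH for this process only (Unix-style separator).
$ENV{PATH} = join ':', $blast_bin, (defined $ENV{PATH} ? $ENV{PATH} : '');

my $found = find_on_path('blastall');
print defined $found
    ? "blastall found at $found\n"
    : "blastall is not on PATH\n";
```

If this reports that blastall is not on PATH even after the shell startup files were edited, the shell is probably not reading the file that was changed.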
On 11/5/07 4:26 AM, "Jean-luc Jany" wrote: > Dear Bioperl and Mac users, > > I am a Mac user and would like to run a script I made using > Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate > to Bioperl the pathway to Blastall and other executables. > > I read carefully the following link > http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the > path to Blast, but I guess the way to proceed is slightly different in Mac and > that I should not create .ncbirc and .bashrc files (e.g. should I modify the > .profile file instead of .bashrc?) > > Actually, my blast file is in myname directory and comprises a /bin and a > /data file. I have got my blastall and other executables in > myname/blast/bin/blastall. > > Thank you in anticipation for your help. > > Jean-Luc > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Mon Nov 5 13:35:56 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 05 Nov 2007 12:35:56 -0600 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall In-Reply-To: References: Message-ID: <472F628C.2000506@campus.iztacala.unam.mx> If the ~/.bashrc file doesn't work for you, try renaming it to ~/.bash_profile and re-login, that might work best. ~/.bashrc works as an individual per-interactive-shell startup file, whereas ~/.bash_profile is a personal initialization file, executed for login shells. Hope this helps. Regards, Mauricio. Brian Osborne wrote: > Jean-luc, > >>From what you written it sounds like you're using bash and not some other > shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file > in your home directory, as well as a .ncbirc file. This should work. 
> > I'm no Unix expert but I've always configured tcsh on the Mac in the same > ways I'd configure it on Linux machines. Similarly, if you're using bash > then it will read its .bashrc file, regardless of what flavor of Unix you > use (and the same thing holds true for zsh or csh or ...). > > Brian O. > > > On 11/5/07 4:26 AM, "Jean-luc Jany" wrote: > >> Dear Bioperl and Mac users, >> >> I am a Mac user and would like to run a script I made using >> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate >> to Bioperl the pathway to Blastall and other executables. >> >> I read carefully the following link >> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the >> path to Blast, but I guess the way to proceed is slightly different in Mac and >> that I should not create .ncbirc and .bashrc files (e.g. should I modify the >> .profile file instead of .bashrc?) >> >> Actually, my blast file is in myname directory and comprises a /bin and a >> /data file. I have got my blastall and other executables in >> myname/blast/bin/blastall. >> >> Thank you in anticipation for your help. 
>>
>> Jean-Luc
>

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From hlapp at duke.edu Mon Nov 5 16:04:11 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 16:04:11 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To:
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID:

On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:

> If we go through with the changes to spliced_seq(), should it be
> implemented for inclusion in v1.6 or wait until v1.7?

I would say they should be implemented ASAP because they 1) should not
change behavior for those for which the current default behavior was
already broken (and who therefore pass in --no_sort), and 2) fix the
behavior for those who erroneously assumed that the code was going to
do the right thing by default.

I.e., it sounds mostly like a bugfix to me. Am I overlooking something?
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at uiuc.edu Mon Nov 5 17:12:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 16:12:23 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <980977BB-72C3-401A-848F-AEF2E602E4BE@uiuc.edu> On Nov 5, 2007, at 3:04 PM, Hilmar Lapp wrote: > > On Nov 5, 2007, at 12:41 PM, Chris Fields wrote: > >> If we go through with the changes to spliced_seq(), should it be >> implemented for inclusion in v1.6 or wait until v1.7? > > I would say they should be implemented ASAP because they 1) should > not change behavior for those for which the current default > behavior was already broken (and who therefore pass in --no_sort), > and 2) fix the behavior for those who erroneously assumed that the > code was going to do the right thing by default. > > I.e., it sounds mostly like a bugfix to me. Am I overlooking > something? > > -hilmar > -- Okay; I'll try to get this in soon. chris From jean-luc.jany at univ-brest.fr Tue Nov 6 04:00:07 2007 From: jean-luc.jany at univ-brest.fr (Jean-luc Jany) Date: Tue, 06 Nov 2007 10:00:07 +0100 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall Message-ID: <47302D17.2030500@univ-brest.fr> Thanks Brian. Yes I use bash. I am going to follow your advice as soon as possible (for some reasons I am unable to run bioperl) and come back to you to tell you if it runs. Jean-Luc From jason at bioperl.org Tue Nov 6 16:18:35 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Nov 2007 16:18:35 -0500 Subject: [Bioperl-l] lightweight sequence features Message-ID: I started a branch for implementing and playing with lightweight feature object. 
The branch is called 'lightweight_feature_branch'. Right now it is about 70% faster just in object creation based on parsing features using Bio::Tools::GFF and swapping the types of features that are created. It uses arrays instead of hashes under the hood. So the objects don't have locations under the hood. My hope is if this works okay we could use it for creating objects where we KNOW the underlying features have simple locations so such as parsing in GFF data. -jason -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Tue Nov 6 16:57:17 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 6 Nov 2007 15:57:17 -0600 Subject: [Bioperl-l] lightweight sequence features In-Reply-To: References: Message-ID: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> Bravo! I once benchmarked Location instance creation once and found it contributed quite a bit of overhead so the speedup with that and the use of arrays makes quite a bit of sense to me. You mention only simple locations; I'm guessing this doesn't handle 'fuzzy' ends? If it did I could see layering the feature data from the get-go, so it could be used just about anywhere in the place of SF::Generic. Maybe something to test out in 1.7? chris On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote: > I started a branch for implementing and playing with lightweight > feature object. The branch is called 'lightweight_feature_branch'. > > Right now it is about 70% faster just in object creation based on > parsing features using Bio::Tools::GFF and swapping the types of > features that are created. It uses arrays instead of hashes under > the hood. > > So the objects don't have locations under the hood. My hope is if > this works okay we could use it for creating objects where we KNOW > the underlying features have simple locations so such as parsing in > GFF data. 
> > -jason > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Tue Nov 6 23:14:55 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Nov 2007 23:14:55 -0500 Subject: [Bioperl-l] lightweight sequence features In-Reply-To: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> References: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> Message-ID: Right - only for simple locations. I've got a bunch more tests and fixes to put in. I am hoping this can be fast replacement in the case where we're dealing with this "unflattened" data (i.e. GFF in FeatureIO & Gbrowse). This is sort of a playground until I feel like it can really get it tested a bit more. I'll give an all clear when the dust settles in terms of the design if anyone wants to play/help. -jason On Nov 6, 2007, at 4:57 PM, Chris Fields wrote: > Bravo! I once benchmarked Location instance creation once and > found it contributed quite a bit of overhead so the speedup with > that and the use of arrays makes quite a bit of sense to me. > > You mention only simple locations; I'm guessing this doesn't handle > 'fuzzy' ends? If it did I could see layering the feature data from > the get-go, so it could be used just about anywhere in the place of > SF::Generic. Maybe something to test out in 1.7? > > chris > > On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote: > >> I started a branch for implementing and playing with lightweight >> feature object. The branch is called 'lightweight_feature_branch'. >> >> Right now it is about 70% faster just in object creation based on >> parsing features using Bio::Tools::GFF and swapping the types of >> features that are created. It uses arrays instead of hashes under >> the hood. >> >> So the objects don't have locations under the hood. 
My hope is if
>> this works okay we could use it for creating objects where we KNOW
>> the underlying features have simple locations so such as parsing in
>> GFF data.
>>
>> -jason
>> --
>> Jason Stajich
>> jason at bioperl.org
>

From heikki at sanbi.ac.za Wed Nov 7 05:05:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Nov 2007 12:05:59 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Mdust
Message-ID: <200711071205.59576.heikki@sanbi.ac.za>

Hi Donald,

I started using your Mdust module in bioperl-run and ran into problems
immediately.

* Only Bio::Seq objects are accepted but not Bio::PrimarySeq objects,
  although the docs say otherwise.
* Sequences are modified in place. That is really bad, because it means
  that the user has to know to create a copy before running Mdust on it.
* The docs say that you have to set the MDUSTDIR envvar to tell the
  program where to find the binary. That is actually optional if the
  binary is on your path.
* The tests do not cover any of the options to the program.

As a quick fix, I suggest that we:

* leave the current way of working for Bio::SeqI objects: the sequence
  string is not masked but seqfeatures to that effect are added
* modify run() to return the new masked sequence object when the target
  is a Bio::PrimarySeqI
* fix the documentation

After that it will be possible to simply write:

    use Bio::Tools::Run::Mdust;
    my $mdust      = Bio::Tools::Run::Mdust->new();
    my $seq_dusted = $mdust->run($seq);   # $seq->isa('Bio::PrimarySeqI')

Are you happy for me to do this or do you want to do it yourself?
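
To illustrate the in-place concern raised above, here is a rough sketch in plain Perl. It uses a toy hash-based sequence record rather than a real Bio::PrimarySeq, and masked_copy is a hypothetical helper, not something Bio::Tools::Run::Mdust provides. It only shows the "return a masked copy, leave the input alone" behaviour suggested for run().

```perl
use strict;
use warnings;

# Hypothetical helper: mask the given 1-based inclusive ranges with 'N'
# and return a new record; the record passed in is left untouched.
sub masked_copy {
    my ($record, @ranges) = @_;
    my %copy = %$record;    # shallow copy is enough, all values are plain scalars
    for my $range (@ranges) {
        my ($start, $end) = @$range;
        my $len = $end - $start + 1;
        substr($copy{seq}, $start - 1, $len) = 'N' x $len;
    }
    return \%copy;
}

my $seq    = { id => 'toy1', seq => 'ACGTAAAAAAAAACGT' };
my $dusted = masked_copy($seq, [5, 13]);

print "original: $seq->{seq}\n";     # ACGTAAAAAAAAACGT
print "masked:   $dusted->{seq}\n";  # ACGTNNNNNNNNNCGT
```

The caller keeps the unmasked record without having to remember to clone it first, which is the point of the second bullet in the list of problems.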
Yours, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho _/_/_/_/_/ heikki at_sanbi _ac _za skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From Kevin.M.Brown at asu.edu Wed Nov 7 13:04:50 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 7 Nov 2007 11:04:50 -0700 Subject: [Bioperl-l] Bio::Ext::Align? Message-ID: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu> I installed bioperl-ext from CVS, but can't figure out what else is missing to utilize Bio::Tools::pSW. The error I get from the example script in the wiki is: The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. Please read the install the bioperl-ext package BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128. Compilation failed in require at ./align_test.pl line 3. BEGIN failed--compilation aborted at ./align_test.pl line 3. In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called Align, but no Align.pm file. I followed the directions in the wiki to install 1.5.2_102 (think I had _100 installed previously). Any thoughts on what I'm missing? From jason at bioperl.org Wed Nov 7 14:52:16 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 7 Nov 2007 14:52:16 -0500 Subject: [Bioperl-l] (no subject) Message-ID: The array-based Bio::SeqFeature::Slim is only about 7% faster than Bio::Graphics::Feature so I suspect most of the speedup comes from removing location objects. 

Generic    6.75    --  -37%  -41%
GraphicsF  4.26   58%    --   -7%
Slim       3.98   70%    7%    --

this is using code on the lightweight_feature_branch so

    cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r lightweight_feature_branch -d core_lwf bioperl-live

http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
and the GFF3 file I used to parse
http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2

-jason

From lstein at cshl.edu Wed Nov 7 15:04:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Nov 2007 15:04:24 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To:
References:
Message-ID: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>

I wonder if it is worth moving to the array-based version more
generally, then.

How does the array based feature object deal with tags?

Lincoln

On Nov 7, 2007 2:52 PM, Jason Stajich wrote:

> The array-based Bio::SeqFeature::Slim is only about 7% faster than
> Bio::Graphics::Feature so I suspect most of the speedup comes from
> removing location objects.
>
> Generic    6.75    --  -37%  -41%
> GraphicsF  4.26   58%    --   -7%
> Slim       3.98   70%    7%    --
>
> this is using code on the lightweight_feature_branch so
> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
> lightweight_feature_branch -d core_lwf bioperl-live
>
> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
> and the GFF3 file I used to parse
> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>
> -jason
>

-- 
Lincoln D.
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From jason at bioperl.org Wed Nov 7 15:09:35 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 7 Nov 2007 15:09:35 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>
Message-ID: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org>

It uses hashes there so technically it is not entirely array based.

-jason

On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote:

> I wonder if it is worth moving to the array-based version more
> generally, then.
>
> How does the array based feature object deal with tags?
>
> Lincoln
>
> On Nov 7, 2007 2:52 PM, Jason Stajich wrote:
>
>> The array-based Bio::SeqFeature::Slim is only about 7% faster than
>> Bio::Graphics::Feature so I suspect most of the speedup comes from
>> removing location objects.
>>
>> Generic    6.75    --  -37%  -41%
>> GraphicsF  4.26   58%    --   -7%
>> Slim       3.98   70%    7%    --
>>
>> this is using code on the lightweight_feature_branch so
>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
>> lightweight_feature_branch -d core_lwf bioperl-live
>>
>> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
>> and the GFF3 file I used to parse
>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>>
>> -jason
>
> --
> Lincoln D.
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed Nov 7 16:12:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 15:12:35 -0600 Subject: [Bioperl-l] (no subject) In-Reply-To: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> Message-ID: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> I can see preferring a lightweight simple SF over SF::Generic in the next BioPerl dev cycle. I guess we would just layer split locations as simple sub-features/segments, typing when necessary? That shouldn't be much more overhead than creating a layered Location::Split. chris On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: > It uses hashes there so technically it is not entirely array based. > > -jason > On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: > >> I wonder if it is worth moving to the array-based version more >> generally, >> then. >> >> How does the array based feature object deal with tags? >> >> Lincoln >> >> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >> >>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>> removing >>> location objects. 
>>> >>> Generic 6.75 -- -37% -41% >>> GraphicsF 4.26 58% -- -7% >>> Slim 3.98 70% 7% -- >>> >>> this is using code on the lightweight_feature_branch so >>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >>> lightweight_feature_branch -d core_lwf bioperl-live >>> >>> http://jason.open-bio.org/~jason/bioperl/ >>> seqfeature_speed.pl>> seqfeature_speed.pl> >>> and the GFF3 file I used to parse >>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>> >>> -jason >>> >> >> >> >> -- >> Lincoln D. Stein >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> FOR URGENT MESSAGES & SCHEDULING, >> PLEASE CONTACT MY ASSISTANT, >> SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Wed Nov 7 18:19:15 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 7 Nov 2007 18:19:15 -0500 Subject: [Bioperl-l] lightweight features In-Reply-To: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> Message-ID: It seems to me that there are applications where you're dealing with a huge number of features (such as GFF) and where therefore a lightweight object makes tremendous sense. But when you parse a genbank file, I'm not sure that's the bottleneck, unless maybe it's a large contig with lots of feature annotations. I guess we'll ultimately want a way to control the type of feature being instantiated by a parser, e..g using a factory. 
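
One way the factory idea could look, sketched in plain Perl. The Toy::Feature::* classes and parse_gff_line are hypothetical names made up for this example and are not BioPerl APIs. The sketch shows a parser that is told which feature class to instantiate, with a hash-based class and a slimmer array-based class (fixed slots, no location object) sharing the same accessors.

```perl
use strict;
use warnings;

# Hypothetical hash-based feature: one hash per object.
package Toy::Feature::Hash;
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# Hypothetical array-based feature: fixed slots instead of named keys.
package Toy::Feature::Array;
use constant { IDX_START => 0, IDX_END => 1, IDX_STRAND => 2, IDX_TYPE => 3 };
sub new {
    my ($class, %args) = @_;
    return bless [ @args{qw(start end strand type)} ], $class;
}
sub start { $_[0][IDX_START] }
sub end   { $_[0][IDX_END] }

package main;

# The parser receives the feature class instead of hard-coding one.
sub parse_gff_line {
    my ($line, $feature_class) = @_;
    my @col = split /\t/, $line;
    return $feature_class->new(
        type   => $col[2],
        start  => $col[3],
        end    => $col[4],
        strand => $col[6],
    );
}

# Made-up GFF line, for illustration only.
my $line = join "\t", 'chr1', 'toy', 'gene', 100, 250, '.', '+', '.', 'ID=gene1';

for my $class ('Toy::Feature::Hash', 'Toy::Feature::Array') {
    my $feat = parse_gff_line($line, $class);
    printf "%s: %d..%d\n", ref($feat), $feat->start, $feat->end;
}
```

Calling code sees the same start/end interface either way, so a GFF parser could default to the lightweight class while a GenBank parser keeps the full-featured one.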
-hilmar On Nov 7, 2007, at 4:12 PM, Chris Fields wrote: > I can see preferring a lightweight simple SF over SF::Generic in the > next BioPerl dev cycle. I guess we would just layer split locations > as simple sub-features/segments, typing when necessary? That > shouldn't be much more overhead than creating a layered > Location::Split. > > chris > > On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: > >> It uses hashes there so technically it is not entirely array based. >> >> -jason >> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: >> >>> I wonder if it is worth moving to the array-based version more >>> generally, >>> then. >>> >>> How does the array based feature object deal with tags? >>> >>> Lincoln >>> >>> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >>> >>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>>> removing >>>> location objects. >>>> >>>> Generic 6.75 -- -37% -41% >>>> GraphicsF 4.26 58% -- -7% >>>> Slim 3.98 70% 7% -- >>>> >>>> this is using code on the lightweight_feature_branch so >>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >>>> lightweight_feature_branch -d core_lwf bioperl-live >>>> >>>> http://jason.open-bio.org/~jason/bioperl/ >>>> seqfeature_speed.pl>>> seqfeature_speed.pl> >>>> and the GFF3 file I used to parse >>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>>> >>>> -jason >>>> >>> >>> >>> >>> -- >>> Lincoln D. 
Stein >>> Cold Spring Harbor Laboratory >>> 1 Bungtown Road >>> Cold Spring Harbor, NY 11724 >>> (516) 367-8380 (voice) >>> (516) 367-8389 (fax) >>> FOR URGENT MESSAGES & SCHEDULING, >>> PLEASE CONTACT MY ASSISTANT, >>> SANDRA MICHELSEN, AT michelse at cshl.edu >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Nov 7 20:04:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 19:04:05 -0600 Subject: [Bioperl-l] lightweight features In-Reply-To: References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> Message-ID: I'm also thinking a factory is a good possibility; maybe something to take the place of FTHelper. chris On Nov 7, 2007, at 5:19 PM, Hilmar Lapp wrote: > It seems to me that there are applications where you're dealing with > a huge number of features (such as GFF) and where therefore a > lightweight object makes tremendous sense. But when you parse a > genbank file, I'm not sure that's the bottleneck, unless maybe it's a > large contig with lots of feature annotations. > > I guess we'll ultimately want a way to control the type of feature > being instantiated by a parser, e..g using a factory. 
> > -hilmar > > On Nov 7, 2007, at 4:12 PM, Chris Fields wrote: > >> I can see preferring a lightweight simple SF over SF::Generic in the >> next BioPerl dev cycle. I guess we would just layer split locations >> as simple sub-features/segments, typing when necessary? That >> shouldn't be much more overhead than creating a layered >> Location::Split. >> >> chris >> >> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: >> >>> It uses hashes there so technically it is not entirely array based. >>> >>> -jason >>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: >>> >>>> I wonder if it is worth moving to the array-based version more >>>> generally, >>>> then. >>>> >>>> How does the array based feature object deal with tags? >>>> >>>> Lincoln >>>> >>>> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >>>> >>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>>>> removing >>>>> location objects. >>>>> >>>>> Generic 6.75 -- -37% -41% >>>>> GraphicsF 4.26 58% -- -7% >>>>> Slim 3.98 70% 7% -- >>>>> >>>>> this is using code on the lightweight_feature_branch so >>>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl >>>>> co -r >>>>> lightweight_feature_branch -d core_lwf bioperl-live >>>>> >>>>> http://jason.open-bio.org/~jason/bioperl/ >>>>> seqfeature_speed.pl>>>> seqfeature_speed.pl> >>>>> and the GFF3 file I used to parse >>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>>>> >>>>> -jason >>>>> >>>> >>>> >>>> >>>> -- >>>> Lincoln D. 
Stein >>>> Cold Spring Harbor Laboratory >>>> 1 Bungtown Road >>>> Cold Spring Harbor, NY 11724 >>>> (516) 367-8380 (voice) >>>> (516) 367-8389 (fax) >>>> FOR URGENT MESSAGES & SCHEDULING, >>>> PLEASE CONTACT MY ASSISTANT, >>>> SANDRA MICHELSEN, AT michelse at cshl.edu >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 7 23:45:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 22:45:26 -0600 Subject: [Bioperl-l] test please ignore Message-ID: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> From cjfields at uiuc.edu Thu Nov 8 10:50:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Nov 2007 09:50:02 -0600 Subject: [Bioperl-l] test please ignore In-Reply-To: <47332534.5090205@bms.com> References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> <47332534.5090205@bms.com> Message-ID: And respond back! Just checking the mail list; the open-bio wiki pages were down last night. 
chris On Nov 8, 2007, at 9:03 AM, Stefan Kirov wrote: > Chris Fields wrote: >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > This is the best way to make everyone open this e-mail ;-) > Stefan From stefan.kirov at bms.com Thu Nov 8 10:03:16 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 08 Nov 2007 10:03:16 -0500 Subject: [Bioperl-l] test please ignore In-Reply-To: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> Message-ID: <47332534.5090205@bms.com> Chris Fields wrote: > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > This is the best way to make everyone open this e-mail ;-) Stefan From Kevin.M.Brown at asu.edu Thu Nov 8 17:30:24 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 8 Nov 2007 15:30:24 -0700 Subject: [Bioperl-l] Bio::Ext::Align? In-Reply-To: <20071108003638.GA5892@eniac.jgi-psf.org> References: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu> <20071108003638.GA5892@eniac.jgi-psf.org> Message-ID: <1A4207F8295607498283FE9E93B775B403F7F9D3@EX02.asurite.ad.asu.edu> OK, found the issue. For whatever reason the Align.pm file is inside the Align folder and so the package name and path don't match up once it is installed. This would cause it to have a name of "Bio::Ext::Align::Align" instead of "Bio::Ext::Align". Not sure why this wasn't caught when I did "perl Makefile.pl && make && make test && make install" > -----Original Message----- > From: Joel Martin [mailto:j_martin at lbl.gov] > Sent: Wednesday, November 07, 2007 5:37 PM > To: Kevin Brown > Subject: Re: [Bioperl-l] Bio::Ext::Align? 
> > Hello, > Might be a side effect of fixing the other bioperl-ext package, > what steps exactly did this entail: > > > I installed bioperl-ext from CVS, > > ? > > you can probably bypass it at the moment by doing this after > unpacking the > bioperl-ext package > > cd Bio/Ext/Align > perl Makefile.PL > make > make test > make install > > and > > cd Bio/Ext/HMM > perl Makefile.PL > make > make test > make install > > Joel > > but can't figure out what else is > > missing to utilize Bio::Tools::pSW. The error I get from > the example > > script in the wiki is: > > > > The C-compiled engine for Smith Waterman alignments > (Bio::Ext::Align) > > has not been installed. > > Please read the install the bioperl-ext package > > > > BEGIN failed--compilation aborted at > > /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128. > > Compilation failed in require at ./align_test.pl line 3. > > BEGIN failed--compilation aborted at ./align_test.pl line 3. > > > > In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called > > Align, but no Align.pm file. > > > > I followed the directions in the wiki to install 1.5.2_102 > (think I had > > _100 installed previously). Any thoughts on what I'm missing? > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From akarger at CGR.Harvard.edu Fri Nov 9 09:53:02 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 9 Nov 2007 09:53:02 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? 
Message-ID: When I tblastn ENSP00000349467 against the human genome, I get a few hits on chr10, among which are: Score = 192 bits (487), Expect(2) = 5e-64 Identities = 99/109 (90%), Positives = 99/109 (90%) Frame = +2 Query: 40 LGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNG 99 L QNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE F VFDKDGNG Sbjct: 71593562 LRQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRETFCVFDKDGNG 71593741 Query: 100 YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 148 YIS EL HVMTNLG KLTDEEVD MIREAD DGDGQVNY EFVQMMTA Sbjct: 71593742 YISGVELHHVMTNLGVKLTDEEVD*MIREADPDGDGQVNY-EFVQMMTA 71593885 Score = 75.1 bits (183), Expect(2) = 5e-64 Identities = 36/43 (83%), Positives = 39/43 (90%) Frame = +1 Query: 1 MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQN 43 MADQLTEEQI EFKE FSLFDKDGDGTITTK+LGTVMRS ++ Sbjct: 71593447 MADQLTEEQIVEFKEVFSLFDKDGDGTITTKKLGTVMRSQAES 71593575 As you can see from Sbjct lines, these two hits are basically contiguous. I was surprised to see that the bit scores and identities and alignment lengths here are totally different but the expectation values are identical. After a bit of grepping in the BLAST source, I found reference to "sum segments" and "a collection [of] multiple distinct alignments with asymmetric gaps between the alignments" and decided it was time to cry for help. When does BLAST decide that two or more alignments belong "together" and how does the affect the evalue? Is the evalue really showing how good those two alignments combined are, despite the frame shift? (It so happens that that's what I want.) And does anyone know off-hand if Bioperl will tell me when situations like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine would help, but I just get a bunch of empty strings for that, whether or not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is undef.) 
Thanks, - Amir Karger Research Computing Life Sciences Division Harvard University From cjfields at uiuc.edu Fri Nov 9 12:58:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Nov 2007 11:58:16 -0600 Subject: [Bioperl-l] GFF3loader and indexing Message-ID: <77845E27-1327-43DD-BA45-222C071217D7@uiuc.edu> Quick question: shouldn't the new Index attribute be passed on to seqfeatures by DB::SeqFeature::Store::GFF3Loader for round-tripping purposes (for instance, properly reloading dumped gff3 data)? I'm testing out a feature editor using volvox.gff3 data in GBrowse and the mRNA features appear to drop this attribute once loaded: Original data: ctgA example gene 1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase ctgA example mRNA 1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN. 1;Note=Eden splice form 1;Index=1 ctgA example five_prime_UTR 1050 1200 . + . Parent=EDEN.1 partial gff3_string(1) output: ctgA example gene 1050 9000 . + . Name=EDEN;ID=50;Alias=EDEN;Note=protein kinase ctgA example mRNA 1050 9000 . + . Name=EDEN. 1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1 ctgA example five_prime_UTR 1050 1200 . + . Parent=51;ID=52 ... chris From David.Messina at sbc.su.se Sat Nov 10 06:04:25 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 10 Nov 2007 12:04:25 +0100 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: References: Message-ID: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Hi Amir, I don't have my BLAST book handy, and my memory is a little fuzzy, but I think the Expect(2) you're seeing is the E-value based on both HSPs combined. And I think this is why you see the same Expect value for both -- because it is shared between them (which sounds like what you wanted). Again, this is just from memory, but I think this is an option that has to be turned on rather than something which Blast decides to do on its own. I don't know whether BioPerl reports this or not. 
Would you mind e-mailing me a entire BLAST report as a sample? When I have some time I'd like to play around with this a bit. Thanks, Dave From sac at bioperl.org Sat Nov 10 17:59:28 2007 From: sac at bioperl.org (Steve Chervitz) Date: Sat, 10 Nov 2007 14:59:28 -0800 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Message-ID: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> The Bioperl blast parser should extract that value and you can obtain it from an HSP object, via the HSPI::n() method, documented here: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23 Dave's basically correct in his explanation. It's a result of the application of sum statistics by the blast algorithm. You can read all about it in Korf et al's BLAST book. Here's the relevant section: http://books.google.com/books?id=xvcnhDG9fNUC&pg=PA102&lpg=PA102&dq=blast+sum+statistics&source=web&ots=WIudsJGaCk&sig=v66X3wRLEHvpTLUD36AE5DGpPBY#PPA102,M1 Steve On Nov 10, 2007 3:04 AM, Dave Messina wrote: > Hi Amir, > > I don't have my BLAST book handy, and my memory is a little fuzzy, but I > think the Expect(2) you're seeing is the E-value based on both HSPs > combined. And I think this is why you see the same Expect value for both -- > because it is shared between them (which sounds like what you wanted). > > Again, this is just from memory, but I think this is an option that has to > be turned on rather than something which Blast decides to do on its own. > > > I don't know whether BioPerl reports this or not. Would you mind e-mailing > me a entire BLAST report as a sample? When I have some time I'd like to play > around with this a bit. 
> > Thanks, > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bernd.web at gmail.com Tue Nov 13 06:57:04 2007 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 13 Nov 2007 12:57:04 +0100 Subject: [Bioperl-l] Panel link Message-ID: <716af09c0711130357n4ba72901lf2236ddfd853c945@mail.gmail.com> Hi, Is it possible with Panel to provide javascript event handlers? With -link we can provide hrefs as: -link => 'http://www.google.com/search?q=$description' or use a coderef that returns a href. However, I'd like to set-up links as: Is this possible by default with Panel? Regards, Bernd From akarger at CGR.Harvard.edu Tue Nov 13 12:12:32 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 13 Nov 2007 12:12:32 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Message-ID: Thanks for the reply. I'm curious as to how BLAST decides to do this, but not curious enough to buy the BLAST book. If you want to see this, you could just tblastn the ENSP00000349467 sequence vs. the genome: MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG NGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDE EVDEMIREADIDGDGQVNYEEFVQMMTAK against the human genome at NCBI or locally. I've attached the tblastn report for that protein, which includes the results I quoted. (It was done as part of a blast of 150 proteins vs. the genome.) -Amir ________________________________ From: dave at davemessina.com [mailto:dave at davemessina.com] On Behalf Of Dave Messina Sent: Saturday, November 10, 2007 6:04 AM To: Amir Karger Cc: bioperl-l Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result? 
Hi Amir, I don't have my BLAST book handy, and my memory is a little fuzzy, but I think the Expect(2) you're seeing is the E-value based on both HSPs combined. And I think this is why you see the same Expect value for both -- because it is shared between them (which sounds like what you wanted). Again, this is just from memory, but I think this is an option that has to be turned on rather than something which Blast decides to do on its own. I don't know whether BioPerl reports this or not. Would you mind e-mailing me a entire BLAST report as a sample? When I have some time I'd like to play around with this a bit. Thanks, Dave -------------- next part -------------- A non-text attachment was scrubbed... Name: ENSP00000349467_tblastn.txt.gz Type: application/x-gzip Size: 9755 bytes Desc: ENSP00000349467_tblastn.txt.gz URL: From akarger at CGR.Harvard.edu Tue Nov 13 12:30:52 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 13 Nov 2007 12:30:52 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> Message-ID: > From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf > Of Steve Chervitz > > The Bioperl blast parser should extract that value and you can obtain > it from an HSP object, via the HSPI::n() method, documented here: > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B io/Search/HSP/HSPI.html#POD23 As I mentioned in my email: And does anyone know off-hand if Bioperl will tell me when situations like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine would help, but I just get a bunch of empty strings for that, whether or not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is undef.) 

And the docs for n() actually say, "This value is not defined with NCBI
Blast2 with gapping" although they don't say why. Which may explain why,
when I ran the following code on the blast result I included in my last
email, I got empty values for all of the n's. (Why is n() undefined for
gapped blast if I'm getting n's in my results from that blast?)

use warnings;
use strict;
use Bio::SearchIO;

my $blast_out = $ARGV[0];
my $in = new Bio::SearchIO(-format => 'blast',
                           -file => $blast_out,
                           -report_type => 'tblastn');

print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N
Evalue)), "\n";
while(my $query = $in->next_result) {
    while(my $subject = $query->next_hit) {
        while (my $hsp = $subject->next_hsp) {
            print join("\t",
                $query->query_name,
                $hsp->start("query"),
                $hsp->end("query"),
                $hsp->strand("hit"),
                $subject->name,
                $hsp->start("hit"),
                $hsp->end("hit"),
                $subject->frame,
                $hsp->n,
                $hsp->evalue,
            ),"\n";
        }
    }
}

> Dave's basically correct in his explanation. It's a result of the
> application of sum statistics by the blast algorithm. You can read all
> about it in Korf et al's BLAST book. Here's the relevant section:

[snip]

Thanks,

-Amir

From cjfields at uiuc.edu Tue Nov 13 12:42:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Nov 2007 11:42:07 -0600
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To:
References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>
 <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
Message-ID: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>

Amir,

Can you file this as a bug? Dave mentioned he would look into it but
I think it warrants tracking to make sure it gets fixed:

http://www.bioperl.org/wiki/Bugs

Attach the example BLAST report from your last post to the report.

BTW, I wonder how this appears in XML output?
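
Until that is tracked down, one possible stopgap is to pull the N out of the raw report text directly. The following is only a sketch, assuming the HSP header lines look like the "Score = ..., Expect(2) = ..." lines quoted earlier in this thread; parse_expect is a made-up helper, not part of BioPerl, and the third sample line is invented to show the plain "Expect =" case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (n, evalue) from one HSP header line of an NCBI BLAST text
# report, or an empty list if the line carries no Expect value.
# n is taken to be 1 when the line has a plain "Expect = ..." with no "(N)".
sub parse_expect {
    my ($line) = @_;
    return unless $line =~ /Expect(?:\((\d+)\))?\s*=\s*([0-9.eE+-]+)/;
    my ($n, $evalue) = (defined $1 ? $1 : 1, $2);
    # BLAST sometimes prints a bare exponent such as "e-100"
    $evalue = "1$evalue" if $evalue =~ /^e/i;
    return ($n, $evalue);
}

# The first two lines are the headers quoted in this thread;
# the third is a hypothetical line without the "(N)" part.
my @headers = (
    ' Score =  192 bits (487), Expect(2) = 5e-64',
    ' Score = 75.1 bits (183), Expect(2) = 5e-64',
    ' Score = 50.0 bits (100), Expect = 3e-10',
);

for my $line (@headers) {
    my ($n, $evalue) = parse_expect($line) or next;
    print "n=$n\tevalue=$evalue\n";
}
```

HSPs in the same hit that share n > 1 and the same evalue would be the ones BLAST scored together with sum statistics, so grouping on those two values per hit should be enough to flag cases like the chr10 pair.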
chris On Nov 13, 2007, at 11:30 AM, Amir Karger wrote: >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf >> Of Steve Chervitz >> >> The Bioperl blast parser should extract that value and you can obtain >> it from an HSP object, via the HSPI::n() method, documented here: >> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B > io/Search/HSP/HSPI.html#POD23 > > As I mentioned in my email: > > And does anyone know off-hand if Bioperl will tell me when situations > like this happen? I thought the Bio::Search::HSP::BlastHSP::n > subroutine > would help, but I just get a bunch of empty strings for that, > whether or > not there's a (2) in the Expect string. (hsp->n is empty, hsp-> > {"_n"} is > undef.) > > And the docs for n() actually say, "This value is not defined with > NCBI > Blast2 with gapping" although they don't say why. Which may explain > why, > when I ran the following code on the blast result I included in my > last > email, I got empty values for all of the n's. (Why is n() undefined > for > gapped blast if I'm getting n's in my results from that blast?) > > use warnings; > use strict; > use Bio::SearchIO; > > my $blast_out = $ARGV[0]; > my $in = new Bio::SearchIO(-format => 'blast', > -file => $blast_out, > -report_type => 'tblastn'); > > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N > Evalue)), "\n"; > while(my $query = $in->next_result) { > while(my $subject = $query->next_hit) { > while (my $hsp = $subject->next_hsp) { > print join("\t", > $query->query_name, > $hsp->start("query"), > $hsp->end("query"), > $hsp->strand("hit"), > $subject->name, > $hsp->start("hit"), > $hsp->end("hit"), > $subject->frame, > $hsp->n, > $hsp->evalue, > ),"\n"; > } > } > } > >> Dave's basically correct in his explanation. It's a result of the >> application of sum statistics by the blast algorithm. You can read >> all >> about it in Korf et al's BLAST book. 
Here's the relevant section: > > [snip] > > Thanks, > > -Amir > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lskatz at gatech.edu Tue Nov 13 20:27:45 2007 From: lskatz at gatech.edu (Lee Katz) Date: Tue, 13 Nov 2007 20:27:45 -0500 Subject: [Bioperl-l] chromatogram Message-ID: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> Hi, I would like to know how to draw a chromatogram file. Does anyone have any sample code where you read in an scf file and create a jpeg or other image file? For that matter, I want to be able to customize these images with base calls if possible. I really appreciate the help, so thanks! -- Lee Katz From mvrmakam at yahoo.com Wed Nov 14 04:52:13 2007 From: mvrmakam at yahoo.com (Roshan Makam) Date: Wed, 14 Nov 2007 01:52:13 -0800 (PST) Subject: [Bioperl-l] Installing Bioperl on Windows XP Message-ID: <235423.72586.qm@web33703.mail.mud.yahoo.com> Hi, I am encountering a problem while installing Bioperl on Windows XP. I have installed ActivePerl version 5.8.8.822. I am using Perl Package Manager GUI. Also, I am following the instructions outlined for installing Bioperl on Windows. I am getting an error. The error is as follows: Downloading ActiveState Package Repository packlist ... failed 500 Can't connect to ppm4.activestate.com:80 (Bad hostname 'ppm4.activestate.com') I do not know how to overcome this problem. The other issue is when I type bioperl in the search box I do not see any packages of bioperl. I do not know what the problem is. If anyone of you could guide me through the installation process I would appreciate it. Thanks, Roshan ____________________________________________________________________________________ Be a better pen pal. 
Text or chat with friends inside Yahoo! Mail. See how. http://overview.mail.yahoo.com/ From cjfields at uiuc.edu Wed Nov 14 09:02:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Nov 2007 08:02:05 -0600 Subject: [Bioperl-l] Installing Bioperl on Windows XP In-Reply-To: <235423.72586.qm@web33703.mail.mud.yahoo.com> References: <235423.72586.qm@web33703.mail.mud.yahoo.com> Message-ID: <22873767-9CBD-4D38-BC9C-5267F1FFB04D@uiuc.edu> The instructions are pretty specific: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows Note the section on adding new repositories. As for the PPM connection error, it's more than likely an error with the default address but it isn't bioperl-related; maybe answers lie here: http://aspn.activestate.com/ASPN/docs/ActivePerl/5.8/faq/ActivePerl- faq2.html#ppm_repositories chris On Nov 14, 2007, at 3:52 AM, Roshan Makam wrote: > Hi, > > I am encountering a problem while installing Bioperl on Windows > XP. I have installed ActivePerl version 5.8.8.822. I am using > Perl Package Manager GUI. Also, I am following the instructions > outlined for installing Bioperl on Windows. I am getting an > error. The error is as follows: > > Downloading ActiveState Package Repository packlist ... failed 500 > Can't connect to ppm4.activestate.com:80 (Bad hostname > 'ppm4.activestate.com') > > I do not know how to overcome this problem. The other issue is > when I type bioperl in the search box I do not see any packages of > bioperl. I do not know what the problem is. If anyone of you > could guide me through the installation process I would appreciate it. 
> > Thanks, > > Roshan From reshetovdenis at gmail.com Wed Nov 14 12:28:40 2007 From: reshetovdenis at gmail.com (Denis Reshetov) Date: Wed, 14 Nov 2007 20:28:40 +0300 Subject: [Bioperl-l] how to load all genomes Message-ID: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com> Dear BioPerl-db Creators, I`m trying to load all genomes from NCBI ftp site to my BioSql database using common script load_seqdatabase.pl But it seems very slow. Let me know what is the better way to do it? Thank you very much, Denis. From barry.moore at genetics.utah.edu Wed Nov 14 14:18:29 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed, 14 Nov 2007 12:18:29 -0700 Subject: [Bioperl-l] how to load all genomes In-Reply-To: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com> References: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com> Message-ID: <66DEB322-7654-4E5E-9E96-BAE88262E3AC@genetics.utah.edu> Denis, You might be interested in this thread from a couple years ago. I was having a similar problem, that I eventually resolved. Unfortunately the reason for the problem and the solution weren't entirely clear, but you may be able to glean some ideas from it. Also, you may have already done this, but I suggest searching the archives from this list because it seems like this comes up every now and then, so there may be other postings similar to the one I'm sending you that could help you. http://www.bioperl.org/pipermail/bioperl-l/2005-January/018093.html Finally, if you are still having problems, you'll want to include a few more details about your situation. What DB are you using, have you preloaded taxonomy data etc. How fast/slow are your sequences loading? Barry On Nov 14, 2007, at 10:28 AM, Denis Reshetov wrote: > Dear BioPerl-db Creators, > > I`m trying to load all genomes from NCBI ftp site > to my BioSql database using common script load_seqdatabase.pl > > But it seems very slow. Let me know what is the better way to do it? 
>
> Thank you very much,
>
> Denis.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Wed Nov 14 14:57:49 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 08:57:49 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID:

Here's my trace viewer.
Please excuse my dodgy Perl and debugging code as it's still under
development :-)

Russell Smithies
Bioinformatics Software Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

------------------------------------------------------------------------
------------------

#!perl -w

use ABI;
use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;
use Data::Dumper;
use Getopt::Long;

use constant HEIGHT => 300;

GetOptions ('h|height=i' => \$HEIGHT,
            'f|file=s'   => \$FILE,
            'o|out=s'    => \$OUTFILE,
            'l|left=s'   => \$LEFT_SEQ,
            'r|right=s'  => \$RIGHT_SEQ,
            's|size=i'   => \$SIZE,
           ) || die <<USAGE;
  --height  Set height of image (${\HEIGHT} pixels default)
  --file    Filename for the ABI trace file
  --out     Filename for the generated .png image
  --left
  --right
  --size

Parse an ABI trace file and render a PNG image.
See http://search.cpan.org/dist/ABI/ABI.pm
or http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
USAGE

my $height = $HEIGHT || HEIGHT;
my $file = $FILE;
my $outfile = $OUTFILE;

my $abi = ABI->new(-file=> $file);
my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"
my @base_calls = $abi->get_base_calls(); # Get the base calls
my $sequence =$abi->get_sequence();

@bp = split(//, $sequence);

# iterate over array
$size = $abi->get_trace_length();
for ($i=0,$count = 0; $i<$size; $i++) {
  if(grep(/\b$i\b/, @base_calls)){
    $bases[$i] = $bp[$count];
    $count++;
  }else{
    $bases[$i] = ' ';
  }
}

# create the data. see GD::Graph::Data for details of the format
my @data = (\@bases,
            \@trace_a,
            \@trace_c,
            \@trace_g,
            \@trace_t,
           );

my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);

$graph->set(
    title => $abi->get_sample_name(),
#   y_max_value => $abi->get_max_trace() + 50,
    x_max_value => $abi->get_trace_length(),
    t_margin => 5,
    b_margin => 5,
    l_margin => 5,
    r_margin => 5,
    x_ticks => 0,
    text_space => 0,
    line_width => 1,
    transparent => 0,
    b_margin => 30,
    t_margin => 35,
    x_plot_values => 0,
    interlaced => 1,
);

# allocate some colors for drawing the bases
#use colors same as Chromas
$graph->set( dclrs => [ qw( green blue black red pink) ] );

#plot the data
my $gd = $graph->plot(\@data);

$black = $gd->colorAllocate(0,0,0); # A
$blue = $gd->colorAllocate(0,0,255); # C
$red = $gd->colorAllocate(255,0,0); # G
$green = $gd->colorAllocate(0,255,0); # T
$magenta =$gd->colorAllocate(255,0,255); # N
$white = $gd->colorAllocate(255,255,255); # undefined aren't drawn
$gray = $gd->colorAllocate(210,210,210);

%colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N", $magenta, " ",$white);

#$start_base = index(lc($sequence),lc($LEFT_SEQ));
$start_base =
find_match($sequence,$LEFT_SEQ); #if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){ $end_base = find_match($sequence,$RIGHT_SEQ, 1); if($end_base){ $end_base += length($RIGHT_SEQ); } # get the coords of the features on the image @coords = $graph->get_hotspot(1); $size = @coords; $printed_num = 1; $basecount = 0; $numstoprint = $basecount - $start_base; # draw the colored bases and scale at top and bottom of image for ($i=0,$count = 0; $i<$size; $i++) { $c = $coords[$i]; (undef, $xs, undef, undef, undef, undef) = @$c; $base = $bases[$i]; if($base =~ /[ACGTN]/){ if($start_base - 1 == $basecount){$start_base_coord = $xs;} if($end_base - 1 == $basecount){$end_base_coord = $xs;} if(defined($SIZE) && $start_base+$SIZE -2 == $basecount){$end_base_coord_by_size = $xs;} $basecount++; $numstoprint++; $printed_num = 0; } # print the bases top and bottom $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base}); $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base}); # print scale if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){ if($LEFT_SEQ){ $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black); $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black); $printed_num = 1; }else{ $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black); $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black); $printed_num = 1; } } $top_right_corner = $xs; } # only draw the clipped region if the calculated size is + or - 6bp #if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){ # draw the clipped regions as gray #if LEFT_SEQ supplied and a match found if($LEFT_SEQ && $start_base > 0){ $gd->filledRectangle(38,35,$start_base_coord - 1,$height - 33,$red); $clipped = 1; } #if RIGHT_SEQ supplied and a match found if($RIGHT_SEQ && $end_base > 0){ print join("\t", ($end_base)),"\n"; $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height - 33,$gray); $clipped = 1; } #if no RIGHT_SEQ supplied 
or no match found, use left match + seq length if(!$RIGHT_SEQ || $end_base < 0){ $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$heigh t - 33,$blue); $clipped = 1; } # set height based on max trace within clipped region $graph->set( y_max_value => 3000);#$abi->get_max_trace() + 50); # need to re-plot the data over the grayed out area $graph->plot(\@data) if $clipped; $gd->filledRectangle(0,0,$top_right_corner,33,$white); #} #print the graph open(OUT, ">$outfile") or die "can't open output file: $outfile\n"; binmode OUT; print OUT $gd->png; close OUT; sub find_match{ my ($sequence,$query,$last) = @_; return -1 if length($query) < 6; my($odds, $evens, $ones, $twos, $threes, $match_pos); # try exact match $match_pos = do_regex($query, $sequence,$last); return $match_pos if $match_pos > 0; # try matching every second base starting from the second base e.g. it will be .C.T.C.G.etc map {m/(\w)(\w)/g; $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g); $match_pos = do_regex($odds, $sequence,$last); return $match_pos if $match_pos > 0; $match_pos = do_regex($evens, $sequence,$last); return $match_pos if $match_pos > 0; # try matching every third base starting from the first base e.g. 
it will be C..T..G..T etc map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"} ($query =~m/(\w\w\w)/g); $match_pos = do_regex($ones, $sequence,$last); return $match_pos if $match_pos > 0; $match_pos = do_regex($twos, $sequence,$last); return $match_pos if $match_pos > 0; $match_pos = do_regex($threes, $sequence,$last); return $match_pos if $match_pos > 0; # not found return -1; } sub do_regex(){ my ($query,$sequence,$last)= @_; #print "trying $query \n"; my $result = -1; $result = pos($sequence)-length($query)+1 if $last && ($sequence =~ m/.*($query)/ig); $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*?($query)/ig); return $result; } ------------------------------------------------------------------------ ------------------ > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open- > bio.org] On Behalf Of Lee Katz > Sent: Wednesday, 14 November 2007 2:28 p.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] chromatogram > > Hi, > I would like to know how to draw a chromatogram file. Does anyone > have any sample code where you read in an scf file and create a jpeg > or other image file? > For that matter, I want to be able to customize these images with base > calls if possible. I really appreciate the help, so thanks! > > -- > Lee Katz > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From mbasu at mail.nih.gov  Wed Nov 14 15:47:20 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 15:47:20 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To:
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <473B5ED8.1090201@mail.nih.gov>

I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
not in Bioperl distribution. But to make the record straight, you can
use one step chromatogram drawing in SVG from ABI file using my BioSVG
module, available at:

http://www.bioinformatics.org/~malay/biosvg/

Malay

Smithies, Russell wrote:
> Here's my trace viewer.
> Please excuse my dodgy Perl and debugging code as it's still under
> development :-)

--
Malay K Basu
www.malaybasu.net

From Russell.Smithies at agresearch.co.nz  Wed Nov 14 15:58:19 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 09:58:19 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B5ED8.1090201@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov>
Message-ID:

We try and avoid SVG at all costs as installing plugins and viewers in a
locked down corporate environment can be more trouble than it's worth
whereas generating .png images works for any browser with no extras
required.
We actually call this trace drawing code from Python which then
generates webpages with the embedded image.
It also means we don't need to licence, install and maintain a trace
viewer like Chromas.
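
A minimal sketch of the embedding step mentioned above, written in Perl to
match the script posted in this thread. It assumes the PNG bytes are inlined
into the page as a base64 data URI; the thread does not say how the real
pages reference the image, and png_img_tag is a hypothetical helper name.
Only the core MIME::Base64 module is used.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper (not from the original script): wrap raw PNG bytes
# in an <img> tag whose src is a base64 data URI.
sub png_img_tag {
    my ($png_bytes, $alt) = @_;
    $alt = '' unless defined $alt;
    # Escape the characters that are special inside an HTML attribute.
    $alt =~ s/&/&amp;/g;
    $alt =~ s/</&lt;/g;
    $alt =~ s/>/&gt;/g;
    $alt =~ s/"/&quot;/g;
    # The empty second argument stops encode_base64 from inserting newlines.
    my $b64 = encode_base64($png_bytes, '');
    return qq{<img alt="$alt" src="data:image/png;base64,$b64">};
}

# In the trace viewer, $gd->png would supply the bytes; here the first
# four bytes of the PNG signature stand in for a real image.
print png_img_tag("\x89PNG", 'trace'), "\n";
```

With this approach the page and the image travel as a single file. Writing
the PNG to disk and pointing an ordinary img tag at it would work just as
well and keeps the HTML smaller.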
:-)

Russell

> -----Original Message-----
> From: Malay [mailto:mbasu at mail.nih.gov]
> Sent: Thursday, 15 November 2007 9:47 a.m.
> To: Smithies, Russell
> Cc: Lee Katz; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] chromatogram
>
> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
> not in Bioperl distribution. But to make the record straight, you can
> use one step chromatogram drawing in SVG from ABI file using my BioSVG
> module, available at:
>
> http://www.bioinformatics.org/~malay/biosvg/
>
> Malay

From mbasu at mail.nih.gov  Wed Nov 14 16:04:25 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 16:04:25 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To:
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov>
Message-ID: <473B62D9.8010004@mail.nih.gov>

You don't need any plugin. Firefox natively can show most of the SVG
files.

-Malay

Smithies, Russell wrote:
> We try and avoid SVG at all costs as installing plugins and viewers in a
> locked down corporate environment can be more trouble than it's worth
> whereas generating .png images works for any browser with no extras
> required.
> We actually call this trace drawing code from Python which then
> generates webpages with the embedded image.
> It also means we don't need to licence, install and maintain a trace
> viewer like Chromas.
> :-)
>
> Russell
it >>> will be C..T..G..T etc >>> map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; >>> $threes.="..$3"} ($query =~m/(\w\w\w)/g); >>> $match_pos = do_regex($ones, $sequence,$last); return > $match_pos >>> if $match_pos > 0; >>> $match_pos = do_regex($twos, $sequence,$last); return > $match_pos >>> if $match_pos > 0; >>> $match_pos = do_regex($threes, $sequence,$last); return > $match_pos >>> if $match_pos > 0; >>> >>> # not found >>> return -1; >>> } >>> >>> sub do_regex(){ >>> my ($query,$sequence,$last)= @_; >>> #print "trying $query \n"; >>> my $result = -1; >>> $result = pos($sequence)-length($query)+1 if $last && > ($sequence >>> =~ m/.*($query)/ig); >>> $result = pos($sequence)-length($query)+1 if($sequence =~ >>> m/.*?($query)/ig); >>> return $result; >>> } >>> >>> > ------------------------------------------------------------------------ >>> ------------------ >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open- >>>> bio.org] On Behalf Of Lee Katz >>>> Sent: Wednesday, 14 November 2007 2:28 p.m. >>>> To: bioperl-l at lists.open-bio.org >>>> Subject: [Bioperl-l] chromatogram >>>> >>>> Hi, >>>> I would like to know how to draw a chromatogram file. Does anyone >>>> have any sample code where you read in an scf file and create a > jpeg >>>> or other image file? >>>> For that matter, I want to be able to customize these images with > base >>>> calls if possible. I really appreciate the help, so thanks! 
>>>> >>>> -- >>>> Lee Katz >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ============================================================= >> ========== >>> Attention: The information contained in this message and/or > attachments >>> from AgResearch Limited is intended only for the persons or entities >>> to which it is addressed and may contain confidential and/or > privileged >>> material. Any review, retransmission, dissemination or other use of, > or >>> taking of any action in reliance upon, this information by persons > or >>> entities other than the intended recipients is prohibited by > AgResearch >>> Limited. If you have received this message in error, please notify > the >>> sender immediately. >>> >> ============================================================= >> ========== >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Malay K Basu >> www.malaybasu.net > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =======================================================================

--
Malay K Basu
www.malaybasu.net

From tomboy at cs.huji.ac.il Wed Nov 14 21:43:43 2007
From: tomboy at cs.huji.ac.il (Tomer Hertz)
Date: Wed, 14 Nov 2007 18:43:43 -0800
Subject: [Bioperl-l] problems in stalling bio perl
Message-ID:

Hi,

When I try to install BioPerl I get the following error message:

hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
$ perl Build.PL
Can't find file lib/Module/Build.pm to determine version at /usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.

Can you please help? I have tried reinstalling the build command, and that does not seem to help either.

Many thanks,
--Tomer

--
--------------------------------------------------------------------------------
Tomer Hertz
Postdoctoral Researcher
Machine Learning and Applied Statistics
Microsoft Research
One Microsoft Way, Redmond, WA, 98052, USA
Homepage: www.cs.huji.ac.il/~tomboy
Email: hertz at microsoft dot com
Tel: (425)-421-8313
Fax: (425) 936-7329
--------------------------------------------------------------------------------

From lskatz at gatech.edu Thu Nov 15 08:24:02 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Thu, 15 Nov 2007 08:24:02 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B62D9.8010004@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov>
Message-ID: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>

Thank you all.
Are you all sure that there is no way to go from an scf to an image, though? I do have abi files, but I am relying on Phred output for base calls for other things and I want to stay consistent. This means that if I use the fasta files that I get from Phred in another part of my program, I need to use the scf files it produces.

If this is not possible, do you know if drawing an scf is in the works? Thanks.
-- Lee Katz http://www.lskatz.com From cain.cshl at gmail.com Thu Nov 15 09:21:26 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Thu, 15 Nov 2007 09:21:26 -0500 Subject: [Bioperl-l] chromatogram In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com> References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov> <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com> Message-ID: <1195136486.2785.12.camel@localhost.localdomain> Hi Lee, Distributed with GBrowse is Bio::Graphics::Glyph::trace, which uses Bio::SCF to draw trace files onto a Bio::Graphics::Panel. Bio::SCF is not part of bioperl, so you have to get it from CPAN and it depends on the Staden io-lib package, so you'll need that too. You can get GBrowse from http://www.gmod.org/gbrowse , and you can look at the tutorial for more information on configuring the trace glyph. Scott On Thu, 2007-11-15 at 08:24 -0500, Lee Katz wrote: > Thank you all. > Are you all sure in that there is no way to go from an scf to an image > though? I do have abi files, but I am relying on Phred output for > base calls for other things and I want to stay consistent. This means > that if I use the fasta files that I get from Phred in another part of > my program, I need to use the scf files it produces. > > If this is not possible, do you know if drawing an scf is in the works? Thanks. > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From bosborne11 at verizon.net Thu Nov 15 09:18:05 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 15 Nov 2007 09:18:05 -0500 Subject: [Bioperl-l] problems in stalling bio perl In-Reply-To: Message-ID: Tomer, Interesting. 
When I used Cygwin I always worked entirely within the C: drive, it looks like you're executing the script from the E: drive. Is Cygwin installed in C:/cygwin? You can see what I'm getting at, it's possible that you need to set $PERL5LIB to something like /cygdrive/c/cygwin/usr/lib/perl5. What does 'echo $PERL5LIB' say? Brian O. On 11/14/07 9:43 PM, "Tomer Hertz" wrote: > hi > when I try to install bioperl I get the following error message: > > hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102 > $ perl Build.PL > Can't find file lib/Module/Build.pm to determine version at > /usr/lib/perl5/site_ > perl/5.8/Module/Build/Base.pm line 950. > can you please help. I have tried reinstalling the build command and that > does not seem to help as well. > > many thanks > --Tomer From bernd.web at gmail.com Thu Nov 15 10:26:42 2007 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 15 Nov 2007 16:26:42 +0100 Subject: [Bioperl-l] Graphics::Panel Message-ID: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com> Hi, Has someone been able to access '$description' for the production of imagemaps with Graphics::Panel? The map below does not print the "title" tag at all, '$description' seems not available, although for the tracks ($panel->add_track) it is available. $map = $panel->create_web_map($mapname, $linkrule, '$description'); Replacing '$description' with a coderef for the titletag does work, if I use the code below my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] }; I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 } Regards, Bernd From luciap at sas.upenn.edu Thu Nov 15 10:44:21 2007 From: luciap at sas.upenn.edu (Lucia Peixoto) Date: Thu, 15 Nov 2007 10:44:21 -0500 Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats? 
Message-ID: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu> Hi I was asked this question recently and it occurred to me I must be doing things inefficiently To produce gff file I was using SeqIO to parse the required fields, then according to the conventions just printing out whatever was required tab delimited, which is easy but if I wanted to generate a genbank file, extracting features from a gff file and a plain fasta file it was more complicated is there support for gff in bioperl now? anyone can contribute with smart way to go from/to gff, genebank and embl? thanks very much Lucia Peixoto Department of Biology,SAS University of Pennsylvania From lstein at cshl.edu Thu Nov 15 12:38:04 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 15 Nov 2007 12:38:04 -0500 Subject: [Bioperl-l] Graphics::Panel In-Reply-To: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com> References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com> Message-ID: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com> Depending on which Feature object you use, you may have to use a tag named "note" instead of "description". Lincoln On Nov 15, 2007 10:26 AM, Bernd Web wrote: > Hi, > > Has someone been able to access '$description' for the production of > imagemaps with Graphics::Panel? > The map below does not print the "title" tag at all, '$description' > seems not available, although for the tracks ($panel->add_track) it is > available. > $map = $panel->create_web_map($mapname, $linkrule, '$description'); > > Replacing '$description' with a coderef for the titletag does work, if > I use the code below > my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] }; > > > I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 } > > > Regards, > Bernd > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From bernd.web at gmail.com Thu Nov 15 13:03:19 2007 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 15 Nov 2007 19:03:19 +0100 Subject: [Bioperl-l] Graphics::Panel In-Reply-To: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com> References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com> <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com> Message-ID: <716af09c0711151003w6b5965b6g967ae2391a460dcb@mail.gmail.com> On Nov 15, 2007 6:38 PM, Lincoln Stein wrote: > Depending on which Feature object you use, you may have to use a tag named > "note" instead of "description". > > Lincoln > > > > On Nov 15, 2007 10:26 AM, Bernd Web < bernd.web at gmail.com> wrote: > > > > > > > > Hi, > > > > Has someone been able to access '$description' for the production of > > imagemaps with Graphics::Panel? > > The map below does not print the "title" tag at all, '$description' > > seems not available, although for the tracks ($panel->add_track) it is > > available. > > $map = $panel->create_web_map($mapname, $linkrule, '$description'); > > > > Replacing '$description' with a coderef for the titletag does work, if > > I use the code below > > my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] }; > > > > > > I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 } > > > > > > Regards, > > Bernd > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > Lincoln D. 
Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Thu Nov 15 13:43:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Nov 2007 12:43:02 -0600
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
In-Reply-To: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
References: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
Message-ID: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>

There are currently many ways to get what you want, but not all are consistent (particularly re: GFF3). We are aiming for more consistent, compliant GFF/GTF output in the next developer series (1.7) of BioPerl.

You can try using bp_genbank2gff or bp_genbank2gff3 (both in the scripts directory); these are probably the most common route when working directly from a seq record. Bio::Tools::GFF is the most commonly used class, though I'm unsure of its status for GFF3 output. From within a Bio::SeqI you can call write_gff() (currently not very flexible), or from the SeqFeature itself gff_string(). Bio::Graphics::Feature has the additional method gff3_string(). Bio::FeatureIO is also an option, though I would consider it very experimental (it will likely undergo significant revision in the next bioperl dev series).

Any others anyone can think of, maybe non-BioPerl related as well?
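
On the non-BioPerl side, the hand-rolled approach Lucia describes (printing the required fields tab-delimited) can be sketched in plain Perl. This is only a minimal sketch under stated assumptions: gff3_line() is a hypothetical helper, not a BioPerl function, the feature values are invented, and it does no escaping of reserved characters and no validation of coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: return '.' for an undefined GFF3 column.
sub dot { return defined $_[0] ? $_[0] : '.' }

# Hypothetical helper: build one nine-column GFF3 line from named fields.
# Columns: seqid, source, type, start, end, score, strand, phase, attributes.
sub gff3_line {
    my (%f) = @_;
    my $attr  = $f{attributes} || {};
    my $attrs = join ';', map { "$_=$attr->{$_}" } sort keys %$attr;
    return join "\t",
        $f{seqid}, dot($f{source}), $f{type}, $f{start}, $f{end},
        dot($f{score}), dot($f{strand}), dot($f{phase}),
        ($attrs eq '' ? '.' : $attrs);
}

# Invented example feature, for illustration only.
print "##gff-version 3\n";
print gff3_line(
    seqid      => 'chr1',
    source     => 'example',
    type       => 'gene',
    start      => 100,
    end        => 900,
    strand     => '+',
    attributes => { ID => 'gene0001', Name => 'demo' },
), "\n";
```

Going the other way (GFF plus FASTA to GenBank) is not covered by this; it only shows why the GFF direction is easy to do by hand.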
chris On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote: > Hi > I was asked this question recently > and it occurred to me I must be doing things inefficiently > To produce gff file I was using SeqIO to parse the required fields, > then > according to the conventions just printing out whatever was > required tab > delimited, which is easy > > but if I wanted to generate a genbank file, extracting features > from a gff file > and a plain fasta file it was more complicated > is there support for gff in bioperl now? > anyone can contribute with smart way to go from/to gff, genebank > and embl? > > thanks very much > > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Thu Nov 15 14:19:41 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 15 Nov 2007 14:19:41 -0500 Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats? In-Reply-To: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu> Message-ID: Chris, There's also a genbank2gff3.PLS script in the GMOD package ( http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS? revision=1.5&view=markup). However, it has not been modified for a couple of years, it may not be the "preferred" script. See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information on using Bioperl's bp_genbank2gff3 script. Brian O. On 11/15/07 1:43 PM, "Chris Fields" wrote: > There are currently many ways to get what you want, but not all are > consistent (particularly re: GFF3). We are aiming for more > consistent, compliant GFF/GTF output in the next developer series > (1.7) of Bioperl. 
> > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the > scripts directory); these are probably the most common way when > working directly from a seq record. Bio::Tools::GFF is the most > commonly used class though I'm unsure of it's status for GFF3 > output. From within a Bio::SeqI you can call write_gff() (currently > not very flexible) or from the SeqFeature itself gff_string(). > Bio::Graphics::Feature has the additional method gff3_string(). > Bio::FeatureIO is also an option, though I would consider it very > experimental (it will likely undergo significant revision in the next > bioperl dev series). > > Any others anyone can think of, maybe non-BioPerl related as well? > > chris > > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote: > >> Hi >> I was asked this question recently >> and it occurred to me I must be doing things inefficiently >> To produce gff file I was using SeqIO to parse the required fields, >> then >> according to the conventions just printing out whatever was >> required tab >> delimited, which is easy >> >> but if I wanted to generate a genbank file, extracting features >> from a gff file >> and a plain fasta file it was more complicated >> is there support for gff in bioperl now? >> anyone can contribute with smart way to go from/to gff, genebank >> and embl? 
>> >> thanks very much >> >> Lucia Peixoto >> Department of Biology,SAS >> University of Pennsylvania >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Thu Nov 15 17:31:28 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 16 Nov 2007 11:31:28 +1300 Subject: [Bioperl-l] chromatogram In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> Message-ID: Just to add to this, does anyone have any code for reading .sff 'traces' from 454 sequences? Thanx, Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open- > bio.org] On Behalf Of Lee Katz > Sent: Wednesday, 14 November 2007 2:28 p.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] chromatogram > > Hi, > I would like to know how to draw a chromatogram file. Does anyone > have any sample code where you read in an scf file and create a jpeg > or other image file? > For that matter, I want to be able to customize these images with base > calls if possible. I really appreciate the help, so thanks! > > -- > Lee Katz > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From torsten.seemann at infotech.monash.edu.au Thu Nov 15 20:13:22 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 16 Nov 2007 12:13:22 +1100
Subject: [Bioperl-l] chromatogram
In-Reply-To:
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID:

> Just to add to this, does anyone have any code for reading .sff 'traces'
> from 454 sequences?

The .SFF files can be manipulated using the SFF tools which 454 distribute with their result data. E.g. "sffinfo 454AllContigs.sff" will list all the reads with the original flowgram values etc. However, the SFF tools are i386.Linux binaries, so not really a portable solution.

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From mvrmakam at yahoo.com Thu Nov 15 22:04:55 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Thu, 15 Nov 2007 19:04:55 -0800 (PST)
Subject: [Bioperl-l] Problem with installing bioperl on Windows
Message-ID: <456881.59573.qm@web33712.mail.mud.yahoo.com>

Hi,

I have installed Perl Package Manager ver 5.8.8.822 on Windows XP. I have included all the repositories outlined in Installing BioPerl for Windows and have selected all Packages in the View. However, I am not able to see any packages in the view box. Can anyone help me with this?

Roshan

From David.Messina at sbc.su.se Fri Nov 16 03:33:04 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 16 Nov 2007 09:33:04 +0100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov> <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <628aabb70711160033na56be2an5bff905fdf13a0c0@mail.gmail.com>

> If this is not possible, do you know if drawing an scf is in the
> works? Thanks.
>

One non-BioPerl solution is 4peaks:

http://mekentosj.com/4peaks/

Mac only, but really great software. I'm also a fan of their Papers journal article PDF library program.

Dave

From neetisomaiya at gmail.com Mon Nov 19 01:11:49 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 19 Nov 2007 11:41:49 +0530
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
Message-ID: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>

Hi,

I am using Bio::SeqIO for parsing KEGG gene ent files.

A part of my code is

foreach my $key ( $ac->get_all_annotation_keys() )
{
    if($key eq "dblink")
    {
        my %values = $ac->get_Annotations($key);
        foreach my $value ( keys(%values ))
        {
            print "\n*****VALUE $value*****\n";
        }
    }
}

Here not all dblinks present in the actual file get parsed. For example, in the data below,

ENTRY       116064            CDS       H.sapiens
NAME        LRRC58
DEFINITION  leucine rich repeat containing 58
POSITION    3q13.33
MOTIF       Pfam: SdiA-regulated LRR_1
            PROSITE: LEU_RICH
DBLINKS     NCBI-GI: 153792305
            NCBI-GeneID: 116064
            HGNC: 26968
            Ensembl: ENSG00000163428
            UniProt: Q96CX6

Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and PROSITE, but doesn't give me HGNC and UniProt. For other entries it gives me other combinations of dbs.

Can anyone help me with this? Why is this happening? I have no clue.

Thanks and Regards,
Neeti.
--
-Neeti
Even my blood says, B positive

From johnston at biochem.ucl.ac.uk Mon Nov 19 06:44:59 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 11:44:59 +0000 (GMT)
Subject: [Bioperl-l] blast database names
Message-ID:

Hello,

Is there a list of the possible database names for -data => $dbname in RemoteBlast somewhere?

Cheers,
Cass

From cjfields at uiuc.edu Mon Nov 19 08:44:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 07:44:46 -0600
Subject: [Bioperl-l] blast database names
In-Reply-To:
References:
Message-ID:

Here's a recent list (don't know if it's up-to-date):

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Nov 19, 2007, at 5:44 AM, Caroline Johnston wrote:

> Hello,
>
> Is there a list of the possible database names for -data =>
> $dbname in RemoteBlast somwhere?
>
> Cheers,
> Cass
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Nov 19 09:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 08:33:46 -0600
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
In-Reply-To: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
References: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
Message-ID:

It makes sense given that you're (erroneously) using a hash:

my %values = $ac->get_Annotations($key);

This assigns key-value pairs of DBLink => DBLink; you don't see a warning because the number of links happens to be even (I get 8), but you would if the number of links returned were odd ("Odd number of elements in hash assignment", provided warnings are enabled). So when you call:

foreach my $value (keys(%values)) {....}

you only get half of the DBLinks.
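
The halving is easy to reproduce without BioPerl. A minimal sketch, assuming plain strings as stand-ins for the DBLink objects (the values are illustrative only, not actual parser output):

```perl
use strict;
use warnings;

# Placeholder strings standing in for the DBLink annotation objects.
my @links = (
    'NCBI-GI:153792305', 'NCBI-GeneID:116064', 'HGNC:26968',
    'Ensembl:ENSG00000163428', 'UniProt:Q96CX6', 'Pfam:LRR_1',
);

# Assigning a list to a hash pairs the elements up as key => value,
# so every second element becomes a value and never shows up in keys().
my %as_hash = @links;
my @kept    = sort keys %as_hash;

printf "list has %d links, hash keeps %d keys\n", scalar(@links), scalar(@kept);
print "  $_\n" for @kept;
```

With six elements in the list, only the first, third and fifth survive as keys; which half goes missing depends purely on the order in which the list is returned.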
You should use an array:

my @values = $ac->get_Annotations($key);
foreach my $value (@values) {
    print $value->as_text,"\n";
}

Note the loop change; Bio::Annotation objects are no longer operator-overloaded, so your print statement wouldn't work in a bioperl 1.6 world.

chris

On Nov 19, 2007, at 12:11 AM, neeti somaiya wrote:

> Hi,
>
> I am using Bio::SeqIO for parsing KEGG gene ent files.
>
> A part of my code is
>
> foreach my $key ( $ac->get_all_annotation_keys() )
> {
> if($key eq "dblink")
> {
> my %values =
> $ac->get_Annotations($key);
> foreach my $value (
> keys(%values ))
> {
> print
> "\n*****VALUE
> $value*****\n";
> }
> }
> }
>
> Here not all dblinks present in the actual file get parsed. For eg,
> in the
> data below,
> ENTRY 116064 CDS H.sapiens
> NAME LRRC58
> DEFINITION leucine rich repeat containing 58
> POSITION 3q13.33
> MOTIF Pfam: SdiA-regulated LRR_1
> PROSITE: LEU_RICH
> DBLINKS NCBI-GI: 153792305
> NCBI-GeneID: 116064
> HGNC: 26968
> Ensembl: ENSG00000163428
> UniProt: Q96CX6
>
> Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and
> PROSITE,
> but doesnt give me HGNC and UniProt. For other entries it gives me
> other
> combinations of dbs.
>
> Can anyone help me with this. Why is this happenning? I have no clue.
>
> Thanks and Regards,
> Neeti.
> --
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From akarger at CGR.Harvard.edu Mon Nov 19 10:38:26 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 19 Nov 2007 10:38:26 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu> References: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu> Message-ID: > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, November 13, 2007 12:42 PM > To: Amir Karger > Cc: Steve Chervitz; Dave Messina; bioperl-l > Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result? > > Amir, > > Can you file this as a bug? Done. http://bugzilla.open-bio.org/show_bug.cgi?id=2399 > Dave mentioned he would look > into it but > I think it warrants tracking to make sure it gets fixed: > > http://www.bioperl.org/wiki/Bugs > > Attach the example BLAST report from your last post to the report. > BTW, I wonder how this appears in XML output? > > chris > > On Nov 13, 2007, at 11:30 AM, Amir Karger wrote: > > >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf > >> Of Steve Chervitz > >> > >> The Bioperl blast parser should extract that value and you > can obtain > >> it from an HSP object, via the HSPI::n() method, documented here: > >> > >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B > > io/Search/HSP/HSPI.html#POD23 > > > > As I mentioned in my email: > > > > And does anyone know off-hand if Bioperl will tell me when > situations > > like this happen? I thought the Bio::Search::HSP::BlastHSP::n > > subroutine > > would help, but I just get a bunch of empty strings for that, > > whether or > > not there's a (2) in the Expect string. (hsp->n is empty, hsp-> > > {"_n"} is > > undef.) > > > > And the docs for n() actually say, "This value is not defined with > > NCBI > > Blast2 with gapping" although they don't say why. Which may > explain > > why, > > when I ran the following code on the blast result I included in my > > last > > email, I got empty values for all of the n's. (Why is n() > undefined > > for > > gapped blast if I'm getting n's in my results from that blast?) 
> > > > use warnings; > > use strict; > > use Bio::SearchIO; > > > > my $blast_out = $ARGV[0]; > > my $in = new Bio::SearchIO(-format => 'blast', > > -file => $blast_out, > > -report_type => 'tblastn'); > > > > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart > Send Frame N > > Evalue)), "\n"; > > while(my $query = $in->next_result) { > > while(my $subject = $query->next_hit) { > > while (my $hsp = $subject->next_hsp) { > > print join("\t", > > $query->query_name, > > $hsp->start("query"), > > $hsp->end("query"), > > $hsp->strand("hit"), > > $subject->name, > > $hsp->start("hit"), > > $hsp->end("hit"), > > $subject->frame, > > $hsp->n, > > $hsp->evalue, > > ),"\n"; > > } > > } > > } > > > >> Dave's basically correct in his explanation. It's a result of the > >> application of sum statistics by the blast algorithm. You > can read > >> all > >> about it in Korf et al's BLAST book. Here's the relevant section: > > > > [snip] > > > > Thanks, > > > > -Amir > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From aaron.j.mackey at gsk.com Mon Nov 19 11:50:53 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Mon, 19 Nov 2007 11:50:53 -0500 Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats? In-Reply-To: Message-ID: While Lucia's subject line asked for genbank2gff, her message actually asked the reverse (gff + fasta -> genbank). e.g. pretend you had to prepare a genome annotation for submission to GenBank ... and no, I don't know of any generalized gff2genbank script out there ... Lucia, the SeqIO::genbank module will write GenBank format, but you have to get all the bits and bobs together in the right way, i.e. 
construct the various AnnotationCollections and SeqFeatures (with SplitLocations for exons, CDS, etc.) that a GenBank record expects. One way to do this is to start with a template GenBank file that you'd like to mimic, strip it down to only two gene models, use SeqIO::genbank to read it into memory, and then step through the object with the Perl debugger to see how it is composed. -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 11/15/2007 02:19:41 PM: > Chris, > > There's also a genbank2gff3.PLS script in the GMOD package ( > http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS? > revision=1.5&view=markup). However, it has not been modified for a couple of > years, it may not be the "preferred" script. > > See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and > http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information > on using Bioperl's bp_genbank2gff3 script. > > Brian O. > > > On 11/15/07 1:43 PM, "Chris Fields" wrote: > > > There are currently many ways to get what you want, but not all are > > consistent (particularly re: GFF3). We are aiming for more > > consistent, compliant GFF/GTF output in the next developer series > > (1.7) of Bioperl. > > > > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the > > scripts directory); these are probably the most common way when > > working directly from a seq record. Bio::Tools::GFF is the most > > commonly used class though I'm unsure of it's status for GFF3 > > output. From within a Bio::SeqI you can call write_gff() (currently > > not very flexible) or from the SeqFeature itself gff_string(). > > Bio::Graphics::Feature has the additional method gff3_string(). > > Bio::FeatureIO is also an option, though I would consider it very > > experimental (it will likely undergo significant revision in the next > > bioperl dev series). > > > > Any others anyone can think of, maybe non-BioPerl related as well? 
> > > > chris > > > > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote: > > > >> Hi > >> I was asked this question recently > >> and it occurred to me I must be doing things inefficiently > >> To produce gff file I was using SeqIO to parse the required fields, > >> then > >> according to the conventions just printing out whatever was > >> required tab > >> delimited, which is easy > >> > >> but if I wanted to generate a genbank file, extracting features > >> from a gff file > >> and a plain fasta file it was more complicated > >> is there support for gff in bioperl now? > >> anyone can contribute with smart way to go from/to gff, genebank > >> and embl? > >> > >> thanks very much > >> > >> Lucia Peixoto > >> Department of Biology,SAS > >> University of Pennsylvania > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From johnston at biochem.ucl.ac.uk Mon Nov 19 09:46:03 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Mon, 19 Nov 2007 14:46:03 +0000 (GMT) Subject: [Bioperl-l] blast database names In-Reply-To: References: Message-ID: On Mon, 19 Nov 2007, Chris Fields wrote: > Here's a recent list (don't know if it's up-to-date): > > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html Thanks. Perhaps I missed something in the docs, but I don't think I've quite understood how this is supposed to work. I'm trying to blast primer sequences against the ref genome sequence. Should I be using ref_contig? How can I limit the blast to a single species? cheers, Cass. 
From Kevin.M.Brown at asu.edu Mon Nov 19 13:31:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 19 Nov 2007 11:31:38 -0700
Subject: [Bioperl-l] pSW vs dpAlign
Message-ID: <1A4207F8295607498283FE9E93B775B404042E1D@EX02.asurite.ad.asu.edu>

I was able to get the Ext package installed; I just had to copy the Align.pm file up one directory from where the installer was putting it.

Now I have a technician trying to use pSW (Bio::Tools::pSW). It appears to have been last updated back in '99 and seems to lack certain methods for getting things, such as the score, out of the alignment. The test.pl script that Bio::Ext comes with actually uses Bio::Tools::dpAlign. Is dpAlign the replacement for pSW?

From bernd.web at gmail.com Wed Nov 21 11:42:40 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 17:42:40 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To:
References: <4701AEE6.6070506@web.de> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
Message-ID: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>

Hi Russell,

I came across your question. At first I thought all was well on my system, but indeed I also have these colouring problems. I noted that the score in the bgcolor callback gets a different value! Printing the score during hit parsing ($hit->raw_score) gives the same score as the -description callback (my $score = $feature->score;). However, printing the score in the bgcolor sub gives 2573! The scores in the bgcolor routine are all different from, and higher than, the real scores. Were you able to solve this colouring issue?
Regards, Bernd > Hi all, > I'm using a modified version of Lincoln's tutorial > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) > and I'm colouring the HSPs by setting the -bgcolor by score with a sub > to give a similar image to that from NCBI but for some reason, my > colours are coming out wrong (see attached example) > They seem to be off by one but I can't see why. > > Any ideas? > > I can't be certain but I think it's only started doing this since our > BLAST upgrade to 2.2.17 a few weeks ago. > > Here's the colouring code: > ------------------------------------------------------------------------ > ------- > my $track = $panel->add_track( > -glyph => 'segments', > -label => 1, > -connector => 'dashed', > -bgcolor => sub { > my $feature = shift; > my $score = $feature->score; > return 'red' if $score >= 200; > return 'fuchsia' if $score >= 80; > return 'lime' if $score >= 50; > return 'blue' if $score >= 40; > return 'black'; > }, > -font2color => 'gray', > -sort_order => 'high_score', > -description => sub { > my $feature = shift; > return unless > $feature->has_tag('description'); > my ($description) = > $feature->each_tag_value('description'); > my $score = $feature->score; > "$description, score=$score"; > }, > ); > ------------------------------------------------------------------------ > --------- > > > Thanx, > > Russell Smithies > > > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bernd.web at gmail.com Wed Nov 21 12:38:30 2007 From: bernd.web at gmail.com (Bernd Web) Date: Wed, 21 Nov 2007 18:38:30 +0100 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> References: <4701AEE6.6070506@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> Message-ID: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> Hi, I now found that bgcolor is using a $feature->score that is coming directly from the blast report, it is not the bit score. -bgcolor => sub {my $feature = shift; my $score = $feature->score; print "$score\n"; } always print the score, even if the score is not set in the Bio::SeqFeature::Generic object. -description callbacks are somehow using the score from the SeqFeature object. Does anyone have an idea why? Further is is possible to get the raw_score of a hit. $hit->raw_score actually gets the bitscore (w/o decimal point). Bernd On Nov 21, 2007 5:42 PM, Bernd Web wrote: > Hi Russell, > > I came across your question. At first I thought all was well on my > system, but indeed I also have these colouring problems. > I noted that scrore in the bgcolor callback gets a different value! > Printing score during hit parsing($hit->raw_score) gives the same > score as -description > my $score = $feature->score; However, printing score in the bgcolor > sub gives 2573! > All scores in the bgcolor routine all different and higher than the > real scores. 
Were you able to solve this colouring issue? > > Regards, > Bernd > > > > Hi all, > > I'm using a modified version of Lincoln's tutorial > > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) > > and I'm colouring the HSPs by setting the -bgcolor by score with a sub > > to give a similar image to that from NCBI but for some reason, my > > colours are coming out wrong (see attached example) > > They seem to be off by one but I can't see why. > > > > Any ideas? > > > > I can't be certain but I think it's only started doing this since our > > BLAST upgrade to 2.2.17 a few weeks ago. > > > > Here's the colouring code: > > ------------------------------------------------------------------------ > > ------- > > my $track = $panel->add_track( > > -glyph => 'segments', > > -label => 1, > > -connector => 'dashed', > > -bgcolor => sub { > > my $feature = shift; > > my $score = $feature->score; > > return 'red' if $score >= 200; > > return 'fuchsia' if $score >= 80; > > return 'lime' if $score >= 50; > > return 'blue' if $score >= 40; > > return 'black'; > > }, > > -font2color => 'gray', > > -sort_order => 'high_score', > > -description => sub { > > my $feature = shift; > > return unless > > $feature->has_tag('description'); > > my ($description) = > > $feature->each_tag_value('description'); > > my $score = $feature->score; > > "$description, score=$score"; > > }, > > ); > > ------------------------------------------------------------------------ > > --------- > > > > > > Thanx, > > > > Russell Smithies > > > > > > > > > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. 
Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From sac at bioperl.org Wed Nov 21 13:43:54 2007 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 21 Nov 2007 10:43:54 -0800 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> Message-ID: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> On Nov 21, 2007 9:38 AM, Bernd Web wrote: > [snip] > > Further is is possible to get the raw_score of a hit. $hit->raw_score > actually gets the bitscore (w/o decimal point). Hmmm. raw_score should not be the same as bit score. So given an example blast hit line such as: Score = 60.0 bits (30), Expect = 1e-06 $hit->raw_score() should return 30, not 60, as you seem to be getting. Could you submit a bug report for this? http://www.bioperl.org/wiki/Bugs Thanks, Steve > > On Nov 21, 2007 5:42 PM, Bernd Web wrote: > > Hi Russell, > > > > I came across your question. At first I thought all was well on my > > system, but indeed I also have these colouring problems. > > I noted that scrore in the bgcolor callback gets a different value! 
> > Printing score during hit parsing($hit->raw_score) gives the same > > score as -description > > my $score = $feature->score; However, printing score in the bgcolor > > sub gives 2573! > > All scores in the bgcolor routine all different and higher than the > > real scores. Were you able to solve this colouring issue? > > > > Regards, > > Bernd > > > > > > > Hi all, > > > I'm using a modified version of Lincoln's tutorial > > > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) > > > and I'm colouring the HSPs by setting the -bgcolor by score with a sub > > > to give a similar image to that from NCBI but for some reason, my > > > colours are coming out wrong (see attached example) > > > They seem to be off by one but I can't see why. > > > > > > Any ideas? > > > > > > I can't be certain but I think it's only started doing this since our > > > BLAST upgrade to 2.2.17 a few weeks ago. > > > > > > Here's the colouring code: > > > ------------------------------------------------------------------------ > > > ------- > > > my $track = $panel->add_track( > > > -glyph => 'segments', > > > -label => 1, > > > -connector => 'dashed', > > > -bgcolor => sub { > > > my $feature = shift; > > > my $score = $feature->score; > > > return 'red' if $score >= 200; > > > return 'fuchsia' if $score >= 80; > > > return 'lime' if $score >= 50; > > > return 'blue' if $score >= 40; > > > return 'black'; > > > }, > > > -font2color => 'gray', > > > -sort_order => 'high_score', > > > -description => sub { > > > my $feature = shift; > > > return unless > > > $feature->has_tag('description'); > > > my ($description) = > > > $feature->each_tag_value('description'); > > > my $score = $feature->score; > > > "$description, score=$score"; > > > }, > > > ); > > > ------------------------------------------------------------------------ > > > --------- > > > > > > > > > Thanx, > > > > > > Russell Smithies > > > > > > > > > > > > > > > 
======================================================================= > > > Attention: The information contained in this message and/or attachments > > > from AgResearch Limited is intended only for the persons or entities > > > to which it is addressed and may contain confidential and/or privileged > > > material. Any review, retransmission, dissemination or other use of, or > > > taking of any action in reliance upon, this information by persons or > > > entities other than the intended recipients is prohibited by AgResearch > > > Limited. If you have received this message in error, please notify the > > > sender immediately. > > > ======================================================================= > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From binkley at genome.stanford.edu Wed Nov 21 19:35:02 2007 From: binkley at genome.stanford.edu (Jonathan Binkley) Date: Wed, 21 Nov 2007 16:35:02 -0800 Subject: [Bioperl-l] Installing bioperl-ext on Mac Message-ID: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu> Hi, I installed bioperl on a Mac (OS 10.4, Intel) via fink, which put it here: /sw/lib/perl5/5.8.6/Bio/ It seems to work fine, but I need bioperl-ext for Smith-Waterman alignments. So, into which directory should I download bioperl-ext and run the Makefile? Thanks. From dcj at sanger.ac.uk Thu Nov 22 09:47:09 2007 From: dcj at sanger.ac.uk (Daniel Jeffares) Date: Thu, 22 Nov 2007 14:47:09 +0000 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml Message-ID: Hi all, Bio::Tools::Run::Phylo::PAML::Baseml from bioperl-run 1.5.2 seems to be a little 'broken', at least in my hands. 
First, $bml->set_parameter('runmode', 0); does not work (it sets runmode to -2). Setting runmode to 1 is OK. Also, $bml->no_param_checks(1); doesn't seem to work. The result is that the baseml.ctl file created under /tmp is not runnable by baseml with runmode 0. The phylip file created is run OK by baseml (with another .ctl file). My script and baseml.ctl are below. Hope it can be fixed,

cheers,
Dan

#!/usr/bin/perl
use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'test.phy');
my $aln = $alignio->next_aln;
my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();
# set the runmode to zero
$bml->set_parameter('runmode', 0);
my ($rc, $parser) = $bml->run();
system "more $tempdir/baseml.ctl";
while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

The baseml.ctl file produced:

seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
fix_rho = 1
verbose = 0
noisy = 0
RateAncestor = 1
kappa = 2.5
model = 0
ndata = 5
Small_Diff = 1e-6
runmode = -2
alpha = 0
fix_kappa = 0
rho = 0
nhomo = 0
getSE = 0
cleandata = 1
fix_alpha = 1
clock = 0
Malpha = 0
ncatG = 5
fix_blength = -1
nparK = 0

Regards,
Daniel Jeffares
______________________________
Population and Comparative Genomics
Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Phone: +44(0)1223 834244 x 7297
Fax: +44 (0)1223 494919
www.sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
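One way to confirm what actually ended up in the generated control file, rather than eyeballing it with more, is to read it back into a hash. This is only a sketch under stated assumptions: parse_ctl is a hypothetical helper (not part of bioperl-run), it assumes one "key = value" setting per line as in the listing above, and it assumes '*' starts a comment in PAML control files. The sample text is a shortened copy of the file shown in the report.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl or bioperl-run.
# Parses "key = value" lines from control-file text into a hash.
sub parse_ctl {
    my ($text) = @_;
    my %param;
    for my $line (split /\n/, $text) {
        $line =~ s/\*.*//;    # assumption: '*' begins a comment
        # skip blank lines and anything that is not "key = value"
        next unless $line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/;
        $param{$1} = $2;
    }
    return %param;
}

# Shortened copy of the baseml.ctl from the report above.
my $ctl = <<'END';
seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
model = 0
runmode = -2
END

my %param = parse_ctl($ctl);
print "runmode written: $param{runmode}\n";
warn "runmode is not the requested 0\n" if $param{runmode} != 0;
```

In a real script the text would come from reading "$tempdir/baseml.ctl" after run(), with save_tempfiles(1) set as in the report so the file is still there.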
From David.Messina at sbc.su.se Thu Nov 22 11:06:16 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 22 Nov 2007 17:06:16 +0100 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml In-Reply-To: References: Message-ID: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> Daniel, I don't have bioperl-run or PAML installed on my system to test it myself, but have you tried the latest version of bioperl-run from CVS? It looks like that code has been worked on since 1.5.2 was released. If that still doesn't work, could you file this as a bug to make sure it gets followed up? Dave You can grab the tarball here: http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl and if necessary file the bug here: BioPerl Bugzilla tracking system From arareko at campus.iztacala.unam.mx Thu Nov 22 11:37:24 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 22 Nov 2007 10:37:24 -0600 Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref table In-Reply-To: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> Message-ID: <4745B044.5090102@campus.iztacala.unam.mx> Hi Peter, In BioPerl, there's no such mapping for db_xref's that I'm aware of. Each parser handles db_xref records on its own. Take a look at the Bio::SeqIO::genbank code, inside the next_seq() method for example: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup Regards, Mauricio. Peter wrote: > Dear all, > > I'm one of the Biopython developers. I've recently got going with > BioSQL and have been getting to grips with the Biopython BioSQL > interface. I'm aware that we need to try and be consistent with > BioPerl and BioJava, so I'd like to pose my first question related to > that. 
> > When loading GenBank records, many features have db_xref qualifiers, > e.g. from a random CDS feature in E. coli K12: > > /db_xref="ASAP:1309" > /db_xref="GI:16128366" > /db_xref="ECOCYC:EG10213" > /db_xref="GeneID:945313" > > Bioython attempts to translate the strings "ASAP", "GI", "ECOCYC", > "GeneID" before using recording these entries in the seqfeature_dbxref > and dbxref tables. For example, "GI" becomes "GeneIndex". > Biopython's current mapping is as follows: > > # Dictionary of database types, keyed by GenBank db_xref abbreviation > db_dict = {'GeneID': 'Entrez', > 'GI': 'GeneIndex', > 'COG': 'COG', > 'CDD': 'CDD', > 'DDBJ': 'DNA Databank of Japan', > 'Entrez': 'Entrez', > 'GeneIndex': 'GeneIndex', > 'PUBMED': 'PubMed', > 'taxon': 'Taxon', > 'ATCC': 'ATCC', > 'ISFinder': 'ISFinder', > 'GOA': 'Gene Ontology Annotation', > 'ASAP': 'ASAP', > 'PSEUDO': 'PSEUDO', > 'InterPro': 'InterPro', > 'GEO': 'Gene Expression Omnibus', > 'EMBL': 'EMBL', > 'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot', > 'ECOCYC': 'EcoCyc', > 'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL' > } > > In my testing, I've found several GenBank db_xref abbreviation for > which we don't have a mapping defined, such as "LocusID", "dbSNP", > "MGD", "MIM", or from an EMBL file, "REMTREMBL". > > I'd like to know if BioPerl and/or BioJava and/or BioRuby define a > similar mapping in their BioSQL code (or GenBank parser), so that > Biopython can follow your example. > > Thank you, > > Peter > > P.S. 
See also Biopython bug 2405
> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From avilella at gmail.com Thu Nov 22 16:55:10 2007
From: avilella at gmail.com (Albert Vilella)
Date: Thu, 22 Nov 2007 21:55:10 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
Message-ID: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>

Hi,

Am I right in thinking that the '_symbols' hash in SimpleAlign is only used if one calls the symbol_chars method?

When I comment out this line:

map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if $seq->seq; # line 257

I get a nice speed boost on loading alignments.

Can I comment this line out in the CVS HEAD?

Cheers,

Albert.

[init] 5.96046447753906e-06 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta] 0.0022270679473877 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta] 2.14348912239075 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta] 6.91910791397095 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta] 15.8402290344238 secs...

avilella at magneto:~$ perl /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ancestral_alleles.pl -dir /home/avilella/ensembl/exoseq/test -verbose
[init] 1.21593475341797e-05 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta] 0.00294303894042969 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta] 0.510555982589722 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta] 1.6192569732666 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta] 3.86473417282104 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000203717.chr1.fasta] 6.99602198600769 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000196188.chr1.fasta] 7.26704716682434 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000025800.chr1.fasta] 8.44332504272461 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000117475.chr1.fasta] 12.103296995163 secs... From cjfields at uiuc.edu Thu Nov 22 19:30:51 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Nov 2007 18:30:51 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> Message-ID: <99440C6C-74C1-4DCC-8C7D-EAABB7CA6B91@uiuc.edu> How are tests affected? It might be worth going through the revision history to see if there was a specific reason this was implemented, but if it passes tests I don't see why we need it. chris On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > Hi, > > Am I right in thinking that the '_symbols' hash in SimpleAlign is only > used if one calls the symbol_chars method? > > When I comment out this line: > > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > $seq->seq; # line 257 > > I get a nice speed boost on loading alignments. > > Can I comment this line out in the CVS HEAD? > > Cheers, > > Albert. > > [init] 5.96046447753906e-06 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.0022270679473877 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 2.14348912239075 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 6.91910791397095 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 15.8402290344238 secs... 
> > avilella at magneto:~$ perl > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > ancestral_alleles.pl > -dir /home/avilella/ensembl/exoseq/test -verbose > [init] 1.21593475341797e-05 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.00294303894042969 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 0.510555982589722 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 1.6192569732666 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 3.86473417282104 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000203717.chr1.fasta] > 6.99602198600769 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000196188.chr1.fasta] > 7.26704716682434 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000025800.chr1.fasta] > 8.44332504272461 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000117475.chr1.fasta] > 12.103296995163 secs... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Nov 22 19:42:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Nov 2007 18:42:12 -0600 Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref table In-Reply-To: <4745B044.5090102@campus.iztacala.unam.mx> References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> <4745B044.5090102@campus.iztacala.unam.mx> Message-ID: <47D0EC6F-C34A-4AA8-97EE-478F2A5ADF62@uiuc.edu> I think SeqIO checks the name for parsing reasons only, in cases where the format changes based on the source (such as GenPept DBSOURCE data). 
I don't think we go beyond that in Bioperl, probably b/c modifying or expanding names for data persistence would lead to volatile coding issues (i.e. consistency between parsers, constant updating to cover new crossrefs, etc). I would definitely suggest retaining the original DB as it appears in the dbxref for consistency/sanity; if needed return expanded names using a different method if they are designated. chris On Nov 22, 2007, at 10:37 AM, Mauricio Herrera Cuadra wrote: > Hi Peter, > > In BioPerl, there's no such mapping for db_xref's that I'm aware of. > Each parser handles db_xref records on its own. Take a look at the > Bio::SeqIO::genbank code, inside the next_seq() method for example: > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ > Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup > > Regards, > Mauricio. > > Peter wrote: >> Dear all, >> >> I'm one of the Biopython developers. I've recently got going with >> BioSQL and have been getting to grips with the Biopython BioSQL >> interface. I'm aware that we need to try and be consistent with >> BioPerl and BioJava, so I'd like to pose my first question related to >> that. >> >> When loading GenBank records, many features have db_xref qualifiers, >> e.g. from a random CDS feature in E. coli K12: >> >> /db_xref="ASAP:1309" >> /db_xref="GI:16128366" >> /db_xref="ECOCYC:EG10213" >> /db_xref="GeneID:945313" >> >> Bioython attempts to translate the strings "ASAP", "GI", "ECOCYC", >> "GeneID" before using recording these entries in the >> seqfeature_dbxref >> and dbxref tables. For example, "GI" becomes "GeneIndex". 
>> Biopython's current mapping is as follows:
>>
>> # Dictionary of database types, keyed by GenBank db_xref abbreviation
>> db_dict = {'GeneID': 'Entrez',
>>            'GI': 'GeneIndex',
>>            'COG': 'COG',
>>            'CDD': 'CDD',
>>            'DDBJ': 'DNA Databank of Japan',
>>            'Entrez': 'Entrez',
>>            'GeneIndex': 'GeneIndex',
>>            'PUBMED': 'PubMed',
>>            'taxon': 'Taxon',
>>            'ATCC': 'ATCC',
>>            'ISFinder': 'ISFinder',
>>            'GOA': 'Gene Ontology Annotation',
>>            'ASAP': 'ASAP',
>>            'PSEUDO': 'PSEUDO',
>>            'InterPro': 'InterPro',
>>            'GEO': 'Gene Expression Omnibus',
>>            'EMBL': 'EMBL',
>>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>>            'ECOCYC': 'EcoCyc',
>>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>>            }
>>
>> In my testing, I've found several GenBank db_xref abbreviations for
>> which we don't have a mapping defined, such as "LocusID", "dbSNP",
>> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
>>
>> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
>> similar mapping in their BioSQL code (or GenBank parser), so that
>> Biopython can follow your example.
>>
>> Thank you,
>>
>> Peter
>>
>> P.S. See also Biopython bug 2405
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Nov 22 19:49:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Nov 2007 18:49:15 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> Message-ID: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> Albert, Found it: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ SimpleAlign.pm.diff?r1=1.36&r2=1.37 If it slows performance that dramatically, maybe we can move this to a separate AlignUtils method instead. Maybe something to ask Jason about? chris On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > Hi, > > Am I right in thinking that the '_symbols' hash in SimpleAlign is only > used if one calls the symbol_chars method? > > When I comment out this line: > > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > $seq->seq; # line 257 > > I get a nice speed boost on loading alignments. > > Can I comment this line out in the CVS HEAD? > > Cheers, > > Albert. > > [init] 5.96046447753906e-06 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.0022270679473877 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 2.14348912239075 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 6.91910791397095 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 15.8402290344238 secs... > > avilella at magneto:~$ perl > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > ancestral_alleles.pl > -dir /home/avilella/ensembl/exoseq/test -verbose > [init] 1.21593475341797e-05 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.00294303894042969 secs... 
> [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 0.510555982589722 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 1.6192569732666 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 3.86473417282104 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000203717.chr1.fasta] > 6.99602198600769 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000196188.chr1.fasta] > 7.26704716682434 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000025800.chr1.fasta] > 8.44332504272461 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000117475.chr1.fasta] > 12.103296995163 secs... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Nov 23 07:29:37 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 23 Nov 2007 12:29:37 +0000 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml In-Reply-To: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> Message-ID: <4746C7B1.1010002@sendu.me.uk> Dave Messina wrote: > Daniel, > > I don't have bioperl-run or PAML installed on my system to test it myself, > but have you tried the latest version of bioperl-run from CVS? It looks like > that code has been worked on since 1.5.2 was released. Yes, I fixed it in CVS so it should at least /run/. I don't know about the parsing side of things, though that may also have been fixed recently by someone else. 
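A note on the '_symbols' question in the "proposed change -- symbols SimpleAlign" thread: the idea of skipping the per-sequence bookkeeping on load and working out the symbol set only when it is asked for can be sketched in plain Perl as below. This is only an illustration under stated assumptions. The LazySymbolsAln package and its methods are hypothetical stand-ins, not the Bio::SimpleAlign code, and unlike the real symbol_chars it does not filter out gap characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an alignment container; NOT Bio::SimpleAlign.
package LazySymbolsAln;

sub new {
    my ($class) = @_;
    # No '_symbols' key is created here: a hash that exists but is empty
    # would look "already computed" to the check in symbol_chars().
    return bless { _seqs => [] }, $class;
}

sub add_seq {
    my ($self, $seqstr) = @_;
    push @{ $self->{_seqs} }, $seqstr;   # cheap: no per-character work on load
    delete $self->{_symbols};            # invalidate any cached symbol set
    return;
}

sub symbol_chars {
    my ($self) = @_;
    unless ( $self->{_symbols} ) {
        # Pay the split-per-character cost only when someone asks.
        my %seen;
        for my $s ( @{ $self->{_seqs} } ) {
            $seen{$_} = 1 for split //, $s;
        }
        $self->{_symbols} = \%seen;
    }
    my @chars = sort keys %{ $self->{_symbols} };
    return @chars;
}

package main;

my $aln = LazySymbolsAln->new;
$aln->add_seq('AGUC--GA');
$aln->add_seq('AGUCNNGA');
my @chars = $aln->symbol_chars;
print scalar(@chars), " distinct symbols: @chars\n";
# prints: 6 distinct symbols: - A C G N U
```

The trade-off is that loading stays fast for callers who never need the symbol set, while the first call to symbol_chars() pays the full cost once and later calls reuse the cached result until another sequence is added.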
From avilella at gmail.com Fri Nov 23 08:08:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Nov 2007 13:08:59 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <4746C7B1.1010002@sendu.me.uk>
References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> <4746C7B1.1010002@sendu.me.uk>
Message-ID: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>

Just to mention that the new paml4 has a "basemlg" instead of a "baseml"
binary. AFAIK, Jason fixed codeml to make it work for both paml3.xx and
paml4, but I am not sure about baseml.

Also, I think if you set runmode 0, you have to provide a tree:

#!/usr/bin/perl
use strict;
use warnings;

use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;
use Bio::TreeIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
    -file => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.phy');
my $treeio = Bio::TreeIO->new(-format => 'newick',
    -file => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.tree');

my $aln  = $alignio->next_aln;
my $tree = $treeio->next_tree;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->tree($tree);
$bml->executable("/home/avilella/9_opl/paml/paml3.14/src/baseml");
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();

# set the runmode to zero
$bml->set_parameter('runmode', 0);
my ($rc, $parser) = $bml->run();

# show the control file that was generated for this run
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    $DB::single=1;1;    # debugger breakpoint
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

   4 50
Homo_sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
Pan_panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
Gorilla_go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
Pongo_pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

ACAUUUU-CCUUGCAAAG
ACAUCAU-CCUUGCAAAG
ACAUCAUCCCUCGCAGAG
ACAUCAUCCCUUGCAGAG

(((Homo_sapie,Pan_panisc),Gorilla_go),Pongo_pigm);

On Nov 23, 2007 12:29 PM, Sendu Bala wrote:
> Dave
Messina wrote: > > Daniel, > > > > I don't have bioperl-run or PAML installed on my system to test it myself, > > but have you tried the latest version of bioperl-run from CVS? It looks like > > that code has been worked on since 1.5.2 was released. > > Yes, I fixed it in CVS so it should at least /run/. I don't know about > the parsing side of things, though that may also have been fixed > recently by someone else. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Fri Nov 23 11:24:59 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 23 Nov 2007 10:24:59 -0600 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml In-Reply-To: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com> References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> <4746C7B1.1010002@sendu.me.uk> <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com> Message-ID: <6D4B909E-4B4E-45D4-B9BA-F99431B0EC65@uiuc.edu> I have both 'baseml' and 'basemlg' with paml4 on Mac OS X (not just 'basemlg'), so it would need to work with both. Do we want to put a PAML parser/wrapper overhaul on the TODO list for 1.6? chris On Nov 23, 2007, at 7:08 AM, Albert Vilella wrote: > Just to mention that the new paml4 has a "basemlg" instead of a > "baseml" binary. AFAIK, Jason fixed codeml to make it work both for > paml3.xx a paml4, but I am not sure about baseml. ... From arvindvanam at gmail.com Fri Nov 23 16:26:06 2007 From: arvindvanam at gmail.com (vanam) Date: Fri, 23 Nov 2007 13:26:06 -0800 (PST) Subject: [Bioperl-l] run RNAfold in perl Message-ID: <13918981.post@talk.nabble.com> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? 
my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); my $rnafold = $factory->program('rnafold'); my $job=$rnafold->run(-rnafold => 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); I installed Vienna package and then i tried using Pise to create an object for the program but its giving the following error ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Bio::Tools::Run::PiseJob terminated: URL missing STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 STACK: Bio::Tools::Run::PiseJob::terminated /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 STACK: Bio::Tools::Run::PiseApplication::submit /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 STACK: Bio::Tools::Run::PiseApplication::run /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 STACK: evaluate.pl:12 how to make the program RNAfold run in perl... IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? plz reply soon -- View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13918981 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at uiuc.edu Fri Nov 23 17:49:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 23 Nov 2007 16:49:43 -0600 Subject: [Bioperl-l] run RNAfold in perl In-Reply-To: <13918981.post@talk.nabble.com> References: <13918981.post@talk.nabble.com> Message-ID: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> The Pise wrappers run the programs remotely; see Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ mfold wrappers but haven't done so yet. The Vienna tools do have a Perl-based (non-BioPerl-based) module included which uses libRNA, and is well worth a look. 
Try 'perldoc RNA' if you have installed the tools locally, or look here for other Perl-based tools: http://www.tbi.univie.ac.at/~ivo/RNA/utils.html chris On Nov 23, 2007, at 3:26 PM, vanam wrote: > > how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? > > my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); > my $rnafold = $factory->program('rnafold'); > my $job=$rnafold->run(-rnafold => > 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); > > I installed Vienna package and then i tried using Pise to create an > object > for the program but its giving the following error > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bio::Tools::Run::PiseJob terminated: URL missing > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::Tools::Run::PiseJob::terminated > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 > STACK: Bio::Tools::Run::PiseApplication::submit > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 > STACK: Bio::Tools::Run::PiseApplication::run > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 > STACK: evaluate.pl:12 > > > how to make the program RNAfold run in perl... > IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? > > plz reply soon > -- > View this message in context: http://www.nabble.com/run-RNAfold-in- > perl-tf4863835.html#a13918981 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From arvindvanam at gmail.com Sat Nov 24 02:29:11 2007
From: arvindvanam at gmail.com (vanam)
Date: Fri, 23 Nov 2007 23:29:11 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
Message-ID: <13922740.post@talk.nabble.com>

I have read the documentation for Bio::Tools::Run::AnalysisFactory::Pise
and used it exactly as described there.

Instead of running its Perl version "RNAfold.pl", I would like to use the
functions associated with RNAfold from within a Perl program, without
having to call the program using the system() command.

If you can tell me how to use these wrapper modules it would be of great
help. For example, when using clustalw or clustalx we define an
environment variable for it. Do we have to do the same for RNAfold or
Mfold?

Chris Fields wrote:
>
> The Pise wrappers run the programs remotely; see
> Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a
> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/
> mfold wrappers but haven't done so yet. The Vienna tools do have a
> Perl-based (non-BioPerl-based) module included which uses libRNA, and
> is well worth a look. Try 'perldoc RNA' if you have installed the
> tools locally, or look here for other Perl-based tools:
>
> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
>
> chris
>
> On Nov 23, 2007, at 3:26 PM, vanam wrote:
>
>>
>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>> >> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); >> my $rnafold = $factory->program('rnafold'); >> my $job=$rnafold->run(-rnafold => >> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); >> >> I installed Vienna package and then i tried using Pise to create an >> object >> for the program but its giving the following error >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Bio::Tools::Run::PiseJob terminated: URL missing >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 >> STACK: Bio::Tools::Run::PiseJob::terminated >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 >> STACK: Bio::Tools::Run::PiseApplication::submit >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 >> STACK: Bio::Tools::Run::PiseApplication::run >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 >> STACK: evaluate.pl:12 >> >> >> how to make the program RNAfold run in perl... >> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? >> >> plz reply soon >> -- >> View this message in context: http://www.nabble.com/run-RNAfold-in- >> perl-tf4863835.html#a13918981 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13922740 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
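For the local route asked about in this thread, a minimal sketch follows. It assumes the Vienna package's own RNA Perl module is installed and that RNA::fold returns the structure and minimum free energy as described in 'perldoc RNA'; that call runs in-process, with no system() and no environment variable. The helper names fold_in_process and parse_rnafold_output are hypothetical, the second helper is only for cases where text output from the RNAfold program is already at hand, and the sample string (including its energy value) is made up for illustration rather than being a real result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fold in-process via the Vienna package's RNA module,
# if it is installed. Returns (structure, mfe), or an empty list otherwise.
sub fold_in_process {
    my ($seq) = @_;
    return unless eval { require RNA; 1 };
    my ($structure, $mfe) = RNA::fold($seq);
    return ($structure, $mfe);
}

# Hypothetical helper: pull the structure and energy out of RNAfold's text
# output, where the line after the sequence looks like "(((...))) (-1.20)".
sub parse_rnafold_output {
    my ($text) = @_;
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\s*\)\s*$/ ) {
            return ( $1, $2 + 0 );
        }
    }
    return;
}

# Illustrative text only; the energy value is made up, not a computed result.
my $sample = "GGGAAAUCC\n(((...))) ( -1.20)\n";
my ($structure, $mfe) = parse_rnafold_output($sample);
print "structure=$structure mfe=$mfe\n";
```

With the RNA module available, something like my ($s, $e) = fold_in_process('UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); would be the in-process call; without it the helper simply returns nothing, so the caller can fall back to another route.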
From avilella at gmail.com Sun Nov 25 06:50:42 2007
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 25 Nov 2007 11:50:42 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
In-Reply-To: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu>
Message-ID: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com>

Committed to CVS now. It is calculated anyway when calling symbol_chars,
so...

On Nov 23, 2007 12:49 AM, Chris Fields wrote:
> Albert,
>
> Found it:
>
> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SimpleAlign.pm.diff?r1=1.36&r2=1.37
>
> If it slows performance that dramatically, maybe we can move this to
> a separate AlignUtils method instead. Maybe something to ask Jason
> about?
>
> chris
>
> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote:
>
>
> > Hi,
> >
> > Am I right in thinking that the '_symbols' hash in SimpleAlign is only
> > used if one calls the symbol_chars method?
> >
> > When I comment out this line:
> >
> > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if
> > $seq->seq; # line 257
> >
> > I get a nice speed boost on loading alignments.
> >
> > Can I comment this line out in the CVS HEAD?
> >
> > Cheers,
> >
> > Albert.
> >
> > [init] 5.96046447753906e-06 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162399.chr1.fasta]
> > 0.0022270679473877 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000158022.chr1.fasta]
> > 2.14348912239075 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000162585.chr1.fasta]
> > 6.91910791397095 secs...
> > [loading aln /home/avilella/ensembl/exoseq/test/
> > ENSG00000121957.chr1.fasta]
> > 15.8402290344238 secs...
> > > > avilella at magneto:~$ perl > > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > > ancestral_alleles.pl > > -dir /home/avilella/ensembl/exoseq/test -verbose > > [init] 1.21593475341797e-05 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162399.chr1.fasta] > > 0.00294303894042969 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000158022.chr1.fasta] > > 0.510555982589722 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162585.chr1.fasta] > > 1.6192569732666 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000121957.chr1.fasta] > > 3.86473417282104 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000203717.chr1.fasta] > > 6.99602198600769 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000196188.chr1.fasta] > > 7.26704716682434 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000025800.chr1.fasta] > > 8.44332504272461 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000117475.chr1.fasta] > > 12.103296995163 secs... > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From cjfields at uiuc.edu Sun Nov 25 10:05:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 09:05:27 -0600 Subject: [Bioperl-l] run RNAfold in perl In-Reply-To: <13922740.post@talk.nabble.com> References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> Message-ID: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu> Again, these wrappers are for submitting data to a Pise server for the corresponding programs (run on a remote server). 
There are no wrappers for running RNAfold on your own computer (i.e.
locally), with or without an environment variable set. You can try
installing Pise locally and setting location() to localhost as shown in
the POD; however, I don't know how stable these modules are with newer
versions of Pise. They haven't been updated in a few years, apart from
getting the tests to work.

Another option is installing EMBOSS along with the EMBASSY version of
RNAfold; this could conceivably be run through Bio::Factory::EMBOSS.

chris

On Nov 24, 2007, at 1:29 AM, vanam wrote:

>
> i have seen the documentation for
> Bio::Tools::Run::AnalysisFactory::Pise and
> i used it exactly as it was mentioned in it.
>
> i just want that instead of running its perl version "RNAfold.pl" I
> can use
> the functions associated with RNAfold with a perl program without
> having to
> call the program using system() command.
>
> if you can just tell me how to use these wrapper modules it would b
> of gr8
> help...like while using clustalw or clustalx we define the environment
> variable for it ..do we have to do the same for RNAfold or Mfold
>
>
>
>
> Chris Fields wrote:
>>
>> The Pise wrappers run the programs remotely; see
>> Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a
>> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/
>> mfold wrappers but haven't done so yet. The Vienna tools do have a
>> Perl-based (non-BioPerl-based) module included which uses libRNA, and
>> is well worth a look. Try 'perldoc RNA' if you have installed the
>> tools locally, or look here for other Perl-based tools:
>>
>> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
>>
>> chris
>>
>> On Nov 23, 2007, at 3:26 PM, vanam wrote:
>>
>>>
>>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>>> >>> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); >>> my $rnafold = $factory->program('rnafold'); >>> my $job=$rnafold->run(-rnafold => >>> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); >>> >>> I installed Vienna package and then i tried using Pise to create an >>> object >>> for the program but its giving the following error >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: Bio::Tools::Run::PiseJob terminated: URL missing >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >>> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 >>> STACK: Bio::Tools::Run::PiseJob::terminated >>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 >>> STACK: Bio::Tools::Run::PiseApplication::submit >>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 >>> STACK: Bio::Tools::Run::PiseApplication::run >>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 >>> STACK: evaluate.pl:12 >>> >>> >>> how to make the program RNAfold run in perl... >>> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? >>> >>> plz reply soon >>> -- >>> View this message in context: http://www.nabble.com/run-RNAfold-in- >>> perl-tf4863835.html#a13918981 >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/run-RNAfold-in- > perl-tf4863835.html#a13922740 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Nov 25 10:38:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 09:38:40 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> Message-ID: Albert, I was getting a single AlignIO.t fail which appeared to be related to this: ... ok 122 - The object isa Bio::Align::AlignI ok 123 - consensus_string on metafasta not ok 124 - symbol_chars() using metafasta # Failed test 'symbol_chars() using metafasta' # in t/AlignIO.t at line 346. # got: '0' # expected: '23' It was b/c the symbol hash was initialized in the constructor (so it was present, just empty). I have changed that in CVS; all tests pass now. chris On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: > cvs commited now. it is calculated anyway when calling symbol_chars > so... > > On Nov 23, 2007 12:49 AM, Chris Fields wrote: >> Albert, >> >> Found it: >> >> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ >> Bio/ >> SimpleAlign.pm.diff?r1=1.36&r2=1.37 >> >> If it slows performance that dramatically, maybe we can move this to >> a separate AlignUtils method instead. Maybe something to ask Jason >> about? >> >> chris >> >> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: >> >> >>> Hi, >>> >>> Am I right in thinking that the '_symbols' hash in SimpleAlign is >>> only >>> used if one calls the symbol_chars method? 
>>> >>> When I comment out this line: >>> >>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if >>> $seq->seq; # line 257 >>> >>> I get a nice speed boost on loading alignments. >>> >>> Can I comment this line out in the CVS HEAD? >>> >>> Cheers, >>> >>> Albert. >>> >>> [init] 5.96046447753906e-06 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000162399.chr1.fasta] >>> 0.0022270679473877 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000158022.chr1.fasta] >>> 2.14348912239075 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000162585.chr1.fasta] >>> 6.91910791397095 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000121957.chr1.fasta] >>> 15.8402290344238 secs... >>> >>> avilella at magneto:~$ perl >>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ >>> ancestral_alleles.pl >>> -dir /home/avilella/ensembl/exoseq/test -verbose >>> [init] 1.21593475341797e-05 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000162399.chr1.fasta] >>> 0.00294303894042969 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000158022.chr1.fasta] >>> 0.510555982589722 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000162585.chr1.fasta] >>> 1.6192569732666 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000121957.chr1.fasta] >>> 3.86473417282104 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000203717.chr1.fasta] >>> 6.99602198600769 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000196188.chr1.fasta] >>> 7.26704716682434 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000025800.chr1.fasta] >>> 8.44332504272461 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000117475.chr1.fasta] >>> 12.103296995163 secs... 
>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bernd.web at gmail.com Sun Nov 25 11:13:44 2007 From: bernd.web at gmail.com (Bernd Web) Date: Sun, 25 Nov 2007 17:13:44 +0100 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> Message-ID: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> Hi, I am not sure if this is related, but I remember SimpleAlign was adapted to cope with more gap symbols that can occur in alignments/FastA sequences, as: . _ - = Previous versions would throw an error on 'illegal' gap characters, Regards, Bernd On Nov 25, 2007 4:38 PM, Chris Fields wrote: > Albert, > > I was getting a single AlignIO.t fail which appeared to be related to > this: > > ... > ok 122 - The object isa Bio::Align::AlignI > ok 123 - consensus_string on metafasta > > not ok 124 - symbol_chars() using metafasta > # Failed test 'symbol_chars() using metafasta' > # in t/AlignIO.t at line 346. > # got: '0' > # expected: '23' > > It was b/c the symbol hash was initialized in the constructor (so it > was present, just empty). I have changed that in CVS; all tests pass > now. > > chris > > > On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: > > > cvs commited now. it is calculated anyway when calling symbol_chars > > so... 
> > > > On Nov 23, 2007 12:49 AM, Chris Fields wrote: > >> Albert, > >> > >> Found it: > >> > >> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ > >> Bio/ > >> SimpleAlign.pm.diff?r1=1.36&r2=1.37 > >> > >> If it slows performance that dramatically, maybe we can move this to > >> a separate AlignUtils method instead. Maybe something to ask Jason > >> about? > >> > >> chris > >> > >> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > >> > >> > >>> Hi, > >>> > >>> Am I right in thinking that the '_symbols' hash in SimpleAlign is > >>> only > >>> used if one calls the symbol_chars method? > >>> > >>> When I comment out this line: > >>> > >>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > >>> $seq->seq; # line 257 > >>> > >>> I get a nice speed boost on loading alignments. > >>> > >>> Can I comment this line out in the CVS HEAD? > >>> > >>> Cheers, > >>> > >>> Albert. > >>> > >>> [init] 5.96046447753906e-06 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000162399.chr1.fasta] > >>> 0.0022270679473877 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000158022.chr1.fasta] > >>> 2.14348912239075 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000162585.chr1.fasta] > >>> 6.91910791397095 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000121957.chr1.fasta] > >>> 15.8402290344238 secs... > >>> > >>> avilella at magneto:~$ perl > >>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > >>> ancestral_alleles.pl > >>> -dir /home/avilella/ensembl/exoseq/test -verbose > >>> [init] 1.21593475341797e-05 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000162399.chr1.fasta] > >>> 0.00294303894042969 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000158022.chr1.fasta] > >>> 0.510555982589722 secs... 
> >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000162585.chr1.fasta] > >>> 1.6192569732666 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000121957.chr1.fasta] > >>> 3.86473417282104 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000203717.chr1.fasta] > >>> 6.99602198600769 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000196188.chr1.fasta] > >>> 7.26704716682434 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000025800.chr1.fasta] > >>> 8.44332504272461 secs... > >>> [loading aln /home/avilella/ensembl/exoseq/test/ > >>> ENSG00000117475.chr1.fasta] > >>> 12.103296995163 secs... > >> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> Christopher Fields > >> Postdoctoral Researcher > >> Lab of Dr. Robert Switzer > >> Dept of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Sun Nov 25 11:39:01 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 10:39:01 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> Message-ID: Bernd, That would be when generating Bio::LocatableSeq instances for building a Bio::SimpleAlign object. Judging by test suite results that doesn't appear to be affected. chris On Nov 25, 2007, at 10:13 AM, Bernd Web wrote: > Hi, > > I am not sure if this is related, but I remember SimpleAlign was > adapted to cope with more gap symbols that can occur in > alignments/FastA sequences, as: . _ - = > Previous versions would throw an error on 'illegal' gap characters, > > Regards, > Bernd > > On Nov 25, 2007 4:38 PM, Chris Fields wrote: >> Albert, >> >> I was getting a single AlignIO.t fail which appeared to be related to >> this: >> >> ... >> ok 122 - The object isa Bio::Align::AlignI >> ok 123 - consensus_string on metafasta >> >> not ok 124 - symbol_chars() using metafasta >> # Failed test 'symbol_chars() using metafasta' >> # in t/AlignIO.t at line 346. >> # got: '0' >> # expected: '23' >> >> It was b/c the symbol hash was initialized in the constructor (so it >> was present, just empty). I have changed that in CVS; all tests pass >> now. >> >> chris >> >> >> On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: >> >>> cvs commited now. it is calculated anyway when calling symbol_chars >>> so... 
>>> >>> On Nov 23, 2007 12:49 AM, Chris Fields wrote: >>>> Albert, >>>> >>>> Found it: >>>> >>>> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ >>>> Bio/ >>>> SimpleAlign.pm.diff?r1=1.36&r2=1.37 >>>> >>>> If it slows performance that dramatically, maybe we can move >>>> this to >>>> a separate AlignUtils method instead. Maybe something to ask Jason >>>> about? >>>> >>>> chris >>>> >>>> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: >>>> >>>> >>>>> Hi, >>>>> >>>>> Am I right in thinking that the '_symbols' hash in SimpleAlign is >>>>> only >>>>> used if one calls the symbol_chars method? >>>>> >>>>> When I comment out this line: >>>>> >>>>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if >>>>> $seq->seq; # line 257 >>>>> >>>>> I get a nice speed boost on loading alignments. >>>>> >>>>> Can I comment this line out in the CVS HEAD? >>>>> >>>>> Cheers, >>>>> >>>>> Albert. >>>>> >>>>> [init] 5.96046447753906e-06 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000162399.chr1.fasta] >>>>> 0.0022270679473877 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000158022.chr1.fasta] >>>>> 2.14348912239075 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000162585.chr1.fasta] >>>>> 6.91910791397095 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000121957.chr1.fasta] >>>>> 15.8402290344238 secs... >>>>> >>>>> avilella at magneto:~$ perl >>>>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ >>>>> ancestral_alleles.pl >>>>> -dir /home/avilella/ensembl/exoseq/test -verbose >>>>> [init] 1.21593475341797e-05 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000162399.chr1.fasta] >>>>> 0.00294303894042969 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000158022.chr1.fasta] >>>>> 0.510555982589722 secs... 
>>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000162585.chr1.fasta] >>>>> 1.6192569732666 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000121957.chr1.fasta] >>>>> 3.86473417282104 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000203717.chr1.fasta] >>>>> 6.99602198600769 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000196188.chr1.fasta] >>>>> 7.26704716682434 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000025800.chr1.fasta] >>>>> 8.44332504272461 secs... >>>>> [loading aln /home/avilella/ensembl/exoseq/test/ >>>>> ENSG00000117475.chr1.fasta] >>>>> 12.103296995163 secs... >>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Sun Nov 25 13:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 12:51:42 -0600
Subject: [Bioperl-l] [ANNOUNCE] bioperl-ext updates and bioperl-live
Message-ID: <32B25A3B-0F04-43CB-8A66-1019EFD3BEB0@uiuc.edu>

I have been making some significant changes to Bio::SeqIO::staden::read
over the last few months which incorporate code from Bugzilla (bugs 2074
and 2329, very kindly donated by Chris Bailey and Joel Martin, cheers!).

Significant Changes:

* All Inline code in staden::read is now XS-based
* A new method has been added to Bio::SeqIO::staden::read for optionally
  getting trace data (i.e. for drawing graphs). The method code is now
  implemented in Bio::SeqIO::abi, with example code in
  examples/quality/svgtrace.pl.

These changes should allow newer versions of Staden io_lib as well (the
code is tested with io_lib 1.9.2), though they haven't been tested
extensively as I am having problems compiling newer io_lib versions on my
MacBook.

It's very likely more changes will need to be made along the way; some
issues were found with XS compilation which appear harmless but need to be
investigated, and trace data from other formats needs to be evaluated. The
possibility exists that many of these changes break backward compatibility
with older bioperl releases, though tests passed with bioperl 1.5.2. Any
feedback re: platform issues, test results using newer io_lib versions,
older bioperl versions, etc. would be greatly appreciated.

I'm hoping this will stimulate more interest in getting other bioperl-ext
modules up-to-date with bioperl-live.
chris From cjfields at uiuc.edu Mon Nov 26 13:59:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Nov 2007 12:59:23 -0600 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> Message-ID: Steve, Bernd, (and Jason, since you may have some input on this as well), I am now looking into the bug Bernd submitted and it seems there is a serious discrepancy with the way the hit raw_score, bits, and significance is determined for Hit objects. Unless I am mistaken these should always come from the best HSP when they are present, falling back to the hit table data only when no HSP alignments are present. Under the latter conditions a minimal Hit object is made from data in the hit table, which reports the rounded bit score, not the raw score, so in those cases the raw score would be undefined (and you probably should get a nasty warning indicating there are no HSPs present to get the data from). What is occurring now, though, is the raw_score and significance is explicitly set from the hit table in the BLAST parser for the Hit object at all times, while the bits are correctly derived from the best HSP (no fallback to the hit table). Changing to the behavior above results in several tests failing via SearchIO.t, with each failed test reporting the expected (read:correct) raw score. I'll look through the tests just in case, but I am planning on committing changes to the BLAST parsers, GenericHit, and SearchIO.t (to reflect the correct expected data) in the next day or two unless there are any objections. 
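The precedence described above can be sketched in plain Perl. This is only
an illustration under stated assumptions: the hashes and field names
(hsps, table_bits, table_evalue) are hypothetical stand-ins, not the actual
GenericHit or BLAST parser code, and "best HSP" is assumed here to mean the
HSP with the highest bit score.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical plain-hash stand-ins for a hit and its HSPs. These are NOT
# the real Bio::Search::Hit::GenericHit internals; field names are made up.
sub hit_scores {
    my ($hit) = @_;
    my @hsps = @{ $hit->{hsps} || [] };

    if (@hsps) {
        # Assumption: the "best" HSP is the one with the highest bit score.
        my ($best) = sort { $b->{bits} <=> $a->{bits} } @hsps;
        return {
            raw_score    => $best->{raw_score},
            bits         => $best->{bits},
            significance => $best->{evalue},
        };
    }

    # No HSP alignments: fall back to the hit table. For the NCBI-style
    # table discussed here that means a rounded bit score and no raw score,
    # so the raw score stays undefined and a warning is issued.
    warn "No HSPs for $hit->{name}; raw score is not available\n";
    return {
        raw_score    => undef,
        bits         => $hit->{table_bits},
        significance => $hit->{table_evalue},
    };
}

# A hit with HSP alignments: scores come from the best HSP.
my $with_hsps = hit_scores({
    name         => 'hitA',
    table_bits   => 60,
    table_evalue => 1e-06,
    hsps         => [
        { raw_score => 30, bits => 60.0, evalue => 1e-06 },
        { raw_score => 12, bits => 24.3, evalue => 0.5 },
    ],
});
print "raw=$with_hsps->{raw_score} bits=$with_hsps->{bits}\n";

# A hit known only from the hit table: bits are reported, raw score is not.
my $table_only = hit_scores({
    name         => 'hitB',
    table_bits   => 60,
    table_evalue => 1e-06,
});
print defined $table_only->{raw_score} ? "raw defined\n" : "raw undefined\n";
```

The point of the sketch is only the ordering: HSP data wins whenever it
exists, and the hit table is consulted solely as a fallback.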
chris On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote: > On Nov 21, 2007 9:38 AM, Bernd Web wrote: >> [snip] >> >> Further is is possible to get the raw_score of a hit. $hit->raw_score >> actually gets the bitscore (w/o decimal point). > > Hmmm. raw_score should not be the same as bit score. So given an > example blast hit line such as: > > Score = 60.0 bits (30), Expect = 1e-06 > > $hit->raw_score() should return 30, not 60, as you seem to be getting. > > Could you submit a bug report for this? http://www.bioperl.org/ > wiki/Bugs > > Thanks, > Steve > >> >> On Nov 21, 2007 5:42 PM, Bernd Web wrote: >>> Hi Russell, >>> >>> I came across your question. At first I thought all was well on my >>> system, but indeed I also have these colouring problems. >>> I noted that scrore in the bgcolor callback gets a different value! >>> Printing score during hit parsing($hit->raw_score) gives the same >>> score as -description >>> my $score = $feature->score; However, printing score in the bgcolor >>> sub gives 2573! >>> All scores in the bgcolor routine all different and higher than the >>> real scores. Were you able to solve this colouring issue? >>> >>> Regards, >>> Bernd >>> >>> >>>> Hi all, >>>> I'm using a modified version of Lincoln's tutorial >>>> (http://www.bioperl.org/wiki/ >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output) >>>> and I'm colouring the HSPs by setting the -bgcolor by score with >>>> a sub >>>> to give a similar image to that from NCBI but for some reason, my >>>> colours are coming out wrong (see attached example) >>>> They seem to be off by one but I can't see why. >>>> >>>> Any ideas? >>>> >>>> I can't be certain but I think it's only started doing this >>>> since our >>>> BLAST upgrade to 2.2.17 a few weeks ago. 
>>>> >>>> Here's the colouring code: >>>> ------------------------------------------------------------------- >>>> ----- >>>> ------- >>>> my $track = $panel->add_track( >>>> -glyph => 'segments', >>>> -label => 1, >>>> -connector => 'dashed', >>>> -bgcolor => sub { >>>> my $feature = shift; >>>> my $score = $feature->score; >>>> return 'red' if $score >= 200; >>>> return 'fuchsia' if $score >>>> >= 80; >>>> return 'lime' if $score >>>> >= 50; >>>> return 'blue' if $score >= 40; >>>> return 'black'; >>>> }, >>>> -font2color => 'gray', >>>> -sort_order => 'high_score', >>>> -description => sub { >>>> my $feature = shift; >>>> return unless >>>> $feature->has_tag('description'); >>>> my ($description) = >>>> $feature->each_tag_value('description'); >>>> my $score = $feature->score; >>>> "$description, score=$score"; >>>> }, >>>> ); >>>> ------------------------------------------------------------------- >>>> ----- >>>> --------- >>>> >>>> >>>> Thanx, >>>> >>>> Russell Smithies >>>> >>>> >>>> >>>> >>>> =================================================================== >>>> ==== >>>> Attention: The information contained in this message and/or >>>> attachments >>>> from AgResearch Limited is intended only for the persons or >>>> entities >>>> to which it is addressed and may contain confidential and/or >>>> privileged >>>> material. Any review, retransmission, dissemination or other use >>>> of, or >>>> taking of any action in reliance upon, this information by >>>> persons or >>>> entities other than the intended recipients is prohibited by >>>> AgResearch >>>> Limited. If you have received this message in error, please >>>> notify the >>>> sender immediately. 
>>>> =================================================================== >>>> ==== >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From arvindvanam at gmail.com Mon Nov 26 14:08:41 2007 From: arvindvanam at gmail.com (vanam) Date: Mon, 26 Nov 2007 11:08:41 -0800 (PST) Subject: [Bioperl-l] run RNAfold in perl In-Reply-To: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu> References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu> Message-ID: <13955209.post@talk.nabble.com> i searches for the embassy version of RNAFOLD (i guess its vrnafold) but i m unable to find a downloadable version.all ther is a web interface for it. can u tell frm wher to fdownload it???? or can you just tell me how to set the location in piseapplication to localhost n wat to enter in the $email variable???? -- View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13955209 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From cjfields at uiuc.edu Mon Nov 26 15:08:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 14:08:24 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13955209.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu> <13955209.post@talk.nabble.com>
Message-ID: <8F0B3E56-BC46-4794-9A30-12688A358CAD@uiuc.edu>

On Nov 26, 2007, at 1:08 PM, vanam wrote:

> i searches for the embassy version of RNAFOLD (i guess its vrnafold) but i m
> unable to find a downloadable version.all ther is a web interface for it.
> can u tell frm wher to fdownload it????

You will need to install EMBOSS as well as the EMBASSY version of VIENNA
(something which is documented in the docs that come along with the
distributions and I will not go into detail on):

ftp://emboss.open-bio.org/pub/EMBOSS/

This would be your best bet. Understand that there is no specific class
framework for dealing with RNA secondary structure in BioPerl, so you will
be on your own for now. My suggestion for using Pise had the very important
caveats that (1) it very well may not work, (2) I have no experience with
Pise, let alone setting it up locally, therefore (3) I haven't tested it
(and don't intend to, as I don't have the time).

> or can you just tell me how to set the location in piseapplication to
> localhost n wat to enter in the $email variable????

I have pointed out documentation previously which comes with the modules in
question. Remember perldoc is your friend; consulting it saves me (and
everyone else) time. From 'perldoc Bio::Tools::Run::AnalysisFactory::Pise':

----------------------------------------------
DESCRIPTION
    Bio::Tools::Run::AnalysisFactory::Pise is a class to create Pise
    application objects, that let you submit jobs on a Pise server.

      my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                          -email => 'me@myhome');

    The email is optional (there is default one). It can be useful, though.
    Your program might enter infinite loops, or just run many jobs: the
    Pise server maintainer needs a contact (s/he could of course cancel any
    requests from your address...). And if you plan to run a lot of heavy
    jobs, or to do a course with many students, please ask the maintainer
    before.

    The location parameter stands for the actual CGI location, except when
    set at the factory creation step, where it is rather the root of all
    CGI. There are default values for most of Pise programs.

    You can either set location at:

    1 factory creation:
        my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                             -location => 'http://somewhere/Pise/cgi-bin',
                             -email => 'me@myhome');

    2 program creation:
        my $program = $factory->program('water',
                   -location => 'http://somewhere/Pise/cgi-bin/water.pl'
                   );

    3 any time before running:
        $program->location('http://somewhere/Pise/cgi-bin/water.pl');
        $job = $program->run();

    4 when running:
        $job = $program->run(-location => 'http://somewhere/Pise/cgi-bin/water.pl');

    You can also retrieve a previous job results by providing its url:

      $job = $factory->job($url);

    You get the url of a job by:

      $job->jobid;
----------------------------------------------

chris

From sac at bioperl.org Mon Nov 26 20:41:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 17:41:59 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: 
References: <4701AEE6.6070506@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>

Chris,

Good catch.
You're on track here with one exception: WU blast and NCBI blast behave differently in what they report in the hit table: WU blast puts the raw score in the table not the bit score as NCBI blast does (see example below for reference). WU blast also swaps their location in the HSP header relative to how NCBI reports it. It would be good to verify that the blast parser isn't befuddled by this. A quick look at SearchIO::blast and it appears that data from the hit table is always getting stored as score, not bits for WU blast. Not sure if the HSP section data are parsed correctly. I'd recommend looking into these things when you do your fixes. So in the end, WU blast HSPs that are built from the hit table should report a value for raw_score and punt on bits, but NCBI HSPs so constructed should do the opposite. The downside to this arrangement is that code that works for NCBI blast hits will need modification to work for WU blast hits, but that is just the nature of the data. It shouldn't be an issue for the majority of users that stick with one flavor of blast and don't switch back and forth, or for users that get their HSP scoring data from HSP sections rather than relying on the hit table. Ideally, the HSP object would know whether it was NCBI or WU-based and issue an informative warning when attempting to access data it doesn't have. One solution might be for the parser to put a 'WU-' in front of the algorithm name for WU blast reports, so it would then be available for the contained hit/hsp objects. This could break anything dependent on algorithm name, so it would need some testing. Steve Example WU blast table and HSP header: Smallest Sum High Probability Sequences producing High-scoring Segment Pairs: Score P(N) N gb|AAC73113.1| (AE000111) aspartokinase I, homoserine deh... 4141 0.0 1 gb|AAC76922.1| (AE000468) aspartokinase II and homoserine... 844 3.1e-86 1 gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensi... 
483 2.8e-47 1 gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia c... 97 0.0010 1 >gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I [Escherichia coli] Length = 820 Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0 Identities = 820/820 (100%), Positives = 820/820 (100%) Example NCBI blast table and HSP header: Score E Sequences producing significant alignments: (bits) Value ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E... 120 3e-27 ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E... 120 3e-27 ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189... 115 8e-26 >ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:ENSG00000137397 transcript:ENST00000357569 Length = 425 Score = 120 bits (301), Expect = 3e-27 Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261 (8%) On Nov 26, 2007 10:59 AM, Chris Fields wrote: > Steve, Bernd, (and Jason, since you may have some input on this as > well), > > I am now looking into the bug Bernd submitted and it seems there is a > serious discrepancy with the way the hit raw_score, bits, and > significance is determined for Hit objects. Unless I am mistaken > these should always come from the best HSP when they are present, > falling back to the hit table data only when no HSP alignments are > present. Under the latter conditions a minimal Hit object is made > from data in the hit table, which reports the rounded bit score, not > the raw score, so in those cases the raw score would be undefined > (and you probably should get a nasty warning indicating there are no > HSPs present to get the data from). > > What is occurring now, though, is the raw_score and significance is > explicitly set from the hit table in the BLAST parser for the Hit > object at all times, while the bits are correctly derived from the > best HSP (no fallback to the hit table). 
Changing to the behavior > above results in several tests failing via SearchIO.t, with each > failed test reporting the expected (read:correct) raw score. > > I'll look through the tests just in case, but I am planning on > committing changes to the BLAST parsers, GenericHit, and SearchIO.t > (to reflect the correct expected data) in the next day or two unless > there are any objections. > > chris > > > On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote: > > > On Nov 21, 2007 9:38 AM, Bernd Web wrote: > >> [snip] > >> > >> Further is is possible to get the raw_score of a hit. $hit->raw_score > >> actually gets the bitscore (w/o decimal point). > > > > Hmmm. raw_score should not be the same as bit score. So given an > > example blast hit line such as: > > > > Score = 60.0 bits (30), Expect = 1e-06 > > > > $hit->raw_score() should return 30, not 60, as you seem to be getting. > > > > Could you submit a bug report for this? http://www.bioperl.org/ > > wiki/Bugs > > > > Thanks, > > Steve > > > >> > >> On Nov 21, 2007 5:42 PM, Bernd Web wrote: > >>> Hi Russell, > >>> > >>> I came across your question. At first I thought all was well on my > >>> system, but indeed I also have these colouring problems. > >>> I noted that scrore in the bgcolor callback gets a different value! > >>> Printing score during hit parsing($hit->raw_score) gives the same > >>> score as -description > >>> my $score = $feature->score; However, printing score in the bgcolor > >>> sub gives 2573! > >>> All scores in the bgcolor routine all different and higher than the > >>> real scores. Were you able to solve this colouring issue? 
> >>> > >>> Regards, > >>> Bernd > >>> > >>> > >>>> Hi all, > >>>> I'm using a modified version of Lincoln's tutorial > >>>> (http://www.bioperl.org/wiki/ > >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output) > >>>> and I'm colouring the HSPs by setting the -bgcolor by score with > >>>> a sub > >>>> to give a similar image to that from NCBI but for some reason, my > >>>> colours are coming out wrong (see attached example) > >>>> They seem to be off by one but I can't see why. > >>>> > >>>> Any ideas? > >>>> > >>>> I can't be certain but I think it's only started doing this > >>>> since our > >>>> BLAST upgrade to 2.2.17 a few weeks ago. > >>>> > >>>> Here's the colouring code: > >>>> ------------------------------------------------------------------- > >>>> ----- > >>>> ------- > >>>> my $track = $panel->add_track( > >>>> -glyph => 'segments', > >>>> -label => 1, > >>>> -connector => 'dashed', > >>>> -bgcolor => sub { > >>>> my $feature = shift; > >>>> my $score = $feature->score; > >>>> return 'red' if $score >= 200; > >>>> return 'fuchsia' if $score > >>>> >= 80; > >>>> return 'lime' if $score > >>>> >= 50; > >>>> return 'blue' if $score >= 40; > >>>> return 'black'; > >>>> }, > >>>> -font2color => 'gray', > >>>> -sort_order => 'high_score', > >>>> -description => sub { > >>>> my $feature = shift; > >>>> return unless > >>>> $feature->has_tag('description'); > >>>> my ($description) = > >>>> $feature->each_tag_value('description'); > >>>> my $score = $feature->score; > >>>> "$description, score=$score"; > >>>> }, > >>>> ); > >>>> ------------------------------------------------------------------- > >>>> ----- > >>>> --------- > >>>> > >>>> > >>>> Thanx, > >>>> > >>>> Russell Smithies > >>>> > >>>> > >>>> > >>>> > >>>> =================================================================== > >>>> ==== > >>>> Attention: The information contained in this message and/or > >>>> attachments > >>>> from AgResearch Limited is intended only for the persons or > >>>> entities > 
>>>> to which it is addressed and may contain confidential and/or > >>>> privileged > >>>> material. Any review, retransmission, dissemination or other use > >>>> of, or > >>>> taking of any action in reliance upon, this information by > >>>> persons or > >>>> entities other than the intended recipients is prohibited by > >>>> AgResearch > >>>> Limited. If you have received this message in error, please > >>>> notify the > >>>> sender immediately. > >>>> =================================================================== > >>>> ==== > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From sac at bioperl.org Mon Nov 26 22:27:09 2007 From: sac at bioperl.org (Steve Chervitz) Date: Mon, 26 Nov 2007 19:27:09 -0800 Subject: [Bioperl-l] Installing bioperl-ext on Mac In-Reply-To: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu> References: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu> Message-ID: <8f200b4c0711261927h7ed8887ay8ab788f4f70fa197@mail.gmail.com> Hi Jon, I'd recommend downloading it into a separate location of your choosing (~/lib/bioperl-ext for example) and running the installer as specified in the docs that come with the download. Then you can include the location you installed it into via a "use lib '~/lib/bioperl-ext'" statement at the top of your script. 
It may be handy to install it as a non-root user so that you don't alter the main perl installation. This way your ext install will stay separate from your main bioperl and perl installations. There are some docs about the ext packages you might want to check out at http://www.bioperl.org/wiki/Ext_package. Steve On Nov 21, 2007 4:35 PM, Jonathan Binkley wrote: > Hi, > > I installed bioperl on a Mac (OS 10.4, Intel) via fink, > which put it here: > > /sw/lib/perl5/5.8.6/Bio/ > > It seems to work fine, but I need bioperl-ext for > Smith-Waterman alignments. > > So, into which directory should I download bioperl-ext and > run the Makefile? > > Thanks. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From a_arya2000 at yahoo.com Tue Nov 27 09:51:41 2007 From: a_arya2000 at yahoo.com (a_arya2000) Date: Tue, 27 Nov 2007 06:51:41 -0800 (PST) Subject: [Bioperl-l] Bioperl-ext test fails Message-ID: <615478.1036.qm@web60113.mail.yahoo.com> Hello, I downloaded latest bioperl-ext from bioperl website, and I have io_lib v1.8.11 installed, and I was trying to install Bio::SeqIO::staden::read (of bioperl-ext). It compiled fine without any error but when I run make test I got following output. 

PERL_DL_NONLAZY=1 perl-5.8.8/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/staden_read....ok 3/94# Test 7 got: "0" (t/staden_read.t at line 110 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
#  t/staden_read.t line 110 is: ok(0, undef, "We don't have the ability to write files for $format format") for 1..7;
# Test 8 got: "0" (t/staden_read.t at line 110 fail #2 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 9 got: "0" (t/staden_read.t at line 110 fail #3 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 10 got: "0" (t/staden_read.t at line 110 fail #4 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 11 got: "0" (t/staden_read.t at line 110 fail #5 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 12 got: "0" (t/staden_read.t at line 110 fail #6 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 13 got: "0" (t/staden_read.t at line 110 fail #7 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 14 got: "0" (t/staden_read.t at line 62 *TODO*)
#   Expected: (Still missing test files for alf format)
#  t/staden_read.t line 62 is: ok(0, undef, "Still missing test files for $format format") for (1..$formatlooptests);
# Test 15 got: "0" (t/staden_read.t at line 62 fail #2 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 16 got: "0" (t/staden_read.t at line 62 fail #3 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 17 got: "0" (t/staden_read.t at line 62 fail #4 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 18 got: "0" (t/staden_read.t at line 62 fail #5 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 19 got: "0" (t/staden_read.t at line 62 fail #6 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 20 got: "0" (t/staden_read.t at line 62 fail #7 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 21 got: "0" (t/staden_read.t at line 62 fail #8 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 22 got: "0" (t/staden_read.t at line 62 fail #9 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 23 got: "0" (t/staden_read.t at line 62 fail #10 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 24 got: "0" (t/staden_read.t at line 62 fail #11 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 25 got: "0" (t/staden_read.t at line 62 fail #12 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 31 got: "0" (t/staden_read.t at line 107 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
#  t/staden_read.t line 107 is: ok(0, undef, "Can't write valid ctf files until we have a trace object") for 1..7;
# Test 32 got: "0" (t/staden_read.t at line 107 fail #2 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 33 got: "0" (t/staden_read.t at line 107 fail #3 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 34 got: "0" (t/staden_read.t at line 107 fail #4 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 35 got: "0" (t/staden_read.t at line 107 fail #5 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 36 got: "0" (t/staden_read.t at line 107 fail #6 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 37 got: "0" (t/staden_read.t at line 107 fail #7 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
t/staden_read....ok
All tests successful.
Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr + 0.15 csys = 1.71 CPU)

Anyone has any idea what might be going wrong here? By the way, my OS is
Linux. Thank you very much.
Arya

From bix at sendu.me.uk Tue Nov 27 10:41:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 15:41:38 +0000
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <615478.1036.qm@web60113.mail.yahoo.com>
References: <615478.1036.qm@web60113.mail.yahoo.com>
Message-ID: <474C3AB2.5050208@sendu.me.uk>

a_arya2000 wrote:
> Hello,
> I downloaded latest bioperl-ext from bioperl website,
> and I have io_lib v1.8.11 installed, and I was trying
> to install Bio::SeqIO::staden::read (of bioperl-ext).
> It compiled fine without any error but when I run make
> test I got following output.
[...]
> All tests successful.
> Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr +
> 0.15 csys = 1.71 CPU)
>
>
> Anyone has any idea what might be going wrong here? By
> the way, my OS is Linux. Thank you very much.

Not being familiar with the test script or ext, I can at least say that
nothing actually went wrong: 'All tests successful'. Apparently there
are some things in the test script that are known by the author to not
work quite right, so he marked them as 'todo'. The problems seem
harmless in any case, with things returning 0 instead of undef.

So, unless you've reason to believe there is something significant going
on, all is well.

From alison.waller at utoronto.ca Mon Nov 26 16:06:35 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Mon, 26 Nov 2007 16:06:35 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
Message-ID: <005a01c83070$3a814580$d81efea9@AWALL>

Hello all,

It's the usual story, I'm an engineer turned biologist who now needs help
with bioinformatics so I can analyze huge amounts of data to finish my
thesis.

I am trying to write a script that will parse large blast files (usually
blastx). I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want to
have the flexibility to obtain information on, say, the top 5. I am pretty
sure I have done this wrong; any advice on how to correct my script to do
this would be great.

Thanks so much,

Alison

#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

# to run type: blast_parse_aw.pl input.txt #of hits

my $infile =shift(@ARGV);
my $outfile ="$ARGV[0].parsed";
my $tophit = $ARGV[1]; # I want to specify in the command line how many hits to report for each query

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";
open (OUT,">$outfile");

$report = new Bio::SearchIO( -file=>"$inFile", -format => "blast");

print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while($result = $report->next_result) {
    if ($top_hit=$result->next_hit) # this might be wrong - I want to specify how many hits to print results for
    # Print some tab-delimited data about this hit
    {
        print $result->query_name, "\t";
        print $hit->description, "\t";
        print $hit->significance, "\t";
        print $hit->bits,"\t";
        print $hsp->evalue, "\t";
        print $hsp->percent_identity, "\t";
        print $hsp->length('total'),"\t";
        print $hsp->num_identical,"\t";
        print $hsp->gaps,"\t";
        print $hsp->strand('query'),"\t";
        print $hsp->strand('hit'), "\n";
    }
    else print "no hits\n";
}

******************************************
Alison S. Waller M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON M5S 3E5

From bix at sendu.me.uk Tue Nov 27 12:01:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 17:01:36 +0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <474C4D70.2010206@sendu.me.uk>

alison waller wrote:
> I am trying to write a script that will parse large blast files (usually
> blastx) I also want to be able to specify how many hits I want to report
> information on.
>
> Most of the time I will only want information on the top hit, but I want to
> have the flexibility to obtain information on say the top5. I am pretty
> sure I have done this wrong, any advice on how to correct my script to do
> this, would be great.
[snip]
> if ($top_hit=$result->next_hit) # this might be wrong - I want to
> specify how many hits to print results for

I didn't really pay attention to the rest of your code, but assuming it
all works except for only ever giving you info for the top hit, you just
need to change this 'if' to a loop of some kind.

# ...
my $hits = 0;
while (my $hit = $result->next_hit) {
    $hits++;
    last if $hits > $tophit;
    # ...
}

From David.Messina at sbc.su.se Tue Nov 27 12:55:44 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 27 Nov 2007 18:55:44 +0100
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <474C4D70.2010206@sendu.me.uk>
References: <005a01c83070$3a814580$d81efea9@AWALL> <474C4D70.2010206@sendu.me.uk>
Message-ID: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>

Hi Alison,

As Sendu mentioned, the key bit is adding a condition to the hit loop to
limit the number of hits that are printed. I didn't test the below
extensively, but give it a try...
Dave


#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1];   # to specify in the command line how many hits
                          # to report for each query

#open( IN, $infile ) || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = Bio::SearchIO->new(
    -file   => $infile,
    -format => "blast"
);

print OUT "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while ( my $result = $report->next_result ) {
    my $i = 0;    # number of hits reported so far for this query

    # Count a hit only once next_hit has actually returned one, so that
    # $i stays 0 when the query has no hits at all.
    while ( ( $i < $tophit ) && (my $hit = $result->next_hit) ) {
        $i++;
        while ( my $hsp = $hit->next_hsp ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name, "\t";
            print OUT $hit->name, "\t";
            print OUT $hit->significance, "\t";
            print OUT $hit->bits, "\t";
            print OUT $hsp->evalue, "\t";
            print OUT $hsp->percent_identity, "\t";
            print OUT $hsp->length('total'), "\t";
            print OUT $hsp->num_identical, "\t";
            print OUT $hsp->gaps, "\t";
            print OUT $hsp->strand('query'), "\t";
            print OUT $hsp->strand('hit'), "\n";
        }
    }

    if ($i == 0) { print OUT "no hits\n"; }
}

From Russell.Smithies at agresearch.co.nz Tue Nov 27 14:31:29 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 28 Nov 2007 08:31:29 +1300
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk> <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID:

Do the hits need to be sorted first or is this done automagically?
I ask this as I know Megablast doesn't provide sorted output for most of
its formats.

Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dave Messina
> Sent: Wednesday, 28 November 2007 6:56 a.m.
> To: alison waller
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>
> Hi Alison,
> As Sendu mentioned, the key bit is adding a condition to the hit loop to
> limit the number of hits that are printed. I didn't test the below
> extensively, but give it a try...
[snip]

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at uiuc.edu Tue Nov 27 16:09:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:09:43 -0600
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <474C3AB2.5050208@sendu.me.uk>
References: <615478.1036.qm@web60113.mail.yahoo.com> <474C3AB2.5050208@sendu.me.uk>
Message-ID: <3B8DD37B-F856-4365-86F0-038A00E26766@uiuc.edu>

You can always test it within the bioperl suite after it's installed;
several tests (abi.t, ztr.t) use Bio::SeqIO::staden::read. In general,
though, if it's passing tests it should be fine.

chris

On Nov 27, 2007, at 9:41 AM, Sendu Bala wrote:

> a_arya2000 wrote:
>> Hello,
>> I downloaded latest bioperl-ext from bioperl website,
>> and I have io_lib v1.8.11 installed, and I was trying
>> to install Bio::SeqIO::staden::read (of bioperl-ext).
>> It compiled fine without any error but when I run make
>> test I got following output.
> [...]
>> All tests successful.
>> Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr +
>> 0.15 csys = 1.71 CPU)
>>
>>
>> Anyone has any idea what might be going wrong here? By
>> the way, my OS is Linux. Thank you very much.
>
> Not being familiar with the test script or ext, I can at least say that
> nothing actually went wrong: 'All tests successful'. Apparently there
> are some things in the test script that are known by the author to not
> work quite right, so he marked them as 'todo'. The problems seem
> harmless in any case, with things returning 0 instead of undef.
>
> So, unless you've reason to believe there is something significant going
> on, all is well.
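
For readers puzzled by the same kind of output: a test marked as TODO is expected to fail, and the harness still reports 'All tests successful'. A minimal sketch of that behaviour follows. It uses Test::More as an assumption for illustration only; t/staden_read.t itself appears to use the older Test module, whose syntax differs, and the test names here are made up.

```perl
use strict;
use warnings;
use Test::More;

# An ordinary test: a failure here would make the whole run fail.
ok( 1 + 1 == 2, 'ordinary test passes' );

TODO: {
    # Everything in this block is a known, not-yet-fixed failure.
    local $TODO = 'writing abi files is not implemented yet';

    # Reported as "not ok ... # TODO ...", but not counted as a failure.
    ok( 0, 'can write abi format' );
}

done_testing();
```

The TODO lines in the make test output above are this sort of noise, so a final 'All tests successful' can be taken at face value.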
From cjfields at uiuc.edu Tue Nov 27 16:00:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:00:33 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To:
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk> <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>

The hits/HSPs are generally in the order they appear in the report. If
you are looking for best/worst HSP after parsing you can use the
$hit->hsp() method:

# best and worst
my $best  = $hit->hsp('best');    # also 'first'
my $worst = $hit->hsp('worst');   # also 'last'

The SearchIO text BLAST parser also has several options implemented for
finer control:

-inclusion_threshold => e-value threshold for inclusion in the
                        PSI-BLAST score matrix model (blastpgp)
-signif     => float or scientific notation number to be used
               as a P- or Expect value cutoff
-score      => integer or scientific notation number to be used
               as a blast score value cutoff
-bits       => integer or scientific notation number to be used
               as a bit score value cutoff
-hit_filter => reference to a function to be used for
               filtering hits based on arbitrary criteria.
               All hits of each BLAST report must satisfy
               this criterion to be retained.
               If a hit fails this test, it is ignored.
               This function should take a
               Bio::Search::Hit::BlastHit.pm object as its first
               argument and return true
               if the hit should be retained.
               Sample filter function:
               -hit_filter => sub { my $hit = shift;
                                    $hit->gaps == 0; },
               (Note: -filt_func is synonymous with -hit_filter)
-overlap    => integer. The amount of overlap to permit between
               adjacent HSPs when tiling HSPs. A reasonable value is 2.
               Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagically?
> I ask this as I know Megablast doesn't provide sorted output for most of
> its formats.
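
Putting the pieces of this thread together (stop after N hits per query, then take only the best HSP of each hit through $hit->hsp('best')), a rough sketch might look like the following. The sub name top_hits_best_hsp, the choice of columns and the "only run when two arguments are given" guard are assumptions made for illustration; the sketch also relies on the report order, as described above, rather than re-sorting anything.

```perl
use strict;
use warnings;

# Collect one row per hit for the first $max_hits hits of every query,
# using only the best HSP of each hit. $report is anything that behaves
# like a Bio::SearchIO stream (next_result / next_hit / hsp('best')).
sub top_hits_best_hsp {
    my ( $report, $max_hits ) = @_;
    my @rows;
    while ( my $result = $report->next_result ) {
        my $seen = 0;    # hits reported so far for this query
        while ( ( $seen < $max_hits ) && ( my $hit = $result->next_hit ) ) {
            $seen++;
            my $hsp = $hit->hsp('best') or next;    # hit without HSPs
            push @rows, [
                $result->query_name, $hit->name, $hit->significance,
                $hsp->evalue,        $hsp->percent_identity,
            ];
        }
        # $seen is still 0 only when the query had no hits at all
        push @rows, [ $result->query_name, 'no hits' ] unless $seen;
    }
    return \@rows;
}

# Only runs when called as: script.pl <blast report> <max hits>
if ( @ARGV == 2 ) {
    require Bio::SearchIO;
    my ( $infile, $max_hits ) = @ARGV;
    my $report = Bio::SearchIO->new( -file => $infile, -format => 'blast' );
    print join( "\t", @$_ ), "\n"
        for @{ top_hits_best_hsp( $report, $max_hits ) };
}
```

Keeping the hit counter separate from the loop condition means a query with no hits can still be detected afterwards, and taking a single HSP per hit avoids the need for a second counter in the inner loop.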
[snip]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From johnston at biochem.ucl.ac.uk Tue Nov 27 20:06:30 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Nov 2007 01:06:30 +0000 (GMT)
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
Message-ID:

Hello,

I was playing around with Primer3, and I hit a problem. Not sure if it's a
bug or if I was doing something I wasn't supposed to, but if it's the
latter, I thought it might save someone else half an hour of banging their
head off a keyboard if I mentioned it:

What I was doing was roughly:

# create a primer3 obj
my $p3 = ...Primer3->new();

# loop through some sequences generating primers for
# each of them using the same primer3 obj
while (@some_bio_seqs){
    my $res = $p3->run;
    ...
}

This worked fine for a while, but broke when I tried to set PRIMER_MIN_GC,
at which point it worked for a few sequences and then I got a "can't place
primer on sequence" error. After a bit of faffing about, I think the
problem occurs when no primers are found. In that case $p3 still has the
primers from the previous run, which don't come from the current sequence,
so they can't be placed on it.

I tried calling $p3->cleanup in the loop, but that didn't work either.
Creating a new $p3 every time works fine. Are you supposed to create a new
Primer3 object for every sequence? (Apologies if I missed the relevant bit
of the docs).
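
For reference, the workaround described above (a new object for every sequence) could be sketched roughly as follows. This assumes the new(-seq => ...), add_targets() and run() calls from the Bio::Tools::Run::Primer3 synopsis; the sub name primers_per_sequence and the optional $make_p3 hook are hypothetical, and checking whether a run actually found primers is left out.

```perl
use strict;
use warnings;

# Run Primer3 on each sequence with a brand-new factory object every time,
# so primers left over from an earlier run cannot be picked up by mistake.
# $make_p3 is an optional code ref (a hypothetical hook, handy for testing)
# that returns a fresh factory object for one sequence.
sub primers_per_sequence {
    my ( $seqs, $targets, $make_p3 ) = @_;
    $make_p3 ||= sub {
        require Bio::Tools::Run::Primer3;    # part of bioperl-run
        return Bio::Tools::Run::Primer3->new( -seq => $_[0] );
    };

    my %results;    # display_id => whatever run() returned
    for my $seq (@$seqs) {
        my $p3 = $make_p3->($seq);    # new object for every sequence
        $p3->add_targets(%$targets) if $targets && %$targets;
        $results{ $seq->display_id } = $p3->run;
    }
    return \%results;
}
```

Something like primers_per_sequence(\@some_bio_seqs, { PRIMER_MIN_GC => 40 }) would then replace the loop, where 40 is only an illustrative value.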

Cheers,
Cass xx

From alison.waller at utoronto.ca Tue Nov 27 16:32:07 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Tue, 27 Nov 2007 16:32:07 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>
Message-ID: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>

Thanks Everyone,

Your edits worked, Dave. However, after looking at the output I realized
that I only want information on the top HSP per query returned. For
example, for some of the queries the top hit has two HSPs, so it returned
both.

I tried to edit it further, but after 3 attempts they are all failing, I
think because I am using the loops wrongly.

I also have another problem: I want to retrieve the GI, and this doesn't
seem to be as straightforward as it should be. I found another method,
_get_seq_identifiers, but this looks awkward. Isn't there an object for
the GI?

I've pasted my non-working script below. If there are any suggestions on
how to get it to print out just the first HSP per hit, that would be great.

Thanks,

#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit = $ARGV[1]; # to specify in the command line how many hits
                       # to report for each query

#open( IN, $infile ) || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = new Bio::SearchIO(
    -file => "$infile",
    -format => "blast"
);

print OUT "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tstrand\tHstrand\n";

# Go through BLAST reports one by one
while (my $result = $report->next_result) {
    my $i=0;
    while( ( $i++<$tophit) && (my $hit = $result->next_hit)){
        while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name, "\t";
            print OUT $hit->name, "\t";
            print OUT $hit->significance, "\t";
            print OUT $hit->bits, "\t";
            print OUT $hsp->evalue, "\t";
            print OUT $hsp->percent_identity, "\t";
            print OUT $hsp->length('total'), "\t";
            print OUT $hsp->num_identical, "\t";
            print OUT $hsp->gaps, "\t";
            print OUT $hsp->strand('query'), "\t";
            print OUT $hsp->strand('hit'), "\n";
        }
    }
    if ($i == 0) { print OUT "no hits\n"; }
}

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu]
Sent: Tuesday, November 27, 2007 4:01 PM
To: Smithies, Russell
Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results

The hits/HSPs are generally in the order they appear in the report.
[snip]

From dennis.prickett at bbsrc.ac.uk Wed Nov 28 05:18:26 2007
From: dennis.prickett at bbsrc.ac.uk (dennis prickett (IAH-C))
Date: Wed, 28 Nov 2007 10:18:26 -0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9504751EF0@iahce2ksrv1.iah.bbsrc.ac.uk>

Dear Alison

Or, if you are absolutely only interested in the top hit you could limit
it to that in the blast command by adding the parameters " -b 1 ". This
will truncate the report to alignments for 1 hit per query (or -b 5 for
5 hits, etc). Your blasts run faster and then you won't have to worry
about how to parse out the top blast hit(s).

However, if there are any caveats for using this parameter that I am not
aware of please let us know.

Dennis Prickett
Institute of Animal Health
Compton, nr Newbury
RG2 9FS
United Kingdom

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of alison waller
Sent: 26 November 2007 21:07
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] help using SEARCH IO to parse blast results

Hello all,

It's the usual story, I'm an engineer turned biologist who now needs help
with bioinformatics so I can analyze huge amounts of data to finish my
thesis.
[snip]

From t.nugent at cs.ucl.ac.uk Wed Nov 28 08:10:41 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Wed, 28 Nov 2007 13:10:41 +0000
Subject: [Bioperl-l] Helical Wheel module
Message-ID: <474D68D1.3080602@cs.ucl.ac.uk>

Hi everyone,

I've been drawing a lot of helical wheels recently so put all my code
into a module. I don't think there's anything in bioperl to do this yet,
though there are a few programs written in perl and flash on the web to
do the same thing. I was thinking this could fit into biographics. It has
lots of options to adjust labels, colours and ttf fonts, and can output
to png & svg.

Tim

...

Here's the output, converted to jpg from svg:

http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg

Module:

http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz

Works like this:

use DrawHelicalWheel;

my $im = DrawHelicalWheel->new(-title=>$title,
                               -sequence=>$sequence,
                               -helices=>\@helices,
                               -ttf_font=>$font);

open(OUTPUT, ">$svg");
binmode OUTPUT;
print OUTPUT $im->svg;
close OUTPUT;

--
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent

From tristan.lefebure at gmail.com Wed Nov 28 10:46:11 2007
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 28 Nov 2007 10:46:11 -0500
Subject: [Bioperl-l] Remove sites of an alignment
Message-ID: <200711281046.11146.tnl7@cornell.edu>

Hello!

I was wondering if there was a function to remove sites/columns of an
alignment. Something like:

$aln->remove_sites(@sites_to_remove)

I looked around Bio::SimpleAlign but did not find exactly that. There is
remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria.
I could recycle the '_remove_col' sub-function of 'remove_columns' to do so (it splits the alignment into sequence objects, removes the sites, and then regenerates an alignment object), but I would be surprised if there was nothing already doing the job... Thanks -Tristan From bix at sendu.me.uk Wed Nov 28 11:19:36 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Nov 2007 16:19:36 +0000 Subject: [Bioperl-l] Remove sites of an alignment In-Reply-To: <200711281046.11146.tnl7@cornell.edu> References: <200711281046.11146.tnl7@cornell.edu> Message-ID: <474D9518.7010201@sendu.me.uk> Tristan Lefebure wrote: > Hello! > > I was wondering if there was a function to remove sites/columns of an > alignment. Something like: $aln->remove_sites(@sites_to_remove) > I looked around Bio::SimpleAlign but did not find exactly that. There is > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. You might want to take a second look at the docs. You can supply column number ranges to remove_columns(), so it does exactly what you want. From tnl7 at cornell.edu Wed Nov 28 10:44:17 2007 From: tnl7 at cornell.edu (Tristan Lefebure) Date: Wed, 28 Nov 2007 10:44:17 -0500 Subject: [Bioperl-l] Remove sites of an alignment Message-ID: <200711281044.17770.tnl7@cornell.edu> Hello! I was wondering if there was a function to remove sites/columns of an alignment. Something like: $aln->remove_sites(@sites_to_remove) I looked around Bio::SimpleAlign but did not find exactly that. There is remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. I could recycle the '_remove_col' sub-function of 'remove_columns' to do so (it splits the alignment into sequence objects, removes the sites, and then regenerates an alignment object), but I would be surprised if there was nothing already doing the job... 
Thanks -Tristan From cjfields at uiuc.edu Wed Nov 28 08:57:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 07:57:27 -0600 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <003f01c8313c$f69b22a0$6f00a8c0@AWALL> References: <003f01c8313c$f69b22a0$6f00a8c0@AWALL> Message-ID: I had some code which does this which I committed yesterday to CVS; it catches the GI for the query and the hits: $result->query_gi; $hit->ncbi_gi; I am in the midst of fixing additional problems with WU-BLAST parsing but you are more than welcome to try it. chris On Nov 27, 2007, at 3:32 PM, alison waller wrote: > Thanks Everyone, > > Your edits worked Dave, however after looking at the output I > realized that > I only want information on the top hsp per query returned. For > example some > of the querys the top hit has two hsps so it returned both. > > I tried to further edit it, but after 3 attempts they are all > failing, I > think due to me using the loops wrong. > > I also have another problem, I also want to retrieve the gi, this > doesn't > seem to be straight forward as it should. I found another method > _get_seq_identifiers, but this looks awkward, isn't there and object > for the > gi? > > I've pasted my non-working script below if there are any suggestions > on how > to get it to print out just the first hsp per hit, that would be > great. > > Thanks, > > > #!/usr/local/bin/perl -w > > > # Parsing BLAST reports with BioPerl's Bio::SearchIO module > # alison waller November 2007 > > > use strict; > use warnings; > use Bio::SearchIO; > > > my $usage = "to run type: blast_parse_aw.pl <# of > hits>\n"; > if (@ARGV != 2) { die $usage; } > > > my $infile = $ARGV[0]; > my $outfile = $infile . '.parsed'; > my $tophit = $ARGV[1]; # to specify in the command line how many hits > # to report for each query > > > #open( IN, $infile ) || die "Can't open inputfile $infile! 
$!\n"; > open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! > \n"; > > > my $report = new Bio::SearchIO( > -file => "$infile", > -format => "blast" > ); > > > print OUT > > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent > \tgaps\t > strand\tHstrand\n"; > > > # Go through BLAST reports one by one > while (my $result = $report->next_result) { > my $i=0; > while( ( $i++<$tophit) && (my $hit = $result->next_hit)){ > while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) { > > > # Print some tab-delimited data about this hit > print OUT $result->query_name, "\t"; > print OUT $hit->name, "\t"; > print OUT $hit->significance, "\t"; > print OUT $hit->bits, "\t"; > print OUT $hsp->evalue, "\t"; > print OUT $hsp->percent_identity, "\t"; > print OUT $hsp->length('total'), "\t"; > print OUT $hsp->num_identical, "\t"; > print OUT $hsp->gaps, "\t"; > print OUT $hsp->strand('query'), "\t"; > print OUT $hsp->strand('hit'), "\n"; > } > } > if ($i == 0) { print OUT "no hits\n"; } > > } > > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, November 27, 2007 4:01 PM > To: Smithies, Russell > Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results > > The hits/HSPs are generally in the order they appear in the report. 
> > If you are looking for best/worst HSP after parsing you can use the > $hit->hsp() method: > > # best and worst > my $best = $hit->hsp('best'); # also 'first' > my $worst = $hit->hsp('worst'); # also last > > The SearchIO text BLAST parser also has several options implemented > for finer control: > > -inclusion_threshold => e-value threshold for inclusion in the > PSI-BLAST score matrix model (blastpgp) > -signif => float or scientific notation number to be used > as a P- or Expect value cutoff > -score => integer or scientific notation number to be used > as a blast score value cutoff > -bits => integer or scientific notation number to be used > as a bit score value cutoff > -hit_filter => reference to a function to be used for > filtering hits based on arbitrary criteria. > All hits of each BLAST report must satisfy > this criteria to be retained. > If a hit fails this test, it is ignored. > This function should take a > Bio::Search::Hit::BlastHit.pm object as its first > argument and return true > if the hit should be retained. > Sample filter function: > -hit_filter => sub { $hit = shift; > $hit->gaps == 0; }, > (Note: -filt_func is synonymous with -hit_filter) > -overlap => integer. The amount of overlap to permit between > adjacent HSPs when tiling HSPs. A reasonable > value is 2. > Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP. > > chris > > On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote: > >> Do the hits need to be sorted first or is this done automagicly? >> I ask this as I know Megablast doesn't provide sorted output for >> most of >> it's formats. >> >> Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open- >>> bio.org] On Behalf Of Dave Messina >>> Sent: Wednesday, 28 November 2007 6:56 a.m. 
>>> To: alison waller >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results >>> >>> Hi Alison, >>> As Sendu mentioned, the key bit is adding a condition to the hit >>> loop >> to >>> limit the number of hits that are printed. I didn't test the below >>> extensively, but give it a try... >>> >>> >>> Dave >>> >>> >>> >>> #!/usr/local/bin/perl -w >>> >>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module >>> # alison waller November 2007 >>> >>> use strict; >>> use warnings; >>> use Bio::SearchIO; >>> >>> my $usage = "to run type: blast_parse_aw.pl <# of >> hits>\n"; >>> if (@ARGV != 2) { die $usage; } >>> >>> my $infile = $ARGV[0]; >>> my $outfile = $infile . '.parsed'; >>> my $tophit = $ARGV[1]; # to specify in the command line how many >>> hits >>> # to report for each query >>> >>> #open( IN, $infile ) || die "Can't open inputfile $infile! $! >>> \n"; >>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
>> $!\n"; >>> >>> my $report = new Bio::SearchIO( >>> -file => "$infile", >>> -format => "blast" >>> ); >>> >>> print OUT >>> >> "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent >> \tga >> ps\t >>> Qstrand\tHstrand\n"; >>> >>> # Go through BLAST reports one by one >>> while ( my $result = $report->next_result ) { >>> my $i = 0; >>> while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) { >>> while ( my $hsp = $hit->next_hsp ) { >>> >>> # Print some tab-delimited data about this hit >>> print OUT $result->query_name, "\t"; >>> print OUT $hit->name, "\t"; >>> print OUT $hit->significance, "\t"; >>> print OUT $hit->bits, "\t"; >>> print OUT $hsp->evalue, "\t"; >>> print OUT $hsp->percent_identity, "\t"; >>> print OUT $hsp->length('total'), "\t"; >>> print OUT $hsp->num_identical, "\t"; >>> print OUT $hsp->gaps, "\t"; >>> print OUT $hsp->strand('query'), "\t"; >>> print OUT $hsp->strand('hit'), "\n"; >>> } >>> } >>> >>> if ($i == 0) { print OUT "no hits\n"; } >>> } >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use of, >> or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. 
>> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 28 08:54:39 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 07:54:39 -0600 Subject: [Bioperl-l] Helical Wheel module In-Reply-To: <474D68D1.3080602@cs.ucl.ac.uk> References: <474D68D1.3080602@cs.ucl.ac.uk> Message-ID: <053F7A0E-E0C3-4E86-AF7A-8F6F7A57DA37@uiuc.edu> Looks good! We recently added in your transmembrane module, so we could definitely add this in. chris On Nov 28, 2007, at 7:10 AM, Tim Nugent wrote: > Hi everyone, > > I've been drawing a lot of helical wheels recently so put all my code > into a module. I don't think there's anything in bioperl to do this > yet > though there are a few programs written in perl and flash on the web > to > do the same thing. I was thinking this could fit into biographics. Has > lots of options to adjust labels, colours, ttf fonts and can output to > png & svg. > > Tim > > ... 
> > Here's the output, converted to jpg from svg: > http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg > > Module: > http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz > > Works like this: > > use DrawHelicalWheel; > > my $im = DrawHelicalWheel->new(-title=>$title, > -sequence=>$sequence, > -helices=>\@helices, > -ttf_font=>$font); > open(OUTPUT, ">$svg"); > binmode OUTPUT; > print OUTPUT $im->svg; > close OUTPUT; > > -- > Tim Nugent (MRes) > Research Student > Bioinformatics Unit > Department of Computer Science > University College London > Gower Street > London WC1E 6BT > Tel: 020-7679-0410 > t.nugent at ucl.ac.uk > http://www.cs.ucl.ac.uk/staff/T.Nugent > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 28 13:43:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 12:43:58 -0600 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com> References: <4701AEE6.6070506@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com> Message-ID: <55479E91-59AF-42B2-B15F-C4939531BC4D@uiuc.edu> On Nov 26, 2007, at 7:41 PM, Steve Chervitz wrote: > Chris, > > Cood catch. 
You're on track here with one exception: WU blast and NCBI > blast behave differently in what they report in the hit table: WU > blast puts the raw score in the table not the bit score as NCBI blast > does (see example below for reference). WU blast also swaps their > location in the HSP header relative to how NCBI reports it. It would > be good to verify that the blast parser isn't befuddled by this. A > quick look at SearchIO::blast and it appears that data from the hit > table is always getting stored as score, not bits for WU blast. Not > sure if the HSP section data are parsed correctly. I'd recommend > looking into these things when you do your fixes. What I have now after commits is: GenericHit - use the best HSP when possible for bits, score/raw_score, significance. When there is no HSP, construct a minimal Hit object using hit table data (WUBLAST maps the score to raw_score, NCBI BLAST maps to bits(), both map evalue/pvalue to significance). HSP mapping seems to be correct. One issue that has popped up is GenericHit::significance preferentially uses the best HSP. However, GenericHSP::significance uses evalues preferentially over pvalues; both Expect and P appear to be parsed for WU-BLAST HSPs now (so the evalue is reported); this apparently wasn't always the case if I read the GenericHit docs correctly. As NCBI BLAST doesn't report pvalues we could change that so it preferentially returns a pvalue if present, falling back to an evalue. This would match what is found hit table more closely and resembles what is documented for the method (for significance(), WU- BLAST gets pvalues, NCBI BLAST gets evalues). > So in the end, WU blast HSPs that are built from the hit table should > report a value for raw_score and punt on bits, but NCBI HSPs so > constructed should do the opposite. The downside to this arrangement > is that code that works for NCBI blast hits will need modification to > work for WU blast hits, but that is just the nature of the data. 
It > shouldn't be an issue for the majority of users that stick with one > flavor of blast and don't switch back and forth, or for users that get > their HSP scoring data from HSP sections rather than relying on the > hit table. In general I get my data from the HSPs, so this shouldn't be a significant issue (bad pun). I did find that changing it so that Hit objects use HSP data pointed out issues with test data; hit table raw/ bit scores were rounded from the HSP score data or vice versa since all data came from the hit table, so tests flunked. I think changing the way minimal hit objects report data (particularly for NCBI BLAST) will lead to a lot of confusion unless we clarify warnings when one or the other is missing (as you also indicated). I'm working on that now. > Ideally, the HSP object would know whether it was NCBI or WU-based and > issue an informative warning when attempting to access data it doesn't > have. One solution might be for the parser to put a 'WU-' in front of > the algorithm name for WU blast reports, so it would then be available > for the contained hit/hsp objects. This could break anything dependent > on algorithm name, so it would need some testing. > > Steve I can probably work around as noted above that unless you think it's warranted to add a 'WU' designation (the version info in the Result object has 'WashU' attached, so one could feasibly use that for distinguishing the two report types). Anyway, I'm committing my first batch of fixes, the significance test will fail for at least a day until I can look into it more. chris From tristan.lefebure at gmail.com Wed Nov 28 14:03:44 2007 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 28 Nov 2007 14:03:44 -0500 Subject: [Bioperl-l] Remove sites of an alignment In-Reply-To: <474D9518.7010201@sendu.me.uk> References: <200711281046.11146.tnl7@cornell.edu> <474D9518.7010201@sendu.me.uk> Message-ID: Hoops. I was reading the BioPerl 1.4 documentation. 
Actually, http://bioperl.org/wiki/Module:Bio::SimpleAlign sends you to http://search.cpan.org/perldoc?Bio::SimpleAlign, which ends up being the 1.4 documentation... Thank you, it works great. On Nov 28, 2007 11:19 AM, Sendu Bala wrote: > Tristan Lefebure wrote: > > Hello! > > > > I was wondering if there was a function to remove sites/columns of an > > alignment. Something like: $aln->remove_sites(@sites_to_remove) > > I looked around Bio::SimpleAlign but did not find exactly that. There is > > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' > criteria. > > You might want to take a second look at the docs. You can supply column > number ranges to remove_columns(), so it does exactly what you want. From Russell.Smithies at agresearch.co.nz Wed Nov 28 16:57:14 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 29 Nov 2007 10:57:14 +1300 Subject: [Bioperl-l] Parsing Entrez Gene ASN.1 In-Reply-To: References: <200711281046.11146.tnl7@cornell.edu> <474D9518.7010201@sendu.me.uk> Message-ID: Has anyone got a good example of parsing ASN.1 with Bio::SeqIO::entrezgene? I'm trying to get GO ids and KEGG terms out but it's quite deeply nested and my Perl isn't that good :-( Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================
From stefan.kirov at bms.com Wed Nov 28 17:16:18 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 28 Nov 2007 17:16:18 -0500 (Eastern Standard Time) Subject: [Bioperl-l] Parsing Entrez Gene ASN.1 In-Reply-To: References: <200711281046.11146.tnl7@cornell.edu> <474D9518.7010201@sendu.me.uk> Message-ID:

Here is an example for GO, will send the one for KEGG later:

    my $eio = new Bio::SeqIO(-file => $file, -format => 'entrezgene',
                             -service_record => 'yes'); #, -locuslink=>'convert');
    while (my $seq = $eio->next_seq) {
        my $gid = $seq->accession_number;
        # $ann was not defined in the original post; assuming it is the
        # annotation collection of the current record.
        my $ann = $seq->annotation;
        foreach my $ot ($ann->get_Annotations('OntologyTerm')) {
            next if ($ot->term->authority eq 'STS marker'); # Do not need STS markers
            my $evid = $ot->comment;
            $evid =~ s/evidence: //i;
            my @ref = $ot->term->get_references; # Really there should be just one?
            my $id  = $ot->identifier;
            my $fid = 'GO:' . sprintf("%07u", $id);
            print join("\t", $gid, $ot->ontology->name, $ot->name, $evid, $fid,
                       @ref ? $ref[0]->medline : ''), "\n";
        }
    }

Please note there is a bug in the parser that makes it use a lot of RAM. I am fixing this and will probably commit by the week's end; you will have to update at that point. If you work with few records this should not matter.

Stefan

On Thu, 29 Nov 2007, Smithies, Russell wrote:
> Has anyone got a good example of parsing ASN.1 with
> Bio::SeqIO::entrezgene?
> I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
> and my Perl isn't that good :-(
>
> Russell
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material.
Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Nov 29 18:06:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 17:06:42 -0600 Subject: [Bioperl-l] PSIBLAST parsing added to SearchIO::blastxml Message-ID: <159ABF90-080B-4F98-BF63-7FCEE5D05F10@uiuc.edu> For anyone using PSI-BLAST: I have implemented experimental PSI-BLAST parsing in Bio::SearchIO::blastxml (though it appears to be pretty stable!). Since there isn't any easy way to distinguish between normal BLASTS and PSI-BLAST reports due to recent changes at NCBI to BLAST, you have to indicate how the report is to be parsed by passing in a '-blasttype' parameter: $searchio = Bio::SearchIO->new('-tempfile' => 1, '-format' => 'blastxml', '-file' => 'psiblast.xml', '-blasttype' => 'psiblast'); Otherwise it chunks the individual iterations out as separate BLAST reports and parses them as separate reports. Tests have also been added to SearchIO.t. I will update the HOWTO and blastxml docs soon. chris From cjfields at uiuc.edu Thu Nov 29 21:41:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 20:41:49 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Primer3 In-Reply-To: References: Message-ID: <866C501B-EBFD-4E55-939E-AA97182C9EC4@uiuc.edu> It's probably safer to create a new instance each time but it really shouldn't be necessary for a wrapper module; this sounds like a bug to me. Could you file it in Bugzilla? 
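A minimal sketch of the "new instance each time" workaround mentioned above, assuming Bio::Tools::Run::Primer3 from bioperl-run and a local primer3_core binary. The helper name design_primers, the output file names and the PRIMER_MIN_GC value are illustrative assumptions, not part of the thread. The method names (add_targets, run, number_of_results) follow the module's synopsis and should be checked against the installed version.

```perl
use strict;
use warnings;
use Bio::Tools::Run::Primer3;

# Hypothetical helper: design primers for each sequence with a fresh
# Primer3 wrapper, so results from an earlier run cannot carry over.
sub design_primers {
    my ($primer3_path, @seqs) = @_;    # @seqs: Bio::Seq objects
    my %results_for;
    my $n = 0;

    for my $seq (@seqs) {
        $n++;

        # New wrapper object per sequence (the workaround reported to work).
        my $p3 = Bio::Tools::Run::Primer3->new(
            -seq     => $seq,
            -outfile => "primer3_run_$n.out",    # illustrative file name
            -path    => $primer3_path,
        );

        # Illustrative constraint; any primer3 argument can be set this way.
        $p3->add_targets('PRIMER_MIN_GC' => 40);

        my $res = $p3->run;

        # Skip sequences for which primer3 reported no primers.
        unless ($res && $res->number_of_results) {
            warn "No primers found for ", $seq->display_id, "\n";
            next;
        }
        $results_for{ $seq->display_id } = $res;
    }
    return \%results_for;
}
```

Called as design_primers('/usr/bin/primer3_core', @some_bio_seqs), this returns a hash reference keyed by sequence ID. Sequences without primers are reported and skipped rather than reusing stale results.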
On Nov 27, 2007, at 7:06 PM, Caroline Johnston wrote: > Hello, > > I was playing around with Primer3, and I hit a problem. Not sure if > it's a > bug or if I was doing something I wasn't supposed to, but if it's the > latter, I thought it might save someone else half an hour of banging > their > head of a keyboard if I mentioned it: > > What I was doing was roughly: > > # create a primer3 obj > my $p3 = ...Primer3->new(); > > # loop through some sequences generating primers for > # each of them using the same primer3 obj > while (@some_bio_seqs){ > my $res = $p3->run; > ... > } > > This worked fine for a while, but broke when I tried to set > PRIMER_MIN_GC, > at which point it worked for a few sequences then I got a "can't place > primer on sequence" error. > > After a bit of faffing about, I think the problem occurs when no > primers > are found. In which case $p3 still has the primers from the previous > run, > which don't come from the current sequence, so can't be placed on > it. I > tried calling $p3->cleanup in the loop, but that didn't work either. > Creating a new $p3 every time works fine. > > Are you supposed to create a new Primer3 object for every sequence? > (Apologies if I missed the relevant bit of the docs). > > Cheers, > Cass xx > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From paulhengen at coh.org Wed Nov 28 20:20:42 2007 From: paulhengen at coh.org (Paul N. Hengen) Date: Wed, 28 Nov 2007 17:20:42 -0800 (PST) Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs Message-ID: <14017289.post@talk.nabble.com> Hi. I have a number of gene IDs from Entrez and I want to find the start and end locations in the human genome. 
This seemed simple enough, so I started working through some of the examples for using the EntrezGene module at www.bioperl.org. Of course this did not work because the core installation does not include this module. So, I think I have two choices: (1) install the module (how?), or (2) find an easier way to get the locations in the human genome. I want to use the locations to grab sequences out of the genome. Can anyone offer advice on this? Thanks. -Paul. -- Paul N. Hengen, Ph.D. Hematopoietic Stem Cell and Leukemia Research City of Hope National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 USA mailto:paulhengen at coh.org From Viktor.Martyanov at Dartmouth.EDU Thu Nov 29 15:20:19 2007 From: Viktor.Martyanov at Dartmouth.EDU (Viktor Martyanov) Date: 29 Nov 2007 15:20:19 -0500 Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases Message-ID: <193573097@newdonner.Dartmouth.EDU> A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 445 bytes Desc: not available URL: From alison.waller at utoronto.ca Thu Nov 29 11:20:59 2007 From: alison.waller at utoronto.ca (alison waller) Date: Thu, 29 Nov 2007 11:20:59 -0500 Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS) Message-ID: <002501c832a3$d3e09cf0$d81efea9@AWALL> Hi all, I would like to install the CVS version of bioperl as I know of some code changes that will be useful to me. However, I am having problems installing it. I am trying to install bioperl in my home directory on a Linux cluster. I used > cd bioperl-live * perl Build.PL -install /home/awaller However, after the build command I got a lot of errors. Do I have to also have perl installed in my home directory?? There is perl installed on the cluster in /usr/bin.
Do I need to point to this or does Build.PL automatically look there? I noticed a few errors about not having permission and a few about not being able to connect. I've copied a portion of the messages after my Build.pl command. Any help would be appreciated, alison Issuing "/usr/bin/ftp -n" ftp: mirror.isurf.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz. Please check, if the URLs I found in your configuration file (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/, ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are valid. The urllist can be edited. E.g. with 'o conf urllist push ftp://myurl/' Could not fetch modules/02packages.details.txt.gz Trying to get away with old file: 3604718 584 -rw-r--r-- 1 0 0 592967 Nov 12 22:53 /root/.cpan/sources/modules/02packages.details.txt.gz Going to read /root/.cpan/sources/modules/02packages.details.txt.gz Database was generated on Sat, 10 Nov 2007 22:36:34 GMT There's a new CPAN.pm version (v1.9204) available! [Current version is v1.7601] You might want to try install Bundle::CPAN reload cpan without quitting the current session. It should be a seamless upgrade while we are running... Warning: You are not allowed to write into directory "/root/.cpan/sources/modules". I'll continue, but if you encounter problems, they may be due to insufficient permissions. 
Fetching with LWP: ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[Cannot write to '/root/.cpan/sources/modules/03modlist.data.gz-25787': Permission denied] Fetching with Net::FTP: ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2265 Couldn't fetch 03modlist.data.gz from ftp.nrc.ca Fetching with LWP: ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[FTP close response: 500 Network seems to have barfed - Let's all phone our ISP and go postal! Unknown command. ] Fetching with Net::FTP: ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2265 Couldn't fetch 03modlist.data.gz from cpan.sunsite.ualberta.ca Fetching with LWP: ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname 'cpan.mirror.cygnal.ca'] Fetching with Net::FTP: ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz Fetching with LWP: ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname 'mirror.isurf.ca'] Fetching with Net::FTP: ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz Trying with "/usr/bin/lynx -source" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle 
file URLs. System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" "
returned status 1 (wstat 256), left
/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Trying with "/usr/bin/wget -O -" to get
ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz
sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

System call "/usr/bin/wget -O - "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data"
returned status 1 (wstat 256), left
/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Trying with "/usr/bin/lynx -source" to get
ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz
sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

System call "/usr/bin/lynx -source "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data"
returned status 1 (wstat 256), left
/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Trying with "/usr/bin/ncftp" to get
ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz
Use ncftpget or ncftpput to handle file URLs.

System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" "
returned status 1 (wstat 256), left
/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Trying with "/usr/bin/wget -O -" to get
ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz
sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

System call "/usr/bin/wget -O - "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data"
returned status 1 (wstat 256), left
/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Trying with "/usr/bin/lynx -source" to get
ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz
sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

System call "/usr/bin/lynx -source "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data"
returned status 1 (wstat 256), left
/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Trying with "/usr/bin/ncftp" to get
ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz
Use ncftpget or ncftpput to handle file URLs.

System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" "
returned status 1 (wstat 256), left
/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Trying with "/usr/bin/wget -O -" to get
ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz
sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied

System call "/usr/bin/wget -O - "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data"
returned status 1 (wstat 256), left
/root/.cpan/sources/modules/03modlist.data.gz with size 141973

Issuing "/usr/bin/ftp -n"
Local directory now /root/.cpan/sources/modules
local: 03modlist.data.gz: Permission denied
Bad luck... Still failed!
Can't access URL ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz.

Issuing "/usr/bin/ftp -n"
Local directory now /root/.cpan/sources/modules
local: 03modlist.data.gz: Permission denied
Bad luck... Still failed!
Can't access URL ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz.

Issuing "/usr/bin/ftp -n"
ftp: cpan.mirror.cygnal.ca: Unknown host
Not connected.
Local directory now /root/.cpan/sources/modules
Not connected.
Not connected.
Not connected.
Not connected.
Not connected.
Not connected.
Bad luck... Still failed!
Can't access URL ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz.

Issuing "/usr/bin/ftp -n"
ftp: mirror.isurf.ca: Unknown host
Not connected.
Local directory now /root/.cpan/sources/modules
Not connected.
Not connected.
Not connected.
Not connected.
Not connected.
Not connected.
Bad luck... Still failed!
Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz.

Please check, if the URLs I found in your configuration file
(ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
valid. The urllist can be edited. E.g. with 'o conf urllist push
ftp://myurl/'

Could not fetch modules/03modlist.data.gz
Trying to get away with old file:
3604719 144 -rw-r--r-- 1 0 0 141973 Nov 12 22:53 /root/.cpan/sources/modules/03modlist.data.gz
Going to read /root/.cpan/sources/modules/03modlist.data.gz
Going to write /root/.cpan/Metadata
can't create /root/.cpan/Metadata: Permission denied at /usr/share/perl/5.8/CPAN.pm line 3432
Running install for module Test::Harness
Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz
mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2342

******************************************
Alison S. Waller M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

From cjfields at uiuc.edu Thu Nov 29 23:53:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:53:09 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID:

Alison,

There are directions on how to do this here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

(TinyURL link) http://tinyurl.com/3263dd

Note the additional configuration for CPAN in that section; you'll need to set up CPAN so it installs everything locally.

chris

On Nov 29, 2007, at 10:20 AM, alison waller wrote:

> Hi all,
>
> I would like to install the CVS version of bioperl as I know of some code
> changes that will be useful to me. However, I am having problems installing
> it.
>
> I am trying to install bioperl in my home directly on a linux cluster.
>
> I used
>
>> cd bioperl-live
> * perl Build.PL -install /home/awaller
>
> However after the build command I got a lot of errors. Do I have to also
> have perl installed in my home directory?? There is perl installed on the
> cluster in /usr/bin. Do I need to point to this or does Build.PL
> automatically look there? I noticed a few errors about not having
> permission and a few about not being able to connect. I've copied a portion
> of the messages after my Build.pl command.
>
> Any help would be appreciated,
>
> alison
>
> Issuing "/usr/bin/ftp -n"
> ftp: mirror.isurf.ca: Unknown host
> Not connected.
> Local directory now /root/.cpan/sources/modules
> Not connected.
> Not connected.
> Not connected.
> Not connected.
> Not connected.
> Not connected.
> Bad luck... Still failed!
> Can't access URL
> ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz.
> [...]
>
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'
>
> Could not fetch modules/03modlist.data.gz
> Trying to get away with old file:
> 3604719 144 -rw-r--r-- 1 0 0 141973 Nov 12 22:53 /root/.cpan/sources/modules/03modlist.data.gz
> Going to read /root/.cpan/sources/modules/03modlist.data.gz
> Going to write /root/.cpan/Metadata
> can't create /root/.cpan/Metadata: Permission denied at /usr/share/perl/5.8/CPAN.pm line 3432
> Running install for module Test::Harness
> Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz
> mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2342
>
> ******************************************
> Alison S. Waller M.A.Sc.
> Doctoral Candidate
> awaller at chem-eng.utoronto.ca
> 416-978-4222 (lab)
> Department of Chemical Engineering
> Wallberg Building
> 200 College st.
> Toronto, ON
> M5S 3E5
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Thu Nov 29 23:57:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:57:36 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID:

Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-core (I think they were added prior to the 1.5.1 release, but I'm not positive).
If possible you should try installing bioperl 1.5.2 or the latest code from CVS.

For directions on installing Bioperl for most OS's go here:

http://www.bioperl.org/wiki/Installing_BioPerl

From CVS:

http://www.bioperl.org/wiki/Using_CVS

chris

On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:

>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org
>
> --
> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Fri Nov 30 03:45:57 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Nov 2007 08:45:57 +0000
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <474FCDC5.5020100@sendu.me.uk>

alison waller wrote:
> I would like to install the CVS version of bioperl as I know of some code
> changes that will be useful to me. However, I am having problems installing
> it.
>
> I am trying to install bioperl in my home directly on a linux cluster.
[...]
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'

Either these urls are invalid as suggested (try setting the urllist to nothing), or your linux cluster doesn't have internet access.

You can't do a 'proper' install of BioPerl and its dependencies without internet access. However, for most purposes simply downloading the BioPerl modules (i.e., from a different machine with internet access) and pointing your PERL5LIB to their location is sufficient.

You can download CVS modules from the BioPerl website individually, or as a tarball of everything.

From MEC at stowers-institute.org Fri Nov 30 09:12:09 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 30 Nov 2007 08:12:09 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID:

How many, how often?

Use ensembl biomart! First time interactively.
Then if you want to pipeline it, take the perl code it generates for you and run it - of course you'll have to install the Ensembl Perl API....

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Paul N. Hengen
> Sent: Wednesday, November 28, 2007 7:21 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
>
> [...]

From bosborne11 at verizon.net Fri Nov 30 09:38:58 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 30 Nov 2007 09:38:58 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
Message-ID:

Paul,

Have you taken a look at this page?

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

There's code there that looks similar to what you're proposing.

Brian O.

On 11/28/07 8:20 PM, "Paul N. Hengen" wrote:

> [...]

From cjfields at uiuc.edu Fri Nov 30 10:47:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 09:47:32 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47502C75.60809@bms.com>
References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com>
Message-ID: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>

My bad. I always forget about Bio::ASN1::Entrezgene.
We should ask Mingyi Liu if he would like to include this parser with BioPerl (since it requires it, makes sense to me, and it avoids the circular dependency that has plagued these modules).

chris

On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote:

> Chris Fields wrote:
> Chris,
> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the
> low-level parser and is not part of bioperl. There is a circular
> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
> Paul, you can get it from CPAN and this should make
> Bio::SeqIO::entrezgene functional for you.
> Stefan
>
> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From stefan.kirov at bms.com Fri Nov 30 11:12:22 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 11:12:22 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com> <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>
Message-ID: <47503666.8090004@bms.com>

Chris Fields wrote:
> My bad. I always forget about Bio::ASN1::Entrezgene. We should ask
> Mingyi Liu if he would like to include this parser with BioPerl (since
> it requires it, makes sense to me, and it avoids the circular
> dependency that has plagued these modules).
>
Yes, I think this would be a good step.
Stefan
> chris
>
> [...]

From stefan.kirov at bms.com Fri Nov 30 10:29:57 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Fri, 30 Nov 2007 10:29:57 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To:
References: <14017289.post@talk.nabble.com>
Message-ID: <47502C75.60809@bms.com>

Chris Fields wrote:
Chris,
Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the low-level parser and is not part of bioperl. There is a circular dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think)....
Paul, you can get it from CPAN and this should make Bio::SeqIO::entrezgene functional for you.
Stefan

> [...]

From arareko at campus.iztacala.unam.mx Fri Nov 30 12:01:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 30 Nov 2007 11:01:29 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47503666.8090004@bms.com>
References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com> <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu> <47503666.8090004@bms.com>
Message-ID: <475041E9.8050909@campus.iztacala.unam.mx>

I'm Cc'ing Mingyi Liu in this so he can know about your proposal (in the past, he mentioned he doesn't track the list closely).
Mauricio. Stefan Kirov wrote: > Chris Fields wrote: >> My bad. I always forget about Bio::ASN1::Entrezgene. We should ask >> Mingyi Liu if he would like to include this parser with BioPerl (since >> it requires it, makes sense to me, and it avoids the circular >> dependency that has plagued these modules). >> > Yes, I think this would be a good step. > Stefan >> chris >> >> On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote: >> >> >>> Chris Fields wrote: >>> Chris, >>> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the >>> low-level parser and is not part of bioperl. There is a circular >>> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).... >>> Paul, you can get it from CPAN and this should make >>> Bio::SeqIO::entrezgene functional for you. >>> Stefan >>> >>> >>> >>>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- >>>> core (I think they were added prior to the 1.5.1 release, but I'm not >>>> positive). If possible you should try installing bioperl 1.5.2 or >>>> the >>>> latest code from CVS. >>>> >>>> For directions on installing Bioperl for most OS's go here: >>>> >>>> http://www.bioperl.org/wiki/Installing_BioPerl >>>> >>>> From CVS: >>>> >>>> http://www.bioperl.org/wiki/Using_CVS >>>> >>>> chris >>>> >>>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: >>>> >>>> >>>> >>>>> Hi. >>>>> >>>>> I have a number of gene IDs from Entrez and I want to find the >>>>> start and end locations in the human genome. This seemed simple >>>>> enough, so I started working through some of the examples for >>>>> using the EntrezGene module at www.bioperl.org Of course this >>>>> did not work because the core installation does not include this >>>>> module. So, I think I have two choices (1) install the module >>>>> (how?), >>>>> or (2) find an easier way to get the locations in the human genome. >>>>> I want to use the locations to grab sequences out of the genome. >>>>> Can anyone offer advice on this? Thanks. 
>>>>> >>>>> -Paul. >>>>> >>>>> -- >>>>> Paul N. Hengen, Ph.D. >>>>> Hematopoietic Stem Cell and Leukemia Research >>>>> City of Hope National Medical Center >>>>> 1500 E. Duarte Road, Duarte, CA 91010 USA >>>>> mailto:paulhengen at coh.org >>>>> >>>>> -- >>>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 >>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From jason at bioperl.org Fri Nov 30 15:21:13 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 30 Nov 2007 12:21:13 -0800 Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases In-Reply-To: <193573097@newdonner.Dartmouth.EDU> References: <193573097@newdonner.Dartmouth.EDU> Message-ID: <631A0D08-4135-4A26-962A-4D1DB31F7F05@bioperl.org> Viktor - Bio::SearchIO helps you parse BLAST reports, but don't underestimate the power of going as low-tech as possible and outputting scores with the -m 8 option in NCBI-BLAST or -mformat 3 that give you tabular format that is parseable with the 'split' function in Perl. See the wiki http://bioperl.org/wiki for HOWTOs and examples of using the parsers. You might also consider already-written tools like OrthoMCL, InParanoid, and other that help you define relationships like orthologs and paralogs among species. There also exist a few published web resources that have pre-computed homologs for you, might take a look around first unless the point of the project is to learn how to run these kinds of searches. For general Perl help consider Perlmonks.org and some of the introductory books that are available. -jason -- Jason Stajich jason at bioperl.org On Nov 29, 2007, at 12:20 PM, Viktor Martyanov wrote: > Hello, > > My name is Viktor Martyanov and I am a Ph.D. student in biology at > Dartmouth.
> > I need to be able to use a set of genes or FASTA sequences from S. > cerevisiae and retrieve a set of corresponding homologs from other > fungal species via BLASTP searches. > > I would like to find out if there are Perl scripts that approach > this problem. By the way, is there a Perl community or forum where > I could post this question? > > Thanks very much. _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.moore at genetics.utah.edu Fri Nov 30 17:03:23 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 30 Nov 2007 15:03:23 -0700 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: References: <14017289.post@talk.nabble.com> Message-ID: Paul, One other alternative is to use the UCSC table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select your organism, upload your ID list. Select your output options. You can download the coordinates or the fasta directly. You have options for including or excluding various parts of the gene, and upstream/downstream sequences. This is similar to the solution that Malcolm suggested except the Ensembl option can be run repeatedly as perl code as he pointed out. UCSC allows you to do remote connections to their MySQL server so you could set up a repeated task and more complex queries that way with the UCSC method. Barry On Nov 30, 2007, at 7:12 AM, Cook, Malcolm wrote: > How many, how often? > > Use ensembl biomart! > > First time interactively. > > Then if you to pipeline it, take the perl code it generates for you > and > run it - of course you'll have to install the Ensembl Perl API....
> > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Paul N. Hengen >> Sent: Wednesday, November 28, 2007 7:21 PM >> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez >> IDs >> >> >> Hi. >> >> I have a number of gene IDs from Entrez and I want to find >> the start and end locations in the human genome. This seemed >> simple enough, so I started working through some of the >> examples for using the EntrezGene module at www.bioperl.org >> Of course this did not work because the core installation >> does not include this module. So, I think I have two choices >> (1) install the module (how?), or (2) find an easier way to >> get the locations in the human genome. >> I want to use the locations to grab sequences out of the genome. >> Can anyone offer advice on this? Thanks. >> >> -Paul. >> >> -- >> Paul N. Hengen, Ph.D. >> Hematopoietic Stem Cell and Leukemia Research City of Hope >> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 >> USA mailto:paulhengen at coh.org >> >> -- >> View this message in context: >> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-E >> ntrez-IDs-tf4894403.html#a14017289 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Nov 30 23:37:50 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 30 Nov 2007 22:37:50 -0600 Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS) In-Reply-To: <000901c833bf$33d53500$0a02a8c0@AWALL> References: <000901c833bf$33d53500$0a02a8c0@AWALL> Message-ID: <75FF7E93-1633-4D43-9BC0-8BE2A6A7711D@uiuc.edu> Make sure to keep this on the list. ncbi_gi() is only in bioperl-live (CVS); my guess is you either somehow got 1.5.2 instead or the bioperl-live version is not found in your path. It's very likely the latter, as perl's looking for whatever else is present (which appears to be an older version of bioperl). That should give you a hint that the problem may be with your lib path. Try changing the 'Use lib '/home/awaller/bioperl-live/ Bio'' to: use lib '/home/awaller/bioperl-live'; chris On Nov 30, 2007, at 8:09 PM, alison waller wrote: > Okay so Now I'm really confused. > I edited > #!usr/bin/perl >> Use lib '/home/awaller/bioperl-live/Bio. > I ran the script below with the *special hit->ncbi from Chris. It > worked, > it was great, I got the gi! No errors, no bugs that I saw in > checking the > output. Then I went back in, edited the script to retrieve further > info > (specifically the strand). Saved it, now when I try to run it I get > the > same error message that I was previously getting. > > barrett ~ $ perl blast_parse_awcf.pl OldMoBlastxGiTest.txt 1 > Can't locate object method "ncbi_gi" via package > "Bio::Search::Hit::BlastHit" at blast_parse_awcf.pl line 50, > line > 189. 
> > Thanks soo much, > > > #!usr/bin/perl > > use strict; > use warnings; > use lib "/home/awaller/bioperl-live/Bio"; > use Bio::Perl; > use Bio::SearchIO; > > my $usage = "to run type: blast_parse_aw.pl <# of > hits per > query> \n"; if (@ARGV != 2) { die $usage; } > > my $infile = $ARGV[0]; > my $outfile = $infile . '.parsed'; > my $tophit = $ARGV[1]; # to specify in the command line how many hits > # to report for each query > > open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! > \n"; > > my $report = Bio::SearchIO->new( > -file => $infile, > -format => "blast" > ); > > print OUT join("\t",qw( > Query > HitDesc > HitAccess > HitGi > HitBits > Evalue > %id > AlignLen > NumIdent > NumPos > gaps > Qframe > Qstrand > Hframe > Hstrand))."\n"; > > # Go through BLAST reports one by one > while ( my $result = $report->next_result ) { > my $ct = 0; > my @tophits = grep {$ct++ < $tophit } $result->hits; > if (scalar(@tophits) == 0) { > print OUT "no hits\n"; > } > for my $hit (@tophits) { > my $tophsp=$hit->hsp('best'); > # Print some tab-delimited data about this hit > print OUT join("\t", > $result->query_name, > $hit->description, > $hit->accession, > $hit->ncbi_gi, > $hit->bits, > $tophsp->evalue, > $tophsp->percent_identity, > $tophsp->length('total'), > $tophsp->num_identical, > $tophsp->num_conserved, > $tophsp->gaps, > $tophsp->query->frame, > $tophsp->strand('query'), > $tophsp->hit->frame, > $tophsp->strand('hit'), > )."\n"; > } > } > > > > > -----Original Message----- > From: Sendu Bala [mailto:bix at sendu.me.uk] > Sent: Friday, November 30, 2007 6:24 PM > To: alison waller > Subject: Re: [Bioperl-l] Problems installing bioperl (bioperl-live > tarball > from CVS) > > alison waller wrote: >> Thank you Sendu, >> >> So I'm trying the second option. I have downloaded the bioperl-live > tarball >> from the CVS on my windows laptop, and then moved it to my home >> directory > in >> the linux cluster where I unzipped and tared it. 
So I now have a > directory >> /home/awaller/bioperl-live. >> >> I edited my .bashrc file as below: >> Export PERL5LIB='/home/awaller/bioperl-live' >> >> I also edited a sample script to include: >> #!usr/bin/perl >> Use lib '/home/awaller/bioperl-live' > > Does this directory contain a 'Bio' directory with all the BioPerl > modules inside it? > > >> But it still isn't working. >> At the prompt I typed$ perl script.pl >> It gave me the warning - can't locate object method ncbi_gi which >> is why > I'm >> trying to download the CVS version as Chris Fields added code to >> make the >> ncbi-gi object. > > You'll have to give me the complete, unedited error message and > ideally > the script itself before I can help you further. > > >> Don't I have to do something similar to what the Build.PL file does? > > Probably not. It doesn't matter where your perl executable is, btw, as > long as the system knows how to run perl, which it obviously does. > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign
From jason at bioperl.org Thu Nov 1 10:22:14 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 10:22:14 -0400 Subject: [Bioperl-l] PAML/Codeml parsing Message-ID: PAML4 breaks our PAML parser right now because the order of things in the result file has changed. Now sequences precede the information about the version or the program run. This means that $result->get_seqs() fails because we don't parse the sequences. We'll see what we can do, but as usual with supporting 3rd party programs it is brittle when file formats change. Th -jason -- Jason Stajich jason at bioperl.org
From jason at bioperl.org Thu Nov 1 10:32:06 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 10:32:06 -0400 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> Message-ID: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> Presumably the PATH is not getting set properly - you should play around printing the $ENV{PATH} variable in a perl script to see if actually contains the directory where the emboss programs are installed. Bioperl can only guess so much as to where to find an application. It is also possible that we aren't creating the proper path to the executable - you can print the executable path with print $fuzznuc->executable I believe unless it is throwing an error at the program() line. It looks like the code in the Factory object is a little fragile assuming that the programs HAVE to be in your $PATH.
I don't know if windows+perl is special in any way that it run things so I can't really tell if there is specific things you have to do here. You may have to run this through cygwin in case PATH and such are just not available properly to windowsPerl. -jason On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > Dear all, > > I have emboss installed on a windows machine. (Embosswin). I can run > this from the dos command line and the path is present. However, > when I > try to call > an emboss application from bioperl I get a "Application not found > error" > > > my $f = Bio::Factory::EMBOSS->new(); > # get an EMBOSS application object from the factory > my $fuzznuc = $f->program('fuzznuc'); > $fuzznuc->run( > { -sequence => $infile, > -pattern => $motif, > -outfile => $outfile > }); > gives the following error > > -------------------- WARNING --------------------- > MSG: Application [fuzznuc] is not available! > --------------------------------------------------- > Can't call method "run" on an undefined value at searchPatterns.pl > line > 102. > > Can somebody help me fix this ? > > best regards > Rohit > > -- > > Dr. 
Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Thu Nov 1 10:54:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Nov 2007 09:54:09 -0500 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> Message-ID: <325E8599-793F-49DC-8680-9823F9389D4C@uiuc.edu> This worked for me previously when I tested with WinXP on my old machine using EMBOSS v5: ftp://emboss.open-bio.org/pub/EMBOSS/windows I haven't tried it with EMBOSSWin (latest is v 2.7); it's probably better to use the latest EMBOSS version anyway so I suggest trying the version in the above link. I'll test it again today and let you know what I find. chris On Nov 1, 2007, at 9:32 AM, Jason Stajich wrote: > Presumably the PATH is not getting set properly - you should play > around printing the $ENV{PATH} variable in a perl script to see if > actually contains the directory where the emboss programs are > installed. Bioperl can only guess so much as to where to find an > application. It is also possible that we aren't creating the proper > path to the executable - you can print the executable path with > print $fuzznuc->executable > I believe unless it is throwing an error at the program() line. > > It looks like the code in the Factory object is a little fragile > assuming that the programs HAVE to be in your $PATH. 
I don't know if > windows+perl is special in any way that it run things so I can't > really tell if there is specific things you have to do here. You may > have to run this through cygwin in case PATH and such are just not > available properly to windowsPerl. > > -jason > On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > >> Dear all, >> >> I have emboss installed on a windows machine. (Embosswin). I can run >> this from the dos command line and the path is present. However, >> when I >> try to call >> an emboss application from bioperl I get a "Application not found >> error" >> >> >> my $f = Bio::Factory::EMBOSS->new(); >> # get an EMBOSS application object from the factory >> my $fuzznuc = $f->program('fuzznuc'); >> $fuzznuc->run( >> { -sequence => $infile, >> -pattern => $motif, >> -outfile => $outfile >> }); >> gives the following error >> >> -------------------- WARNING --------------------- >> MSG: Application [fuzznuc] is not available! >> --------------------------------------------------- >> Can't call method "run" on an undefined value at searchPatterns.pl >> line >> 102. >> >> Can somebody help me fix this ? >> >> best regards >> Rohit >> >> -- >> >> Dr. Rohit Ghai >> Institute of Medical Microbiology >> Faculty of Medicine >> Justus-Liebig University >> Frankfurter Strasse 107 >> 35392 - Giessen >> GERMANY >> >> Tel : 0049 (0)641-9946413 >> Fax : 0049 (0)641-9946409 >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Thu Nov 1 11:31:40 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 11:31:40 -0400 Subject: [Bioperl-l] PAML3 vs 4 Message-ID: <23575228-2FA3-4F07-BED4-4A2309A36D71@bioperl.org> Small tweaks were needed to parse PAML4 results. Pairwise Ka, Ks parsing (runmode -2) should be working more smoothly now on both PAML 3 and 4. You'll need to get the latest code from CVS in order to see the changes to Bio/Tools/Phylo/PAML.pm I've added tests for PAML4 in the parser and the run code. If you have scripts that use codeml please give it a try. I have not attempted to play with BASEML or AAML results at this point so if you also have codes that use those programs, please try it out and provide bugreports if we need to fix things. -jason -- Jason Stajich jason at bioperl.org From Kevin.M.Brown at asu.edu Thu Nov 1 13:25:30 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 1 Nov 2007 10:25:30 -0700 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl onwindows In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> Message-ID: <1A4207F8295607498283FE9E93B775B403EA7E06@EX02.asurite.ad.asu.edu> Sounds like a path issue. Try to tell bioperl the full path to the executable rather than just the executable name. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai > Sent: Thursday, November 01, 2007 2:46 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] bioperl: cannot run emboss programs > using bioperl onwindows > > Dear all, > > I have emboss installed on a windows machine. (Embosswin). I can run > this from the dos command line and the path is present. 
> However, when I > try to call > an emboss application from bioperl I get a "Application not > found error" > > > my $f = Bio::Factory::EMBOSS->new(); > # get an EMBOSS application object from the factory > my $fuzznuc = $f->program('fuzznuc'); > $fuzznuc->run( > { -sequence => $infile, > -pattern => $motif, > -outfile => $outfile > }); > gives the following error > > -------------------- WARNING --------------------- > MSG: Application [fuzznuc] is not available! > --------------------------------------------------- > Can't call method "run" on an undefined value at > searchPatterns.pl line > 102. > > Can somebody help me fix this ? > > best regards > Rohit > > -- > > Dr. Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 14:06:48 2007 From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai) Date: Thu, 01 Nov 2007 19:06:48 +0100 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> Message-ID: <472A15B8.7040502@mikrobio.med.uni-giessen.de> Thanks for all the suggestions... but I unfortunately still cannot run emboss. I am running the latest version of embosswin (2.10.0-Win-0.8), and the path is set correctly. I printed $ENV{$PATH} and this contains C:\EMBOSSwin which is the correct location. I also tried setting the path directly but I'm not sure how to do this, so I tried this... 
my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); this also did not work. Also tried printing... $fuzznuc->executable() gave the following error again -------------------- WARNING --------------------- MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! --------------------------------------------------- Any more ideas ? thanks ! Rohit here's the code... use strict; use Bio::Factory::EMBOSS; use Data::Dumper; # # print "PATH=$ENV{PATH}\n"; # path contains C:\EMBOSSwin which is the correct location # embossversion is 2.10.0-Win-0.8 my $f = Bio::Factory::EMBOSS->new(); # get an EMBOSS application object from the factory print Dumper ($f); my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe as well, print Dump ($fuzznuc); #dump of fuzznuc #$VAR1 = bless( { # '_programgroup' => {}, # '_programs' => {}, # '_groups' => {} # }, 'Bio::Factory::EMBOSS' ); #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work my $infile = "temp.fasta"; my $motif = "ATGTCGATC"; my $outfile = "test.out"; $fuzznuc->run( { -sequence => $infile, -pattern => $motif, -outfile => $outfile }); Here's the error again.... #-------------------- WARNING --------------------- #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! #--------------------------------------------------- Jason Stajich wrote: > Presumably the PATH is not getting set properly - you should play > around printing the $ENV{PATH} variable in a perl script to see if > actually contains the directory where the emboss programs are > installed. Bioperl can only guess so much as to where to find an > application. It is also possible that we aren't creating the proper > path to the executable - you can print the executable path with > print $fuzznuc->executable > I believe unless it is throwing an error at the program() line. > > It looks like the code in the Factory object is a little fragile > assuming that the programs HAVE to be in your $PATH. 
I don't know if > windows+perl is special in any way that it run things so I can't > really tell if there is specific things you have to do here. You may > have to run this through cygwin in case PATH and such are just not > available properly to windowsPerl. > > -jason > On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > >> Dear all, >> >> I have emboss installed on a windows machine. (Embosswin). I can run >> this from the dos command line and the path is present. However, when I >> try to call >> an emboss application from bioperl I get a "Application not found error" >> >> >> my $f = Bio::Factory::EMBOSS->new(); >> # get an EMBOSS application object from the factory >> my $fuzznuc = $f->program('fuzznuc'); >> $fuzznuc->run( >> { -sequence => $infile, >> -pattern => $motif, >> -outfile => $outfile >> }); >> gives the following error >> >> -------------------- WARNING --------------------- >> MSG: Application [fuzznuc] is not available! >> --------------------------------------------------- >> Can't call method "run" on an undefined value at searchPatterns.pl line >> 102. >> >> Can somebody help me fix this ? >> >> best regards >> Rohit >> >> -- >> >> Dr. Rohit Ghai >> Institute of Medical Microbiology >> Faculty of Medicine >> Justus-Liebig University >> Frankfurter Strasse 107 >> 35392 - Giessen >> GERMANY >> >> Tel : 0049 (0)641-9946413 >> Fax : 0049 (0)641-9946409 >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > -- Dr. 
Rohit Ghai Institute of Medical Microbiology Faculty of Medicine Justus-Liebig University Frankfurter Strasse 107 35392 - Giessen GERMANY Tel : 0049 (0)641-9946413 Fax : 0049 (0)641-9946409 Email: Rohit.Ghai at mikrobio.med.uni-giessen.de From jason at bioperl.org Thu Nov 1 14:37:24 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 14:37:24 -0400 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows In-Reply-To: <472A15B8.7040502@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> Message-ID: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> You could try this - can't test it though so not sure. my $fuzznuc = $f->program('fuzznuc'); $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); -jason On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: > > > Thanks for all the suggestions... but I unfortunately still cannot run > emboss. I am running the latest version of embosswin (2.10.0- > Win-0.8), > and the > path is set correctly. I printed $ENV{$PATH} and this contains > C:\EMBOSSwin which is the correct location. > I also tried setting the path directly but I'm not sure how to do > this, > so I tried this... > > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); > > this also did not work. > > Also tried printing... > $fuzznuc->executable() > > gave the following error again > -------------------- WARNING --------------------- > MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > --------------------------------------------------- > > Any more ideas ? > > thanks ! > Rohit > > > here's the code... 
> > use strict; > use Bio::Factory::EMBOSS; > use Data::Dumper; > > # > # print "PATH=$ENV{PATH}\n"; > # path contains C:\EMBOSSwin which is the correct location > # embossversion is 2.10.0-Win-0.8 > > my $f = Bio::Factory::EMBOSS->new(); > # get an EMBOSS application object from the factory > print Dumper ($f); > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried > fuzznuc.exe > as well, > print Dump ($fuzznuc); > > #dump of fuzznuc > #$VAR1 = bless( { > # '_programgroup' => {}, > # '_programs' => {}, > # '_groups' => {} > # }, 'Bio::Factory::EMBOSS' ); > > #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work > > my $infile = "temp.fasta"; > my $motif = "ATGTCGATC"; > my $outfile = "test.out"; > > > $fuzznuc->run( > { -sequence => $infile, > -pattern => $motif, > -outfile => $outfile > }); > > Here's the error again.... > > #-------------------- WARNING --------------------- > #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > #--------------------------------------------------- > > -- > > Dr.
Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 14:41:41 2007 From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai) Date: Thu, 01 Nov 2007 19:41:41 +0100 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlonwindows In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> Message-ID: <472A1DE5.30207@mikrobio.med.uni-giessen.de> Hi Jason I tried this as well. This also gives the same error message. -Rohit Jason Stajich wrote: > You could try this - can't test it though so not sure. > my $fuzznuc = $f->program('fuzznuc'); > $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); > > -jason > > -- > Jason Stajich > jason at bioperl.org > -- Dr. Rohit Ghai Institute of Medical Microbiology Faculty of Medicine Justus-Liebig University Frankfurter Strasse 107 35392 - Giessen GERMANY Tel : 0049 (0)641-9946413 Fax : 0049 (0)641-9946409 Email: Rohit.Ghai at mikrobio.med.uni-giessen.de From MEC at stowers-institute.org Thu Nov 1 14:57:33 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Thu, 1 Nov 2007 13:57:33 -0500 Subject: [Bioperl-l] bioperl: cannot run emboss programs usingbioperlonwindows In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de> Message-ID: in the code http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 there is a call to `wossname` (c.f. http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html ) is wossname in your path? Maybe it needs to be wossname.exe under windows? Malcolm Cook > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai > Sent: Thursday, November 01, 2007 1:42 PM > To: Jason Stajich > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs > usingbioperlonwindows > > Hi Jason > > I tried this as well. This also gives the same error message.
> > -Rohit > > -- > > Dr. Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From arareko at campus.iztacala.unam.mx Thu Nov 1 15:51:41 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 01 Nov 2007 13:51:41 -0600 Subject: [Bioperl-l] bioperl: cannot run emboss programs usingbioperlonwindows In-Reply-To: References: <4729A047.2060507@mikrobio.med.uni-giessen.de><80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org><472A15B8.7040502@mikrobio.med.uni-giessen.de><6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de> Message-ID: <472A2E4D.8080903@campus.iztacala.unam.mx> Don't the EMBOSS binaries live under 'bin'? Perhaps try setting $ENV{PATH} to include 'C:\EMBOSSwin\bin', or using this: my $fuzznuc = $f->program('fuzznuc'); $fuzznuc->executable('C:\EMBOSSwin\bin\fuzznuc'); Adding .exe might be worth trying as well. Mauricio. Cook, Malcolm wrote: > in the code > http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 > > there is a call to `wossname` (c.f.
> http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html > ) > > is wossname in your path? > > Maybe it needs to be wossname.exe under windows? > > > Malcolm Cook > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Thu Nov 1 16:07:39 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Nov 2007 15:07:39 -0500 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlonwindows In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de> Message-ID: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu> I did a little investigating using my old PC and was able to get fuzznuc to run using BioPerl
and EMBOSS v5. I had to jump through a hoop or two but I managed to get it working. First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows. You need to remove EMBOSSWin and install the one I linked to previously (this is an actual EMBOSS beta release). It's possible older EMBOSSWin can be configured, but I don't plan on checking it out myself. Next, you need to ensure the binaries are in your PATH environment variable (test by running 'wossname' on the command line), then set EMBOSS_DATA to point at the EMBOSS data directory using a UNIX-like path (e.g. 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me, and WinXP recognizes the UNIX'y form as a valid path. If you don't know how to set environment variables go here: http://vlaurie.com/computers2/Articles/environment.htm Once that is set up you should be able to run the script using the latest (greatest?) EMBOSS. chris On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote: > Hi Jason > > I tried this as well. This also gives the same error message. > > -Rohit > > -- > > Dr. Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From neetisomaiya at gmail.com Fri Nov 2 00:20:27 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Fri, 2 Nov 2007 09:50:27 +0530 Subject: [Bioperl-l] need help Message-ID: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com> Hi, This is a perl question, not bioperl. Can anyone point me to a perl program/code/function which can calculate the number of days between any two given dates. Any help will be deeply appreciated. Thanks. -- -Neeti Even my blood says, B positive From whs at ebi.ac.uk Fri Nov 2 01:01:20 2007 From: whs at ebi.ac.uk (Will Spooner) Date: Fri, 2 Nov 2007 05:01:20 +0000 (GMT) Subject: [Bioperl-l] need help In-Reply-To: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com> References: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com> Message-ID: Hi Neeti, A non-bioperl answer to your perl question: Date::Calc should do the trick.
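For the date question, a minimal sketch of counting the days between two dates might look like the following. Date::Calc is a CPAN module, and its Delta_Days function takes (year1, month1, day1, year2, month2, day2). The helper name days_between and the core-only fallback through Time::Local are assumptions added here for illustration; they are not from the thread.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# days_between() is a hypothetical helper name, not part of any module.
# Arguments: year, month (1-12) and day for each of the two dates.
sub days_between {
    my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;

    # Use Date::Calc (CPAN) when it is installed.
    if (eval { require Date::Calc; 1 }) {
        return Date::Calc::Delta_Days($y1, $m1, $d1, $y2, $m2, $d2);
    }

    # Fallback with core modules only: take midnight UTC for both dates,
    # so daylight saving time cannot shift the difference.
    my $t1 = timegm(0, 0, 0, $d1, $m1 - 1, $y1);
    my $t2 = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
    return ($t2 - $t1) / 86400;
}

print days_between(2007, 10, 31, 2007, 11, 2), "\n";   # prints 2
```

The result is negative when the second date is earlier than the first, which matches what Delta_Days returns.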
Will On Fri, 2 Nov 2007, neeti somaiya wrote: > Hi, > > This is a perl question, not bioperl. > Can anyone point me to a perl program/code/function which can calculate the > number of days between any two given dates. > Any help will be deeply appreciated. > Thanks. > > From smarkel at accelrys.com Sat Nov 3 02:01:38 2007 From: smarkel at accelrys.com (Scott Markel) Date: Fri, 2 Nov 2007 23:01:38 -0700 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> Message-ID: I set multiple environment variables in my code. $ENV{EMBOSS_ROOT} = $embossPath; $ENV{EMBOSS_ACDROOT} = File::Spec->catdir($embossPath, "acd"); $ENV{EMBOSS_DB_DIR} = File::Spec->catdir($embossPath, "test"); $ENV{EMBOSS_DATA} = File::Spec->catdir($embossPath, "data"); $ENV{PATH} = $embossPath; I found it necessary to set both PATH and EMBOSS_ROOT. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com bioperl-l-bounces at lists.open-bio.org wrote on 01.11.2007 11:37:24: > You could try this - can't test it though so not sure. > my $fuzznuc = $f->program('fuzznuc'); > $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); > > -jason > On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: > > > > > > > Thanks for all the suggestions... but I unfortunately still cannot run > > emboss. I am running the latest version of embosswin (2.10.0- > > Win-0.8), > > and the > > path is set correctly. I printed $ENV{$PATH} and this contains > > C:\EMBOSSwin which is the correct location. 
> > I also tried setting the path directly but I'm not sure how to do > > this, > > so I tried this... > > > > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); > > > > this also did not work. > > > > Also tried printing... > > $fuzznuc->executable() > > > > gave the following error again > > -------------------- WARNING --------------------- > > MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > > --------------------------------------------------- > > > > Any more ideas ? > > > > thanks ! > > Rohit > > > > > > here's the code... > > > > use strict; > > use Bio::Factory::EMBOSS; > > use Data::Dumper; > > > > # > > # print "PATH=$ENV{PATH}\n"; > > # path contains C:\EMBOSSwin which is the correct location > > # embossversion is 2.10.0-Win-0.8 > > > > my $f = Bio::Factory::EMBOSS->new(); > > # get an EMBOSS application object from the factory > > print Dumper ($f); > > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried > > fuzznuc.exe > > as well, > > print Dump ($fuzznuc); > > > > #dump of fuzznuc > > #$VAR1 = bless( { > > # '_programgroup' => {}, > > # '_programs' => {}, > > # '_groups' => {} > > # }, 'Bio::Factory::EMBOSS' ); > > > > #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work > > > > my $infile = "temp.fasta"; > > my $motif = "ATGTCGATC"; > > my $outfile = "test.out"; > > > > > > $fuzznuc->run( > > { -sequence => $infile, > > -pattern => $motif, > > -outfile => $outfile > > }); > > > > Here's the error again.... > > > > #-------------------- WARNING --------------------- > > #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > > #--------------------------------------------------- > > > > > > > > > > Jason Stajich wrote: > >> Presumably the PATH is not getting set properly - you should play > >> around printing the $ENV{PATH} variable in a perl script to see if > >> actually contains the directory where the emboss programs are > >> installed. 
Bioperl can only guess so much as to where to find an > >> application. It is also possible that we aren't creating the proper > >> path to the executable - you can print the executable path with > >> print $fuzznuc->executable > >> I believe unless it is throwing an error at the program() line. > >> > >> It looks like the code in the Factory object is a little fragile > >> assuming that the programs HAVE to be in your $PATH. I don't know if > >> windows+perl is special in any way that it run things so I can't > >> really tell if there is specific things you have to do here. You may > >> have to run this through cygwin in case PATH and such are just not > >> available properly to windowsPerl. > >> > >> -jason > >> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > >> > >>> Dear all, > >>> > >>> I have emboss installed on a windows machine. (Embosswin). I can run > >>> this from the dos command line and the path is present. However, > >>> when I > >>> try to call > >>> an emboss application from bioperl I get a "Application not found > >>> error" > >>> > >>> > >>> my $f = Bio::Factory::EMBOSS->new(); > >>> # get an EMBOSS application object from the factory > >>> my $fuzznuc = $f->program('fuzznuc'); > >>> $fuzznuc->run( > >>> { -sequence => $infile, > >>> -pattern => $motif, > >>> -outfile => $outfile > >>> }); > >>> gives the following error > >>> > >>> -------------------- WARNING --------------------- > >>> MSG: Application [fuzznuc] is not available! > >>> --------------------------------------------------- > >>> Can't call method "run" on an undefined value at > >>> searchPatterns.pl line > >>> 102. > >>> > >>> Can somebody help me fix this ? > >>> > >>> best regards > >>> Rohit > >>> > >>> -- > >>> > >>> Dr. 
Rohit Ghai > >>> Institute of Medical Microbiology > >>> Faculty of Medicine > >>> Justus-Liebig University > >>> Frankfurter Strasse 107 > >>> 35392 - Giessen > >>> GERMANY > >>> > >>> Tel : 0049 (0)641-9946413 > >>> Fax : 0049 (0)641-9946409 > >>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> Jason Stajich > >> jason at bioperl.org > >> > > > > -- > > > > Dr. Rohit Ghai > > Institute of Medical Microbiology > > Faculty of Medicine > > Justus-Liebig University > > Frankfurter Strasse 107 > > 35392 - Giessen > > GERMANY > > > > Tel : 0049 (0)641-9946413 > > Fax : 0049 (0)641-9946409 > > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Rohit.Ghai at mikrobio.med.uni-giessen.de Sat Nov 3 10:07:52 2007 From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai) Date: Sat, 03 Nov 2007 15:07:52 +0100 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows In-Reply-To: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de> <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu> Message-ID: <472C80B8.9050601@mikrobio.med.uni-giessen.de> Dear all, thanks for all the different inputs on this topic, I was able to run 
emboss applications on windows (vista), but with the following workaround. Chris suggested to remove EMBOSSwin and get another version. This I did. Scott suggested setting all the variables within the program. This I also tried, but actually these were already available to the program so this was also not the problem. The following line... my $fuzznuc = $f->program('fuzznuc') doesn't return a Bio::Tools::Run::EMBOSSApplication object. but using Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't have any path issues. What is also curious is that $f->version returns the correct version of emboss running (no path problems here), and it looks like it runs the command "embossversion -auto" to get this information. If it can get at this command, its a bit peculiar why it cannot get the other programs. Or am I missing something here ? Please take a look at the code, I have commented within this... -Rohit use Bio::Factory::EMBOSS; use Data::Dumper; use Bio::Tools::Run::EMBOSSApplication; my $infile = "test.fasta"; my $motif = "AGGAGG"; my $outfile = "test.out"; my $f = Bio::Factory::EMBOSS->new(); # get an EMBOSS application object from the factory print Dumper $f; print "location=",$f->location,"\n"; #returns local print "version=", $f->version,"\n"; # this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is) print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing print "list=",$f->_program_list,"\n"; #returns nothing #however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work #$fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesnt work # the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object. 
#however, creating a EMBOSSApplication object directly makes it possible to run the program # my $application = Bio::Tools::Run::EMBOSSApplication->new(); $application->name('fuzznuc'); print Dumper $application; $application->run( { -sequence => $infile, -pattern => $motif, -outfile => $outfile }); print "Done\n"; exit; Chris Fields wrote: > I did a little investigating using my old PC and was able to get > fuzznuc to run using BioPerl and EMBOSS v5. I had to jump through a > hoop or two but I managed to get it working. > > First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows. > You need to remove EMBOSSWin and install the one I linked to > previously (this is an actual EMBOSS beta release). It's possible > older EMBOSSWin can be configured, but I don't plan on checking it out > myself. > > Next, you need to ensure the binaries are in your PATH env. variable > (test by running 'wossname' on the command line), then set EMBOSS_DATA > to point at the EMBOSS data directory using a UNIX-like path (i.e. > 'C:/mEMBOSS/data'); regular Win32 paths didn't work for me and WinXP > recognizes the UNIX'y form as a valid path. If you don't know how to > set env. variables go here: > > http://vlaurie.com/computers2/Articles/environment.htm > > Once that is set up you should be able to run the script using the > latest (greatest?) EMBOSS. > > chris > > On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote: > >> Hi Jason >> >> I tried this as well. This also gives the same error message. >> >> -Rohit >> >> Jason Stajich wrote: >>> You could try this - can't test it though so not sure. >>> my $fuzznuc = $f->program('fuzznuc'); >>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); >>> >>> -jason >>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: >>> >>>> >>>> >>>> Thanks for all the suggestions... but I unfortunately still cannot run >>>> emboss. I am running the latest version of embosswin >>>> (2.10.0-Win-0.8), >>>> and the >>>> path is set correctly. 
I printed $ENV{$PATH} and this contains >>>> C:\EMBOSSwin which is the correct location. >>>> I also tried setting the path directly but I'm not sure how to do >>>> this, >>>> so I tried this... >>>> >>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); >>>> >>>> this also did not work. >>>> >>>> Also tried printing... >>>> $fuzznuc->executable() >>>> >>>> gave the following error again >>>> -------------------- WARNING --------------------- >>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! >>>> --------------------------------------------------- >>>> >>>> Any more ideas ? >>>> >>>> thanks ! >>>> Rohit >>>> >>>> >>>> here's the code... >>>> >>>> use strict; >>>> use Bio::Factory::EMBOSS; >>>> use Data::Dumper; >>>> >>>> # >>>> # print "PATH=$ENV{PATH}\n"; >>>> # path contains C:\EMBOSSwin which is the correct location >>>> # embossversion is 2.10.0-Win-0.8 >>>> >>>> my $f = Bio::Factory::EMBOSS->new(); >>>> # get an EMBOSS application object from the factory >>>> print Dumper ($f); >>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried >>>> fuzznuc.exe >>>> as well, >>>> print Dump ($fuzznuc); >>>> >>>> #dump of fuzznuc >>>> #$VAR1 = bless( { >>>> # '_programgroup' => {}, >>>> # '_programs' => {}, >>>> # '_groups' => {} >>>> # }, 'Bio::Factory::EMBOSS' ); >>>> >>>> #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work >>>> >>>> my $infile = "temp.fasta"; >>>> my $motif = "ATGTCGATC"; >>>> my $outfile = "test.out"; >>>> >>>> >>>> $fuzznuc->run( >>>> { -sequence => $infile, >>>> -pattern => $motif, >>>> -outfile => $outfile >>>> }); >>>> >>>> Here's the error again.... >>>> >>>> #-------------------- WARNING --------------------- >>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! 
>>>> #--------------------------------------------------- >>>> >>>> >>>> >>>> >>>> Jason Stajich wrote: >>>>> Presumably the PATH is not getting set properly - you should play >>>>> around printing the $ENV{PATH} variable in a perl script to see if >>>>> actually contains the directory where the emboss programs are >>>>> installed. Bioperl can only guess so much as to where to find an >>>>> application. It is also possible that we aren't creating the proper >>>>> path to the executable - you can print the executable path with >>>>> print $fuzznuc->executable >>>>> I believe unless it is throwing an error at the program() line. >>>>> >>>>> It looks like the code in the Factory object is a little fragile >>>>> assuming that the programs HAVE to be in your $PATH. I don't know if >>>>> windows+perl is special in any way that it run things so I can't >>>>> really tell if there is specific things you have to do here. You may >>>>> have to run this through cygwin in case PATH and such are just not >>>>> available properly to windowsPerl. >>>>> >>>>> -jason >>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: >>>>> >>>>>> Dear all, >>>>>> >>>>>> I have emboss installed on a windows machine. (Embosswin). I can run >>>>>> this from the dos command line and the path is present. However, >>>>>> when I >>>>>> try to call >>>>>> an emboss application from bioperl I get a "Application not found >>>>>> error" >>>>>> >>>>>> >>>>>> my $f = Bio::Factory::EMBOSS->new(); >>>>>> # get an EMBOSS application object from the factory >>>>>> my $fuzznuc = $f->program('fuzznuc'); >>>>>> $fuzznuc->run( >>>>>> { -sequence => $infile, >>>>>> -pattern => $motif, >>>>>> -outfile => $outfile >>>>>> }); >>>>>> gives the following error >>>>>> >>>>>> -------------------- WARNING --------------------- >>>>>> MSG: Application [fuzznuc] is not available! 
>>>>>> --------------------------------------------------- >>>>>> Can't call method "run" on an undefined value at searchPatterns.pl >>>>>> line >>>>>> 102. >>>>>> >>>>>> Can somebody help me fix this ? >>>>>> >>>>>> best regards >>>>>> Rohit >>>>>> >>>>>> -- >>>>>> > > From hlapp at gmx.net Sun Nov 4 12:42:13 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 4 Nov 2007 12:42:13 -0500 Subject: [Bioperl-l] question -- Bio::SeqFeature::Gene::Transcript In-Reply-To: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de> References: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de> Message-ID: <62FB6DE1-3F1D-428C-B108-4CF9EEB67DDD@gmx.net> Hi Stefanie, sorry for taking so long to respond - your email got buried in a pile while I was away on travel. The Bio::SeqFeature::Gene::* modules were written mostly with the motivation to have a model that can represent the results of gene predictors. GenBank AFAIK doesn't annotate introns explicitly, though they should be implicit from cDNA (or mRNA? or gene, as you say) features on genomic sequence. The Bioperl SeqIO parsers won't transform those into a Bio::SeqFeature::Gene-based model, but instead will yield just plain Bio::SeqFeatureI objects in a flat array. It's up to subsequent processing to build these into more hierarchical models. I'm not sure whether someone's done this already for GenBank-type feature tables. There is a Unflattener that at least attempts to build a feature hierarchy from the flat array that's compliant with the Sequence Ontology (or so I recall). I'm copying the list in case others have additional suggestions. 
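
To illustrate what "implicit" means here, a minimal sketch in plain Perl (no BioPerl calls) that derives introns as the gaps between consecutive exons. The sub name and the coordinates are hypothetical. In practice the [start, end] pairs would be read off the sub-locations of a split CDS or mRNA feature. The sketch assumes a linear location that does not wrap around the origin of a circular sequence.

```perl
use strict;
use warnings;

# Given exon [start, end] pairs (1-based, inclusive) from one transcript,
# return the gaps between consecutive exons as intron [start, end] pairs.
sub introns_from_exons {
    my @exons = sort { $a->[0] <=> $b->[0] } @_;
    my @introns;
    for my $i (1 .. $#exons) {
        my $start = $exons[$i - 1][1] + 1;   # first base after previous exon
        my $end   = $exons[$i][0] - 1;       # last base before next exon
        push @introns, [$start, $end] if $start <= $end;
    }
    return @introns;
}

# Hypothetical exon coordinates, e.g. from join(100..250,400..520,800..950)
my @introns = introns_from_exons([100, 250], [400, 520], [800, 950]);
print join(", ", map { "$_->[0]..$_->[1]" } @introns), "\n";
# prints 251..399, 521..799
```
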
-hilmar On Oct 25, 2007, at 3:40 AM, Stefanie Hartmann wrote: > > > Hello Hilmar, > > I have a question about your bioperl module > Bio::SeqFeature::Gene::Transcript: > > I can't figure out how to generate the $gene object for use in this > line: > @introns = $gene->introns(); > > The data I'm working with is a local file in genbank format, and > I'm interested in extracting intron sequences (and maybe flanking > exons) for certain genes. I have been trying to get the introns via > the sequence features ('CDS' or 'gene'), but this has not been > working. Which approach will I have to take? > I'd be very grateful if you could point me into the right direction! > > Hope things are going well in Durham! And thank you in advance! > > Stefanie > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From downloadondemand at gmail.com Sun Nov 4 13:39:42 2007 From: downloadondemand at gmail.com (download on demand) Date: Sun, 4 Nov 2007 20:39:42 +0200 Subject: [Bioperl-l] Help with Bio::SeqIO Message-ID: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> Hi to all. 

I have a problem with a very simple script:

use Bio::SeqIO;
# get command-line arguments, or die with a usage statement
my $usage = "x2y.pl infile infileformat outfile outfileformat\n";
my $infile = shift or die $usage;
my $infileformat = shift or die $usage;
# my $outfile = shift or die $usage;
my $outfileformat = shift or die $usage;

# create one SeqIO object to read in,and another to write out
my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                             '-format' => $infileformat);
my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
                              '-format' => $outfileformat);

# write each entry in the input file to the output file
while (my $inseq = $seq_in->next_seq) {

    # $seq_out->write_seq($inseq); # Whole sequence not needed

    for my $feat_object ($inseq->get_SeqFeatures)
    {
        if ($feat_object->primary_tag eq "CDS")
        {
            print $feat_object->get_tag_values('product'),"\n";
            print $feat_object->location->start,"..",$feat_object->location->end,"\n";
            print $feat_object->spliced_seq->seq,"\n\n";
        }
    }
}   # end of while loop (closing brace was missing)

The result seems OK to me, but in the case of the first CDS of NC_005213.gbk from
here the output is wrong:

It is:
hypothetical protein
1..490885
TAAATGCGATTGCTATTAGAA..................................Truncated sequence...................................

Should be:
hypothetical protein
879..490883
ATGCGATTGCTATTAGAA...................................Truncated sequence....................................TAA

This CDS has an unnatural location string:
CDS complement(join(490883..490885,1..879)), but shouldn't spliced_seq
handle these things?

Please help me!
Best regards, N.

From cjfields at uiuc.edu Sun Nov 4 19:08:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Nov 2007 18:08:34 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
Message-ID: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>

Pass in (-nosort => 1) to spliced_seq:

print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";

This ensures no sorting of sublocations occurs, if you want for
instance typical GenBank/EMBL 'join' behavior.

To the other devs: shouldn't -nosort be the default behavior when the
split location is a 'join'? In other words, should spliced_seq() be
modified to take into account the split location type when returning
sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly
indicates the order of the sequences is important when joined
together; the current behavior is more like that for 'order'.

chris

On Nov 4, 2007, at 12:39 PM, download on demand wrote:

> Hi to all.
> > I have a problem with a simplest script: > > > > use Bio::SeqIO; > # get command-line arguments, or die with a usage statement > my $usage = "x2y.pl infile infileformat outfile > outfileformat\n"; > my $infile = shift or die $usage; > my $infileformat = shift or die $usage; > # my $outfile = shift or die $usage; > my $outfileformat = shift or die $usage; > > # create one SeqIO object to read in,and another to write out > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => $infileformat); > my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, > '-format' => $outfileformat); > > # write each entry in the input file to the output file > while (my $inseq = $seq_in->next_seq) { > > # $seq_out->write_seq($inseq); # Whole sequence not needed > > for my $feat_object ($inseq->get_SeqFeatures) > { > if ($feat_object->primary_tag eq "CDS") > { > print $feat_object->get_tag_values('product'),"\n"; > print > $feat_object->location->start,"..",$feat_object->location->end,"\n"; > print $feat_object->spliced_seq->seq,"\n\n"; > } > } > > > > The result seems OK to me, but in case of first CDS of > NC_005213.gbk from > here > the > output is wrong: > > It is: > hypothetical protein > 1..490885 > TAAATGCGATTGCTATTAGAA..................................Truncated > sequence................................... > > Should be: > hypothetical protein > 879..490883 > ATGCGATTGCTATTAGAA...................................Truncated > sequence....................................TAA > > > > This CDS have an unnatural location string: > CDS complement(join(490883..490885,1..879)), but > spliced_seq > should handle these things? > > Please help me! > Best regards, N. 
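
To make the join-order point concrete, a toy sketch in plain Perl. It does not call BioPerl. The 20 bp "genome" and the helper names are invented for illustration. It mimics a location of the form complement(join(18..20,1..5)) that wraps around the origin of a circular sequence, and compares splicing the pieces in the order written with splicing them after sorting by start coordinate.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Concatenate 1-based inclusive [start, end] pieces in the order given.
sub splice_pieces {
    my ($genome, @pieces) = @_;
    return join '',
        map { substr($genome, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @pieces;
}

my $genome = 'CGCAT' . ('G' x 12) . 'TTA';   # toy circular genome, 20 bp
my @join   = ([18, 20], [1, 5]);            # complement(join(18..20,1..5))
my @sorted = sort { $a->[0] <=> $b->[0] } @join;

print "join order: ", revcomp(splice_pieces($genome, @join)),   "\n";  # ATGCGTAA
print "sorted:     ", revcomp(splice_pieces($genome, @sorted)), "\n";  # TAAATGCG
```

Keeping the order written in the join gives a sequence that starts with ATG and ends with TAA. Sorting by start moves the stop codon to the front, which is the same pattern as the TAAATG... output reported above.
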
> _______________________________________________ > From jean-luc.jany at univ-brest.fr Mon Nov 5 03:26:52 2007 From: jean-luc.jany at univ-brest.fr (Jean-luc Jany) Date: Mon, 05 Nov 2007 09:26:52 +0100 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall Message-ID: <472ED3CC.2050305@univ-brest.fr> Dear Bioperl and Mac users, I am a Mac user and would like to run a script I made using Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate to Bioperl the pathway to Blastall and other executables. I read carefully the following link http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the path to Blast, but I guess the way to proceed is slightly different in Mac and that I should not create .ncbirc and .bashrc files (e.g. should I modify the .profile file instead of .bashrc?) Actually, my blast file is in myname directory and comprises a /bin and a /data file. I have got my blastall and other executables in myname/blast/bin/blastall. Thank you in anticipation for your help. Jean-Luc From Rohit.Ghai at mikrobio.med.uni-giessen.de Mon Nov 5 06:36:16 2007 From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai) Date: Mon, 05 Nov 2007 12:36:16 +0100 Subject: [Bioperl-l] bioperl and emboss on windows Message-ID: <472F0030.7040200@mikrobio.med.uni-giessen.de> Dear all, thanks for all the different inputs on this topic, I was able to run emboss applications on windows (vista), but with the following workaround. Chris suggested to remove EMBOSSwin and get another version. This I did. Scott suggested setting all the variables within the program. This I also tried, but actually these were already available to the program so this was also not the problem. The following line... my $fuzznuc = $f->program('fuzznuc') doesn't return a Bio::Tools::Run::EMBOSSApplication object. but using Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't have any path issues. 
What is also curious is that $f->version returns the correct version of emboss running (no path problems here), and it looks like it runs the command "embossversion -auto" to get this information. If it can get at this command, its a bit peculiar why it cannot get the other programs. Or am I missing something here ? Please take a look at the code, I have commented within this... -Rohit use Bio::Factory::EMBOSS; use Data::Dumper; use Bio::Tools::Run::EMBOSSApplication; my $infile = "test.fasta"; my $motif = "AGGAGG"; my $outfile = "test.out"; my $f = Bio::Factory::EMBOSS->new(); # get an EMBOSS application object from the factory print Dumper $f; print "location=",$f->location,"\n"; #returns local print "version=", $f->version,"\n"; # this returns the correct version 5.0 (uses embossversion -auto internally, and seems to know where it is) print "info=", $f->program_info('fuzznuc'),"\n"; #returns nothing print "list=",$f->_program_list,"\n"; #returns nothing # # however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\ or with exe suffix doesn't work # $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesnt work # the problem is that it does not return a Bio::Tools::Run::EMBOSSApplication object. # # # # however, creating a EMBOSSApplication object directly makes it possible to run the program # my $application = Bio::Tools::Run::EMBOSSApplication->new(); $application->name('fuzznuc'); print Dumper $application; $application->run( { -sequence => $infile, -pattern => $motif, -outfile => $outfile }); print "Done\n"; exit; From neetisomaiya at gmail.com Mon Nov 5 07:20:04 2007 From: neetisomaiya at gmail.com (neeti somaiya) Date: Mon, 5 Nov 2007 17:50:04 +0530 Subject: [Bioperl-l] perl question Message-ID: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> Again a perl question, and maybe a very trivial one. How do I terminate a number like 3.1232010098 to only 3 decimal places in perl? 
-- -Neeti Even my blood says, B positive From biology0046 at hotmail.com Mon Nov 5 07:16:13 2007 From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=) Date: Mon, 05 Nov 2007 12:16:13 +0000 Subject: [Bioperl-l] how to extract intron information from gff files. Message-ID: Dear all: i got a poplar genome gff file like this: LG_I src exon 2598 3280 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649 LG_I src CDS 2598 3280 . - 0 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 4 LG_I src start_codon 3278 3280 . - 0 name "fgenesh1_pg.C_LG_I000001" LG_I src stop_codon 2598 2600 . - 0 name "fgenesh1_pg.C_LG_I000001" LG_I src exon 3544 3918 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649 LG_I src CDS 3544 3918 . - 2 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 3 LG_I src exon 4258 4740 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649 LG_I src CDS 4258 4740 . - 2 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 2 LG_I src exon 5344 6388 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649 LG_I src CDS 5344 6388 . - 2 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 1 LG_I src exon 8259 8528 . - . name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650 LG_I src CDS 8259 8528 . - 0 name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 3 LG_I src stop_codon 8259 8261 . - 0 name "fgenesh1_pg.C_LG_I000002" LG_I src exon 8897 8987 . - . name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650 LG_I src CDS 8897 8987 . - 0 name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 2 LG_I src exon 9831 9892 . - . name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650 LG_I src CDS 9831 9892 . - 1 name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 1 LG_I src start_codon 9890 9892 . - 0 name "fgenesh1_pg.C_LG_I000002" I try to use Bio::DB::GFF, but this module only applies to methods given in the gff file. 
what i want to get is "intron, 5utr, 3utr", but this information do not contain in this gff file. how can i get these information through bioperl? This file do not contain intron information if i consider gaps between exons as introns, non cds parts of the first and last exon as utrs, how can i extract them through this gff file. Thanks~~ Wenkai _________________________________________________________________ ?????????????????????????????? MSN Hotmail?? http://www.hotmail.com From spiros at lokku.com Mon Nov 5 07:36:36 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Mon, 5 Nov 2007 12:36:36 +0000 Subject: [Bioperl-l] perl question In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> Message-ID: Hey, use the `sprintf` function. More information can be found at , http://perldoc.perl.org/functions/sprintf.html. For more proper rounding, you could use the Math::Round module, http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm. hope this helps, spiros On 11/5/07, neeti somaiya wrote: > > Again a perl question, and maybe a very trivial one. > How do I terminate a number like 3.1232010098 to only 3 decimal places in > perl? > > -- > -Neeti > Even my blood says, B positive > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ak at ebi.ac.uk Mon Nov 5 07:43:06 2007 From: ak at ebi.ac.uk (Andreas Kahari) Date: Mon, 5 Nov 2007 12:43:06 +0000 Subject: [Bioperl-l] perl question In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> Message-ID: <20071105124305.GC4491@ebi.ac.uk> On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote: > Again a perl question, and maybe a very trivial one. 
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

When displaying:

printf( "The number is %.3f\n", $number );

When making a string:

my $string = sprintf( "%.3f", $number );

BTW, this is cutting, not rounding.

Cheers,
Andreas

--
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------

From t.nugent at cs.ucl.ac.uk Mon Nov 5 07:37:15 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 05 Nov 2007 12:37:15 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F0E7B.60303@cs.ucl.ac.uk>

Use Math::Round and nearest_ceil:

http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
>

--
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent

From bix at sendu.me.uk Mon Nov 5 07:47:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 12:47:17 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F10D5.5060006@sendu.me.uk>

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

Please don't use this list to ask general Perl questions.
See these instead: http://perldoc.perl.org/perlfaq4.html http://lists.cpan.org/ http://www.perlmonks.org/ $rounded = sprintf("%.3f", $number); From Marc.Logghe at DEVGEN.com Mon Nov 5 07:39:36 2007 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Mon, 5 Nov 2007 13:39:36 +0100 Subject: [Bioperl-l] perl question References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> Message-ID: <0C528E3670D8CE4B8E013F6749231AA601C3BB80@ANTARESIA.be.devgen.com> Hi, Have a look at http://perldoc.perl.org/functions/sprintf.html#precision%2c-or-maximum-w idth In your particular case: my $f = 3.1232010098; printf "%0.3f", $f; HTH, Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > neeti somaiya > Sent: Monday, November 05, 2007 1:20 PM > To: bioperl-l > Subject: [Bioperl-l] perl question > > Again a perl question, and maybe a very trivial one. > How do I terminate a number like 3.1232010098 to only 3 > decimal places in perl? > > -- > -Neeti > Even my blood says, B positive > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bix at sendu.me.uk Mon Nov 5 08:24:25 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 05 Nov 2007 13:24:25 +0000 Subject: [Bioperl-l] perl question In-Reply-To: <20071105124305.GC4491@ebi.ac.uk> References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> <20071105124305.GC4491@ebi.ac.uk> Message-ID: <472F1989.90105@sendu.me.uk> Andreas Kahari wrote: > On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote: >> Again a perl question, and maybe a very trivial one. >> How do I terminate a number like 3.1232010098 to only 3 decimal places in >> perl? 
> > When displaying: > > printf( "The number is %.3f\n", $number ); > > When making a string: > > my $string = sprintf( "%.3f", $number ); > > > BTW, this is cutting, not rounding. (s)printf rounds (ie. doesn't simply truncate), though for critical applications you should use your own rounding algorithm. From ak at ebi.ac.uk Mon Nov 5 08:56:24 2007 From: ak at ebi.ac.uk (Andreas Kahari) Date: Mon, 5 Nov 2007 13:56:24 +0000 Subject: [Bioperl-l] perl question In-Reply-To: <472F1989.90105@sendu.me.uk> References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> <20071105124305.GC4491@ebi.ac.uk> <472F1989.90105@sendu.me.uk> Message-ID: <20071105135624.GD4491@ebi.ac.uk> On Mon, Nov 05, 2007 at 01:24:25PM +0000, Sendu Bala wrote: > Andreas Kahari wrote: > > On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote: > >> Again a perl question, and maybe a very trivial one. > >> How do I terminate a number like 3.1232010098 to only 3 decimal places in > >> perl? > > > > When displaying: > > > > printf( "The number is %.3f\n", $number ); > > > > When making a string: > > > > my $string = sprintf( "%.3f", $number ); > > > > > > BTW, this is cutting, not rounding. > > (s)printf rounds (ie. doesn't simply truncate), though for critical > applications you should use your own rounding algorithm. They do indeed. Mea culpa. 
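
A small sketch of the difference settled above, in plain Perl. The sub names round3 and trunc3 are made up for illustration. sprintf with %.3f rounds, whereas scaling and applying int() cuts off the extra digits.

```perl
use strict;
use warnings;

# Round to three decimals: sprintf rounds, it does not cut.
sub round3 { return sprintf('%.3f', $_[0]) }

# Truncate to three decimals: scale up, drop the fraction with int(),
# scale back down, then format.
sub trunc3 { return sprintf('%.3f', int($_[0] * 1000) / 1000) }

for my $n (3.1232010098, 3.1236) {
    print "$n: rounded=", round3($n), " truncated=", trunc3($n), "\n";
}
# 3.1232010098 gives 3.123 either way;
# 3.1236 rounds to 3.124 but truncates to 3.123.
```

As noted in the thread, values sitting exactly on a rounding boundary are subject to binary floating-point representation, so critical applications may want an explicit rounding routine.
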
Andreas -- Andreas K?h?ri :: Ensembl Software Developer European Bioinformatics Institute (EMBL-EBI) -------------------------------------------- From jay at jays.net Mon Nov 5 10:14:17 2007 From: jay at jays.net (Jay Hannah) Date: Mon, 5 Nov 2007 10:14:17 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <8CA2A45C-1F82-47A2-841B-1BA92E1F4466@jays.net> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: > To the other devs: shouldn't -nosort be the default behavior when the > split location is a 'join'? I certainly think so. > In other words, should spliced_seq() be > modified to take into account the split location type when returning > sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly > indicates the order of the sequences is important when joined > together; the current behavior is more like that for 'order'. I don't see any value to the sorting algorithm. All tests invoke - nosort => 1 (except a phase test where nosort doesn't matter anyway). In my limited experience the sorting only serves to break real-world splicing. If there is no valid use then we can remove ~20 lines from SeqFeatureI.pm circa line 505. If there is a valid use and someone would be so kind as to educate me I'd be happy to add tests which demonstrate them. :) P.S. CSHL is neato. I plan on understanding some of this stuff some day. 
:) j http://www.bioperl.org/wiki/User:Jhannah From hlapp at duke.edu Mon Nov 5 11:03:16 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 5 Nov 2007 11:03:16 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: I agree that there should be a meaningful default that results in "doing the right thing" in most cases if the user doesn't intervene. I'm not sure I understand all the details, but it sounds sorting or not sorting should depend on the split location type unless the user overrides it by argument. That's what you're suggesting, right? -hilmar On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: > Pass in (-nosort => 1) to spliced_seq: > > print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; > > This ensures no sorting of sublocations occurs, if you want for > instance typical GenBank/EMBL 'join' behavior. > > To the other devs: shouldn't -nosort be the default behavior when > the split location is a 'join'? In other words, should spliced_seq > () be modified to take into account the split location type when > returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' > explicitly indicates the order of the sequences is important when > joined together; the current behavior is more like that for 'order'. > > chris > > On Nov 4, 2007, at 12:39 PM, download on demand wrote: > >> Hi to all. 
>> >> I have a problem with a simplest script: >> >> >> >> use Bio::SeqIO; >> # get command-line arguments, or die with a usage statement >> my $usage = "x2y.pl infile infileformat outfile >> outfileformat\n"; >> my $infile = shift or die $usage; >> my $infileformat = shift or die $usage; >> # my $outfile = shift or die $usage; >> my $outfileformat = shift or die $usage; >> >> # create one SeqIO object to read in,and another to write >> out >> my $seq_in = Bio::SeqIO->new('-file' => "<$infile", >> '-format' => $infileformat); >> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, >> '-format' => $outfileformat); >> >> # write each entry in the input file to the output file >> while (my $inseq = $seq_in->next_seq) { >> >> # $seq_out->write_seq($inseq); # Whole sequence not needed >> >> for my $feat_object ($inseq->get_SeqFeatures) >> { >> if ($feat_object->primary_tag eq "CDS") >> { >> print $feat_object->get_tag_values('product'),"\n"; >> print >> $feat_object->location->start,"..",$feat_object->location->end,"\n"; >> print $feat_object->spliced_seq->seq,"\n\n"; >> } >> } >> >> >> >> The result seems OK to me, but in case of first CDS of >> NC_005213.gbk from >> here > Nanoarchaeum_equitans/> the >> output is wrong: >> >> It is: >> hypothetical protein >> 1..490885 >> TAAATGCGATTGCTATTAGAA..................................Truncated >> sequence................................... >> >> Should be: >> hypothetical protein >> 879..490883 >> ATGCGATTGCTATTAGAA...................................Truncated >> sequence....................................TAA >> >> >> >> This CDS have an unnatural location string: >> CDS complement(join(490883..490885,1..879)), but >> spliced_seq >> should handle these things? >> >> Please help me! >> Best regards, N. 
>> _______________________________________________ >> > > > From bernd.web at gmail.com Mon Nov 5 11:53:01 2007 From: bernd.web at gmail.com (Bernd Web) Date: Mon, 5 Nov 2007 17:53:01 +0100 Subject: [Bioperl-l] PSI-BLAST Message-ID: <716af09c0711050853l23087ac6j9f7d597580b66c46@mail.gmail.com> Hi, Is it possible with SearchIO to select a specific iteration (Results from round i) part of the PSI-blast report, when parsing this with SearchIO::blast? It seems the parser parses the complete report. If not implemented I could of course extract the specific part of the psi-blast report and then give it too SearchIO (e.g. with IO::String), but maybe I am missing a built-in option? Regards, Bernd From jay at jays.net Mon Nov 5 11:54:13 2007 From: jay at jays.net (Jay Hannah) Date: Mon, 5 Nov 2007 11:54:13 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? If someone knows why spliced_seq() should ever sort then I'm suggesting we add a test demonstrating a useful example of that. If no one has a useful example of when you would want spliced_seq() to sort then I'm suggesting we remove the sorting altogether and nosort goes away. I can provide/add many examples where sorting is bad. I do not know of a case where sorting is good. 
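
A toy illustration of why the order of sub-locations matters for a location such as complement(join(490883..490885,1..879)). This is a plain-Perl sketch that makes no BioPerl calls; the 12-base genome, the coordinates and the helper names revcomp and splice_segments are all hypothetical and only mimic the shape of the reported case.

```perl
use strict;
use warnings;

# Hypothetical 12-base "circular genome" (not taken from NC_005213).
my $genome = 'TCGCATGGGTTA';

# Sub-locations of complement(join(10..12,1..6)), in the order written.
my @join = ([10, 12], [1, 6]);

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Concatenate 1-based, inclusive segments in the order given.
sub splice_segments {
    my ($seq, @segments) = @_;
    return join '',
        map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @segments;
}

# Keep the join order, then take the complement strand.
my $in_order = revcomp(splice_segments($genome, @join));

# Sort the segments by start coordinate first, then do the same.
my $sorted = revcomp(
    splice_segments($genome, sort { $a->[0] <=> $b->[0] } @join)
);

print "join order kept: $in_order\n";   # ATGCGATAA
print "sorted by start: $sorted\n";     # TAAATGCGA
```

With the join order kept, the toy CDS starts with ATG and ends with TAA; after sorting, the stop codon lands at the front, which is the same pattern as the "It is" versus "Should be" output quoted in this thread.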
j http://www.bioperl.org/wiki/User:Jhannah From jason at bioperl.org Mon Nov 5 12:07:10 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Nov 2007 12:07:10 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: At one point the location order was not respected/saved I believe. I guess we will just assume the user will build up a SplitLocation in order (i.e. add_SubLocation). I'll try and remember if there were any other particular reasons. -jason On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? > > -hilmar > > On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: > >> Pass in (-nosort => 1) to spliced_seq: >> >> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; >> >> This ensures no sorting of sublocations occurs, if you want for >> instance typical GenBank/EMBL 'join' behavior. >> >> To the other devs: shouldn't -nosort be the default behavior when >> the split location is a 'join'? In other words, should spliced_seq >> () be modified to take into account the split location type when >> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' >> explicitly indicates the order of the sequences is important when >> joined together; the current behavior is more like that for 'order'. >> >> chris >> >> On Nov 4, 2007, at 12:39 PM, download on demand wrote: >> >>> Hi to all. 
>>> >>> I have a problem with a simplest script: >>> >>> >>> >>> use Bio::SeqIO; >>> # get command-line arguments, or die with a usage statement >>> my $usage = "x2y.pl infile infileformat outfile >>> outfileformat\n"; >>> my $infile = shift or die $usage; >>> my $infileformat = shift or die $usage; >>> # my $outfile = shift or die $usage; >>> my $outfileformat = shift or die $usage; >>> >>> # create one SeqIO object to read in,and another to write >>> out >>> my $seq_in = Bio::SeqIO->new('-file' => "<$infile", >>> '-format' => $infileformat); >>> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, >>> '-format' => $outfileformat); >>> >>> # write each entry in the input file to the output file >>> while (my $inseq = $seq_in->next_seq) { >>> >>> # $seq_out->write_seq($inseq); # Whole sequence not >>> needed >>> >>> for my $feat_object ($inseq->get_SeqFeatures) >>> { >>> if ($feat_object->primary_tag eq "CDS") >>> { >>> print $feat_object->get_tag_values('product'),"\n"; >>> print >>> $feat_object->location->start,"..",$feat_object->location->end,"\n"; >>> print $feat_object->spliced_seq->seq,"\n\n"; >>> } >>> } >>> >>> >>> >>> The result seems OK to me, but in case of first CDS of >>> NC_005213.gbk from >>> here >> Nanoarchaeum_equitans/> the >>> output is wrong: >>> >>> It is: >>> hypothetical protein >>> 1..490885 >>> TAAATGCGATTGCTATTAGAA..................................Truncated >>> sequence................................... >>> >>> Should be: >>> hypothetical protein >>> 879..490883 >>> ATGCGATTGCTATTAGAA...................................Truncated >>> sequence....................................TAA >>> >>> >>> >>> This CDS have an unnatural location string: >>> CDS complement(join(490883..490885,1..879)), but >>> spliced_seq >>> should handle these things? >>> >>> Please help me! >>> Best regards, N. 
>>> _______________________________________________ >>> >> >> >> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Mon Nov 5 12:16:10 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:16:10 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> Yes, we would sort based on the splittype() and default to a particular behavior ('join') if one isn't designated, maybe with a warning indicating the splittype() isn't defined. Using an 'order' or other defined types could also delineate a default sort/nosort behavior (probably the previous as it would replicate prior behavior). chris On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? > > -hilmar From cjfields at uiuc.edu Mon Nov 5 12:20:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:20:35 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <70023491-3549-428D-9E5C-32275A33FF20@uiuc.edu> On Nov 5, 2007, at 10:54 AM, Jay Hannah wrote: > On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. 
>> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? > > If someone knows why spliced_seq() should ever sort then I'm > suggesting we add a test demonstrating a useful example of that. > > If no one has a useful example of when you would want spliced_seq() > to sort then I'm suggesting we remove the sorting altogether and > nosort goes away. > > I can provide/add many examples where sorting is bad. I do not know > of a case where sorting is good. > > j > http://www.bioperl.org/wiki/User:Jhannah The behavior would be based on the current use of 'join', 'order', and 'bond' (the latter in GenPept records). I documented some cases here a while back: http://www.bioperl.org/wiki/BioPerl_Locations#Split chris From hlapp at duke.edu Mon Nov 5 12:32:24 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 5 Nov 2007 12:32:24 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> Message-ID: <13919657-0446-4821-9EE4-FD07C995C734@duke.edu> Sounds good to me. -hilmar On Nov 5, 2007, at 12:16 PM, Chris Fields wrote: > Yes, we would sort based on the splittype() and default to a > particular behavior ('join') if one isn't designated, maybe with a > warning indicating the splittype() isn't defined. Using an 'order' > or other defined types could also delineate a default sort/nosort > behavior (probably the previous as it would replicate prior behavior). > > chris > > On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote: > >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. 
>> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? >> >> -hilmar > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at uiuc.edu Mon Nov 5 12:41:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:41:27 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: It may have something to do with remote locations or setting strand() in sublocations. This may have popped up in relation to a LocationI code audit I proposed a while back on the list which I never got around to. Oh well... I at least managed getting a wiki page started in case we decided to make changes, with the intention of making it a HOWTO at some point: http://www.bioperl.org/wiki/BioPerl_Locations If we go through with the changes to spliced_seq(), should it be implemented for inclusion in v1.6 or wait until v1.7? chris On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote: > > At one point the location order was not respected/saved I believe. > I guess we will just assume the user will build up a SplitLocation > in order (i.e. add_SubLocation). I'll try and remember if there > were any other particular reasons. > > > -jason > On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. >> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? 
>> >> -hilmar >> >> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: >> >>> Pass in (-nosort => 1) to spliced_seq: >>> >>> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; >>> >>> This ensures no sorting of sublocations occurs, if you want for >>> instance typical GenBank/EMBL 'join' behavior. >>> >>> To the other devs: shouldn't -nosort be the default behavior when >>> the split location is a 'join'? In other words, should spliced_seq >>> () be modified to take into account the split location type when >>> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' >>> explicitly indicates the order of the sequences is important when >>> joined together; the current behavior is more like that for 'order'. >>> >>> chris >>> >>> On Nov 4, 2007, at 12:39 PM, download on demand wrote: >>> >>>> Hi to all. >>>> >>>> I have a problem with a simplest script: >>>> >>>> >>>> >>>> use Bio::SeqIO; >>>> # get command-line arguments, or die with a usage >>>> statement >>>> my $usage = "x2y.pl infile infileformat outfile >>>> outfileformat\n"; >>>> my $infile = shift or die $usage; >>>> my $infileformat = shift or die $usage; >>>> # my $outfile = shift or die $usage; >>>> my $outfileformat = shift or die $usage; >>>> >>>> # create one SeqIO object to read in,and another to write >>>> out >>>> my $seq_in = Bio::SeqIO->new('-file' => "<$infile", >>>> '-format' => $infileformat); >>>> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, >>>> '-format' => >>>> $outfileformat); >>>> >>>> # write each entry in the input file to the output file >>>> while (my $inseq = $seq_in->next_seq) { >>>> >>>> # $seq_out->write_seq($inseq); # Whole sequence not >>>> needed >>>> >>>> for my $feat_object ($inseq->get_SeqFeatures) >>>> { >>>> if ($feat_object->primary_tag eq "CDS") >>>> { >>>> print $feat_object->get_tag_values('product'),"\n"; >>>> print >>>> $feat_object->location->start,"..",$feat_object->location- >>>> >end,"\n"; >>>> print $feat_object->spliced_seq->seq,"\n\n"; >>>> } 
>>>> } >>>> >>>> >>>> >>>> The result seems OK to me, but in case of first CDS of >>>> NC_005213.gbk from >>>> here >>> Nanoarchaeum_equitans/> the >>>> output is wrong: >>>> >>>> It is: >>>> hypothetical protein >>>> 1..490885 >>>> TAAATGCGATTGCTATTAGAA..................................Truncated >>>> sequence................................... >>>> >>>> Should be: >>>> hypothetical protein >>>> 879..490883 >>>> ATGCGATTGCTATTAGAA...................................Truncated >>>> sequence....................................TAA >>>> >>>> >>>> >>>> This CDS have an unnatural location string: >>>> CDS complement(join(490883..490885,1..879)), but >>>> spliced_seq >>>> should handle these things? >>>> >>>> Please help me! >>>> Best regards, N. >>>> _______________________________________________ >>>> >>> >>> >>> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bosborne11 at verizon.net Mon Nov 5 11:05:41 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 05 Nov 2007 12:05:41 -0400 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall In-Reply-To: <472ED3CC.2050305@univ-brest.fr> Message-ID: Jean-luc, >From what you written it sounds like you're using bash and not some other shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file in your home directory, as well as a .ncbirc file. This should work. I'm no Unix expert but I've always configured tcsh on the Mac in the same ways I'd configure it on Linux machines. Similarly, if you're using bash then it will read its .bashrc file, regardless of what flavor of Unix you use (and the same thing holds true for zsh or csh or ...). Brian O. 
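
Before editing shell startup files it can help to confirm what the Perl process itself sees. The following is a minimal sketch using only core modules; find_program is a hypothetical helper, and it checks nothing beyond the directories listed in PATH, so any extra location configured through other means would have to be passed in by hand.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file called $name
# found in @dirs, or nothing if there is none.
sub find_program {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Directories taken from the PATH of the environment that launched perl.
my @dirs = File::Spec->path;

my $blastall = find_program('blastall', @dirs);
print defined $blastall
    ? "blastall found at $blastall\n"
    : "blastall not found in PATH as seen by this perl process\n";
```

If the program is reported as missing here but runs fine from a terminal, the startup file that sets PATH is probably not being read by the shell that launches the script.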
On 11/5/07 4:26 AM, "Jean-luc Jany" wrote: > Dear Bioperl and Mac users, > > I am a Mac user and would like to run a script I made using > Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate > to Bioperl the pathway to Blastall and other executables. > > I read carefully the following link > http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the > path to Blast, but I guess the way to proceed is slightly different in Mac and > that I should not create .ncbirc and .bashrc files (e.g. should I modify the > .profile file instead of .bashrc?) > > Actually, my blast file is in myname directory and comprises a /bin and a > /data file. I have got my blastall and other executables in > myname/blast/bin/blastall. > > Thank you in anticipation for your help. > > Jean-Luc > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Mon Nov 5 13:35:56 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 05 Nov 2007 12:35:56 -0600 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall In-Reply-To: References: Message-ID: <472F628C.2000506@campus.iztacala.unam.mx> If the ~/.bashrc file doesn't work for you, try renaming it to ~/.bash_profile and re-login, that might work best. ~/.bashrc works as an individual per-interactive-shell startup file, whereas ~/.bash_profile is a personal initialization file, executed for login shells. Hope this helps. Regards, Mauricio. Brian Osborne wrote: > Jean-luc, > >>From what you written it sounds like you're using bash and not some other > shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file > in your home directory, as well as a .ncbirc file. This should work. 
> > I'm no Unix expert but I've always configured tcsh on the Mac in the same > ways I'd configure it on Linux machines. Similarly, if you're using bash > then it will read its .bashrc file, regardless of what flavor of Unix you > use (and the same thing holds true for zsh or csh or ...). > > Brian O. > > > On 11/5/07 4:26 AM, "Jean-luc Jany" wrote: > >> Dear Bioperl and Mac users, >> >> I am a Mac user and would like to run a script I made using >> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate >> to Bioperl the pathway to Blastall and other executables. >> >> I read carefully the following link >> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the >> path to Blast, but I guess the way to proceed is slightly different in Mac and >> that I should not create .ncbirc and .bashrc files (e.g. should I modify the >> .profile file instead of .bashrc?) >> >> Actually, my blast file is in myname directory and comprises a /bin and a >> /data file. I have got my blastall and other executables in >> myname/blast/bin/blastall. >> >> Thank you in anticipation for your help. 
>> >> Jean-Luc >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From hlapp at duke.edu Mon Nov 5 16:04:11 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 5 Nov 2007 16:04:11 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: On Nov 5, 2007, at 12:41 PM, Chris Fields wrote: > If we go through with the changes to spliced_seq(), should it be > implemented for inclusion in v1.6 or wait until v1.7? I would say they should be implemented ASAP because they 1) should not change behavior for those for which the current default behavior was already broken (and who therefore pass in --no_sort), and 2) fix the behavior for those who erroneously assumed that the code was going to do the right thing by default. I.e., it sounds mostly like a bugfix to me. Am I overlooking something? 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at uiuc.edu Mon Nov 5 17:12:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 16:12:23 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <980977BB-72C3-401A-848F-AEF2E602E4BE@uiuc.edu> On Nov 5, 2007, at 3:04 PM, Hilmar Lapp wrote: > > On Nov 5, 2007, at 12:41 PM, Chris Fields wrote: > >> If we go through with the changes to spliced_seq(), should it be >> implemented for inclusion in v1.6 or wait until v1.7? > > I would say they should be implemented ASAP because they 1) should > not change behavior for those for which the current default > behavior was already broken (and who therefore pass in --no_sort), > and 2) fix the behavior for those who erroneously assumed that the > code was going to do the right thing by default. > > I.e., it sounds mostly like a bugfix to me. Am I overlooking > something? > > -hilmar > -- Okay; I'll try to get this in soon. chris From jean-luc.jany at univ-brest.fr Tue Nov 6 04:00:07 2007 From: jean-luc.jany at univ-brest.fr (Jean-luc Jany) Date: Tue, 06 Nov 2007 10:00:07 +0100 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall Message-ID: <47302D17.2030500@univ-brest.fr> Thanks Brian. Yes I use bash. I am going to follow your advice as soon as possible (for some reasons I am unable to run bioperl) and come back to you to tell you if it runs. Jean-Luc From jason at bioperl.org Tue Nov 6 16:18:35 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Nov 2007 16:18:35 -0500 Subject: [Bioperl-l] lightweight sequence features Message-ID: I started a branch for implementing and playing with lightweight feature object. 
The branch is called 'lightweight_feature_branch'. Right now it is about 70% faster just in object creation based on parsing features using Bio::Tools::GFF and swapping the types of features that are created. It uses arrays instead of hashes under the hood. So the objects don't have locations under the hood. My hope is if this works okay we could use it for creating objects where we KNOW the underlying features have simple locations so such as parsing in GFF data. -jason -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Tue Nov 6 16:57:17 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 6 Nov 2007 15:57:17 -0600 Subject: [Bioperl-l] lightweight sequence features In-Reply-To: References: Message-ID: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> Bravo! I once benchmarked Location instance creation once and found it contributed quite a bit of overhead so the speedup with that and the use of arrays makes quite a bit of sense to me. You mention only simple locations; I'm guessing this doesn't handle 'fuzzy' ends? If it did I could see layering the feature data from the get-go, so it could be used just about anywhere in the place of SF::Generic. Maybe something to test out in 1.7? chris On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote: > I started a branch for implementing and playing with lightweight > feature object. The branch is called 'lightweight_feature_branch'. > > Right now it is about 70% faster just in object creation based on > parsing features using Bio::Tools::GFF and swapping the types of > features that are created. It uses arrays instead of hashes under > the hood. > > So the objects don't have locations under the hood. My hope is if > this works okay we could use it for creating objects where we KNOW > the underlying features have simple locations so such as parsing in > GFF data. 
> > -jason > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Tue Nov 6 23:14:55 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Nov 2007 23:14:55 -0500 Subject: [Bioperl-l] lightweight sequence features In-Reply-To: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> References: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> Message-ID: Right - only for simple locations. I've got a bunch more tests and fixes to put in. I am hoping this can be fast replacement in the case where we're dealing with this "unflattened" data (i.e. GFF in FeatureIO & Gbrowse). This is sort of a playground until I feel like it can really get it tested a bit more. I'll give an all clear when the dust settles in terms of the design if anyone wants to play/help. -jason On Nov 6, 2007, at 4:57 PM, Chris Fields wrote: > Bravo! I once benchmarked Location instance creation once and > found it contributed quite a bit of overhead so the speedup with > that and the use of arrays makes quite a bit of sense to me. > > You mention only simple locations; I'm guessing this doesn't handle > 'fuzzy' ends? If it did I could see layering the feature data from > the get-go, so it could be used just about anywhere in the place of > SF::Generic. Maybe something to test out in 1.7? > > chris > > On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote: > >> I started a branch for implementing and playing with lightweight >> feature object. The branch is called 'lightweight_feature_branch'. >> >> Right now it is about 70% faster just in object creation based on >> parsing features using Bio::Tools::GFF and swapping the types of >> features that are created. It uses arrays instead of hashes under >> the hood. >> >> So the objects don't have locations under the hood. 
My hope is if
>> this works okay we could use it for creating objects where we KNOW
>> the underlying features have simple locations so such as parsing in
>> GFF data.
>>
>> -jason
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From heikki at sanbi.ac.za Wed Nov 7 05:05:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Nov 2007 12:05:59 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Mdust
Message-ID: <200711071205.59576.heikki@sanbi.ac.za>

Hi Donald,

I started using your Mdust module in bioperl-run and ran into problems
immediately.

* Only Bio::Seq objects are accepted but not Bio::PrimarySeq objects,
  although the docs say otherwise
* Sequences are modified in place. That is really bad, because it means
  that the user has to know to create a copy before running Mdust on it.
* The docs say that you have to set the MDUSTDIR envvar to tell the program
  where to find the binary. That is actually optional if the binary is on
  your path.
* The tests do not cover any of the options to the program

As a quick fix, I suggest that we:

* leave the current way of working for Bio::SeqI objects: the sequence
  string is not masked but seqfeatures to that effect are added
* modify run() to return the new masked sequence object when the target is
  a Bio::PrimarySeqI
* fix the documentation

After that it will be possible to simply write:

  use Bio::Tools::Run::Mdust;
  my $mdust = Bio::Tools::Run::Mdust->new();
  my $seq_dusted = $mdust->run($seq);  # where $seq->isa('Bio::PrimarySeqI')

Are you happy for me to do this or do you want to do it yourself?
Yours, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho _/_/_/_/_/ heikki at_sanbi _ac _za skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From Kevin.M.Brown at asu.edu Wed Nov 7 13:04:50 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 7 Nov 2007 11:04:50 -0700 Subject: [Bioperl-l] Bio::Ext::Align? Message-ID: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu> I installed bioperl-ext from CVS, but can't figure out what else is missing to utilize Bio::Tools::pSW. The error I get from the example script in the wiki is: The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. Please read the install the bioperl-ext package BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128. Compilation failed in require at ./align_test.pl line 3. BEGIN failed--compilation aborted at ./align_test.pl line 3. In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called Align, but no Align.pm file. I followed the directions in the wiki to install 1.5.2_102 (think I had _100 installed previously). Any thoughts on what I'm missing? From jason at bioperl.org Wed Nov 7 14:52:16 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 7 Nov 2007 14:52:16 -0500 Subject: [Bioperl-l] (no subject) Message-ID: The array-based Bio::SeqFeature::Slim is only about 7% faster than Bio::Graphics::Feature so I suspect most of the speedup comes from removing location objects. 

Generic     6.75     --   -37%   -41%
GraphicsF   4.26    58%     --    -7%
Slim        3.98    70%     7%     --

this is using code on the lightweight_feature_branch so
cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r lightweight_feature_branch -d core_lwf bioperl-live

http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
and the GFF3 file I used to parse
http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2

-jason

From lstein at cshl.edu Wed Nov 7 15:04:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Nov 2007 15:04:24 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To:
References:
Message-ID: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>

I wonder if it is worth moving to the array-based version more generally,
then.

How does the array based feature object deal with tags?

Lincoln

On Nov 7, 2007 2:52 PM, Jason Stajich wrote:

> The array-based Bio::SeqFeature::Slim is only about 7% faster than
> Bio::Graphics::Feature so I suspect most of the speedup comes from removing
> location objects.
>
> Generic     6.75     --   -37%   -41%
> GraphicsF   4.26    58%     --    -7%
> Slim        3.98    70%     7%     --
>
> this is using code on the lightweight_feature_branch so
> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r
> lightweight_feature_branch -d core_lwf bioperl-live
>
> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
> and the GFF3 file I used to parse
> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>
> -jason
>

--
Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From jason at bioperl.org Wed Nov 7 15:09:35 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 7 Nov 2007 15:09:35 -0500 Subject: [Bioperl-l] (no subject) In-Reply-To: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> Message-ID: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> It uses hashes there so technically it is not entirely array based. -jason On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: > I wonder if it is worth moving to the array-based version more > generally, > then. > > How does the array based feature object deal with tags? > > Lincoln > > On Nov 7, 2007 2:52 PM, Jason Stajich wrote: > >> The array-based Bio::SeqFeature::Slim is only about 7% faster than >> Bio::Graphics::Feature so I suspect most of the speedup comes from >> removing >> location objects. >> >> Generic 6.75 -- -37% -41% >> GraphicsF 4.26 58% -- -7% >> Slim 3.98 70% 7% -- >> >> this is using code on the lightweight_feature_branch so >> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >> lightweight_feature_branch -d core_lwf bioperl-live >> >> http://jason.open-bio.org/~jason/bioperl/ >> seqfeature_speed.pl> seqfeature_speed.pl> >> and the GFF3 file I used to parse >> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >> >> -jason >> > > > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed Nov 7 16:12:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 15:12:35 -0600 Subject: [Bioperl-l] (no subject) In-Reply-To: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> Message-ID: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> I can see preferring a lightweight simple SF over SF::Generic in the next BioPerl dev cycle. I guess we would just layer split locations as simple sub-features/segments, typing when necessary? That shouldn't be much more overhead than creating a layered Location::Split. chris On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: > It uses hashes there so technically it is not entirely array based. > > -jason > On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: > >> I wonder if it is worth moving to the array-based version more >> generally, >> then. >> >> How does the array based feature object deal with tags? >> >> Lincoln >> >> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >> >>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>> removing >>> location objects. 
>>> >>> Generic 6.75 -- -37% -41% >>> GraphicsF 4.26 58% -- -7% >>> Slim 3.98 70% 7% -- >>> >>> this is using code on the lightweight_feature_branch so >>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >>> lightweight_feature_branch -d core_lwf bioperl-live >>> >>> http://jason.open-bio.org/~jason/bioperl/ >>> seqfeature_speed.pl>> seqfeature_speed.pl> >>> and the GFF3 file I used to parse >>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>> >>> -jason >>> >> >> >> >> -- >> Lincoln D. Stein >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> FOR URGENT MESSAGES & SCHEDULING, >> PLEASE CONTACT MY ASSISTANT, >> SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Wed Nov 7 18:19:15 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 7 Nov 2007 18:19:15 -0500 Subject: [Bioperl-l] lightweight features In-Reply-To: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> Message-ID: It seems to me that there are applications where you're dealing with a huge number of features (such as GFF) and where therefore a lightweight object makes tremendous sense. But when you parse a genbank file, I'm not sure that's the bottleneck, unless maybe it's a large contig with lots of feature annotations. I guess we'll ultimately want a way to control the type of feature being instantiated by a parser, e..g using a factory. 
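
The factory idea can be sketched in plain Perl. This is an illustration only: the packages `My::Feature::Hash` and `My::Feature::Array` and the `parse_gff_line` helper are hypothetical names, not BioPerl classes. The parser receives a factory callback and never names the feature class itself, so the caller decides whether heavyweight or lightweight objects are built.

```perl
use strict;
use warnings;

# Hypothetical hash-based feature.
package My::Feature::Hash;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# Hypothetical array-based ("slim") feature: fixed slots instead of keys.
# Slot 0 = start, slot 1 = end, slot 2 = type.
package My::Feature::Array;
sub new   { my ($class, %args) = @_; return bless [ @args{qw(start end type)} ], $class }
sub start { $_[0][0] }
sub end   { $_[0][1] }

package main;

# The parser only knows the factory, not the concrete class.
sub parse_gff_line {
    my ($line, $factory) = @_;
    my @col = split /\t/, $line;
    return $factory->(type => $col[2], start => $col[3], end => $col[4]);
}

my $slim_factory = sub { My::Feature::Array->new(@_) };
my $line = join "\t", qw(ctgA example gene 1050 9000 . + .), 'ID=EDEN';
my $feat = parse_gff_line($line, $slim_factory);

printf "%s %d..%d\n", ref $feat, $feat->start, $feat->end;
```

Swapping in `sub { My::Feature::Hash->new(@_) }` changes the object type without touching the parser, which is the kind of control over instantiation described above.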
-hilmar On Nov 7, 2007, at 4:12 PM, Chris Fields wrote: > I can see preferring a lightweight simple SF over SF::Generic in the > next BioPerl dev cycle. I guess we would just layer split locations > as simple sub-features/segments, typing when necessary? That > shouldn't be much more overhead than creating a layered > Location::Split. > > chris > > On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: > >> It uses hashes there so technically it is not entirely array based. >> >> -jason >> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: >> >>> I wonder if it is worth moving to the array-based version more >>> generally, >>> then. >>> >>> How does the array based feature object deal with tags? >>> >>> Lincoln >>> >>> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >>> >>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>>> removing >>>> location objects. >>>> >>>> Generic 6.75 -- -37% -41% >>>> GraphicsF 4.26 58% -- -7% >>>> Slim 3.98 70% 7% -- >>>> >>>> this is using code on the lightweight_feature_branch so >>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >>>> lightweight_feature_branch -d core_lwf bioperl-live >>>> >>>> http://jason.open-bio.org/~jason/bioperl/ >>>> seqfeature_speed.pl>>> seqfeature_speed.pl> >>>> and the GFF3 file I used to parse >>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>>> >>>> -jason >>>> >>> >>> >>> >>> -- >>> Lincoln D. 
Stein >>> Cold Spring Harbor Laboratory >>> 1 Bungtown Road >>> Cold Spring Harbor, NY 11724 >>> (516) 367-8380 (voice) >>> (516) 367-8389 (fax) >>> FOR URGENT MESSAGES & SCHEDULING, >>> PLEASE CONTACT MY ASSISTANT, >>> SANDRA MICHELSEN, AT michelse at cshl.edu >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Nov 7 20:04:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 19:04:05 -0600 Subject: [Bioperl-l] lightweight features In-Reply-To: References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> Message-ID: I'm also thinking a factory is a good possibility; maybe something to take the place of FTHelper. chris On Nov 7, 2007, at 5:19 PM, Hilmar Lapp wrote: > It seems to me that there are applications where you're dealing with > a huge number of features (such as GFF) and where therefore a > lightweight object makes tremendous sense. But when you parse a > genbank file, I'm not sure that's the bottleneck, unless maybe it's a > large contig with lots of feature annotations. > > I guess we'll ultimately want a way to control the type of feature > being instantiated by a parser, e..g using a factory. 
> > -hilmar > > On Nov 7, 2007, at 4:12 PM, Chris Fields wrote: > >> I can see preferring a lightweight simple SF over SF::Generic in the >> next BioPerl dev cycle. I guess we would just layer split locations >> as simple sub-features/segments, typing when necessary? That >> shouldn't be much more overhead than creating a layered >> Location::Split. >> >> chris >> >> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: >> >>> It uses hashes there so technically it is not entirely array based. >>> >>> -jason >>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: >>> >>>> I wonder if it is worth moving to the array-based version more >>>> generally, >>>> then. >>>> >>>> How does the array based feature object deal with tags? >>>> >>>> Lincoln >>>> >>>> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >>>> >>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>>>> removing >>>>> location objects. >>>>> >>>>> Generic 6.75 -- -37% -41% >>>>> GraphicsF 4.26 58% -- -7% >>>>> Slim 3.98 70% 7% -- >>>>> >>>>> this is using code on the lightweight_feature_branch so >>>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl >>>>> co -r >>>>> lightweight_feature_branch -d core_lwf bioperl-live >>>>> >>>>> http://jason.open-bio.org/~jason/bioperl/ >>>>> seqfeature_speed.pl>>>> seqfeature_speed.pl> >>>>> and the GFF3 file I used to parse >>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>>>> >>>>> -jason >>>>> >>>> >>>> >>>> >>>> -- >>>> Lincoln D. 
Stein >>>> Cold Spring Harbor Laboratory >>>> 1 Bungtown Road >>>> Cold Spring Harbor, NY 11724 >>>> (516) 367-8380 (voice) >>>> (516) 367-8389 (fax) >>>> FOR URGENT MESSAGES & SCHEDULING, >>>> PLEASE CONTACT MY ASSISTANT, >>>> SANDRA MICHELSEN, AT michelse at cshl.edu >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 7 23:45:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 22:45:26 -0600 Subject: [Bioperl-l] test please ignore Message-ID: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> From cjfields at uiuc.edu Thu Nov 8 10:50:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Nov 2007 09:50:02 -0600 Subject: [Bioperl-l] test please ignore In-Reply-To: <47332534.5090205@bms.com> References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> <47332534.5090205@bms.com> Message-ID: And respond back! Just checking the mail list; the open-bio wiki pages were down last night. 
chris On Nov 8, 2007, at 9:03 AM, Stefan Kirov wrote: > Chris Fields wrote: >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > This is the best way to make everyone open this e-mail ;-) > Stefan From stefan.kirov at bms.com Thu Nov 8 10:03:16 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 08 Nov 2007 10:03:16 -0500 Subject: [Bioperl-l] test please ignore In-Reply-To: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> Message-ID: <47332534.5090205@bms.com> Chris Fields wrote: > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > This is the best way to make everyone open this e-mail ;-) Stefan From Kevin.M.Brown at asu.edu Thu Nov 8 17:30:24 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 8 Nov 2007 15:30:24 -0700 Subject: [Bioperl-l] Bio::Ext::Align? In-Reply-To: <20071108003638.GA5892@eniac.jgi-psf.org> References: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu> <20071108003638.GA5892@eniac.jgi-psf.org> Message-ID: <1A4207F8295607498283FE9E93B775B403F7F9D3@EX02.asurite.ad.asu.edu> OK, found the issue. For whatever reason the Align.pm file is inside the Align folder and so the package name and path don't match up once it is installed. This would cause it to have a name of "Bio::Ext::Align::Align" instead of "Bio::Ext::Align". Not sure why this wasn't caught when I did "perl Makefile.pl && make && make test && make install" > -----Original Message----- > From: Joel Martin [mailto:j_martin at lbl.gov] > Sent: Wednesday, November 07, 2007 5:37 PM > To: Kevin Brown > Subject: Re: [Bioperl-l] Bio::Ext::Align? 
> > Hello, > Might be a side effect of fixing the other bioperl-ext package, > what steps exactly did this entail: > > > I installed bioperl-ext from CVS, > > ? > > you can probably bypass it at the moment by doing this after > unpacking the > bioperl-ext package > > cd Bio/Ext/Align > perl Makefile.PL > make > make test > make install > > and > > cd Bio/Ext/HMM > perl Makefile.PL > make > make test > make install > > Joel > > but can't figure out what else is > > missing to utilize Bio::Tools::pSW. The error I get from > the example > > script in the wiki is: > > > > The C-compiled engine for Smith Waterman alignments > (Bio::Ext::Align) > > has not been installed. > > Please read the install the bioperl-ext package > > > > BEGIN failed--compilation aborted at > > /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128. > > Compilation failed in require at ./align_test.pl line 3. > > BEGIN failed--compilation aborted at ./align_test.pl line 3. > > > > In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called > > Align, but no Align.pm file. > > > > I followed the directions in the wiki to install 1.5.2_102 > (think I had > > _100 installed previously). Any thoughts on what I'm missing? > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From akarger at CGR.Harvard.edu Fri Nov 9 09:53:02 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 9 Nov 2007 09:53:02 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? 
Message-ID:

When I tblastn ENSP00000349467 against the human genome, I get a few
hits on chr10, among which are:

 Score = 192 bits (487), Expect(2) = 5e-64
 Identities = 99/109 (90%), Positives = 99/109 (90%)
 Frame = +2

Query: 40       LGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNG 99
                L QNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE F VFDKDGNG
Sbjct: 71593562 LRQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRETFCVFDKDGNG 71593741

Query: 100      YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 148
                YIS  EL HVMTNLG KLTDEEVD MIREAD DGDGQVNY EFVQMMTA
Sbjct: 71593742 YISGVELHHVMTNLGVKLTDEEVD*MIREADPDGDGQVNY-EFVQMMTA 71593885

 Score = 75.1 bits (183), Expect(2) = 5e-64
 Identities = 36/43 (83%), Positives = 39/43 (90%)
 Frame = +1

Query: 1        MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQN 43
                MADQLTEEQI EFKE FSLFDKDGDGTITTK+LGTVMRS  ++
Sbjct: 71593447 MADQLTEEQIVEFKEVFSLFDKDGDGTITTKKLGTVMRSQAES 71593575

As you can see from Sbjct lines, these two hits are basically
contiguous. I was surprised to see that the bit scores and identities
and alignment lengths here are totally different but the expectation
values are identical.

After a bit of grepping in the BLAST source, I found reference to "sum
segments" and "a collection [of] multiple distinct alignments with
asymmetric gaps between the alignments" and decided it was time to cry
for help.

When does BLAST decide that two or more alignments belong "together"
and how does that affect the evalue? Is the evalue really showing how
good those two alignments combined are, despite the frame shift? (It so
happens that that's what I want.)

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)
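
As a stopgap, the count in parentheses can be read straight from the raw report text. A minimal sketch, assuming the HSP header line has the form shown above; `parse_expect` is a hypothetical helper and not a BioPerl method, and treating a missing "(n)" as 1 is an assumption made here for convenience.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the count n and the E-value out of an HSP
# header line such as
#   " Score = 192 bits (487), Expect(2) = 5e-64"
# Returns (n, evalue); n is taken to be 1 when the "(n)" part is absent.
# Returns an empty list if the line has no Expect field.
sub parse_expect {
    my ($line) = @_;
    if ($line =~ /Expect(?:\((\d+)\))?\s*=\s*([\d.eE+-]+)/) {
        return (defined $1 ? $1 : 1, $2);
    }
    return;
}

my ($n, $evalue) = parse_expect(' Score = 192 bits (487), Expect(2) = 5e-64');
print "n=$n evalue=$evalue\n";   # prints "n=2 evalue=5e-64"
```

This only reads what the report prints. It says nothing about how BLAST chose to group the alignments.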

Thanks,

- Amir Karger
Research Computing
Life Sciences Division
Harvard University

From cjfields at uiuc.edu Fri Nov 9 12:58:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Nov 2007 11:58:16 -0600
Subject: [Bioperl-l] GFF3loader and indexing
Message-ID: <77845E27-1327-43DD-BA45-222C071217D7@uiuc.edu>

Quick question: shouldn't the new Index attribute be passed on to
seqfeatures by DB::SeqFeature::Store::GFF3Loader for round-tripping
purposes (for instance, properly reloading dumped gff3 data)? I'm
testing out a feature editor using volvox.gff3 data in GBrowse and the
mRNA features appear to drop this attribute once loaded:

Original data:

ctgA  example  gene            1050  9000  .  +  .  ID=EDEN;Name=EDEN;Note=protein kinase
ctgA  example  mRNA            1050  9000  .  +  .  ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1
ctgA  example  five_prime_UTR  1050  1200  .  +  .  Parent=EDEN.1

partial gff3_string(1) output:

ctgA  example  gene            1050  9000  .  +  .  Name=EDEN;ID=50;Alias=EDEN;Note=protein kinase
ctgA  example  mRNA            1050  9000  .  +  .  Name=EDEN.1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1
ctgA  example  five_prime_UTR  1050  1200  .  +  .  Parent=51;ID=52
...

chris

From David.Messina at sbc.su.se Sat Nov 10 06:04:25 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 10 Nov 2007 12:04:25 +0100
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To:
References:
Message-ID: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com>

Hi Amir,

I don't have my BLAST book handy, and my memory is a little fuzzy, but I
think the Expect(2) you're seeing is the E-value based on both HSPs
combined. And I think this is why you see the same Expect value for both --
because it is shared between them (which sounds like what you wanted).

Again, this is just from memory, but I think this is an option that has to
be turned on rather than something which Blast decides to do on its own.

I don't know whether BioPerl reports this or not.
Would you mind e-mailing me a entire BLAST report as a sample? When I have some time I'd like to play around with this a bit. Thanks, Dave From sac at bioperl.org Sat Nov 10 17:59:28 2007 From: sac at bioperl.org (Steve Chervitz) Date: Sat, 10 Nov 2007 14:59:28 -0800 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Message-ID: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> The Bioperl blast parser should extract that value and you can obtain it from an HSP object, via the HSPI::n() method, documented here: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23 Dave's basically correct in his explanation. It's a result of the application of sum statistics by the blast algorithm. You can read all about it in Korf et al's BLAST book. Here's the relevant section: http://books.google.com/books?id=xvcnhDG9fNUC&pg=PA102&lpg=PA102&dq=blast+sum+statistics&source=web&ots=WIudsJGaCk&sig=v66X3wRLEHvpTLUD36AE5DGpPBY#PPA102,M1 Steve On Nov 10, 2007 3:04 AM, Dave Messina wrote: > Hi Amir, > > I don't have my BLAST book handy, and my memory is a little fuzzy, but I > think the Expect(2) you're seeing is the E-value based on both HSPs > combined. And I think this is why you see the same Expect value for both -- > because it is shared between them (which sounds like what you wanted). > > Again, this is just from memory, but I think this is an option that has to > be turned on rather than something which Blast decides to do on its own. > > > I don't know whether BioPerl reports this or not. Would you mind e-mailing > me a entire BLAST report as a sample? When I have some time I'd like to play > around with this a bit. 
> > Thanks, > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bernd.web at gmail.com Tue Nov 13 06:57:04 2007 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 13 Nov 2007 12:57:04 +0100 Subject: [Bioperl-l] Panel link Message-ID: <716af09c0711130357n4ba72901lf2236ddfd853c945@mail.gmail.com> Hi, Is it possible with Panel to provide javascript event handlers? With -link we can provide hrefs as: -link => 'http://www.google.com/search?q=$description' or use a coderef that returns a href. However, I'd like to set-up links as: Is this possible by default with Panel? Regards, Bernd From akarger at CGR.Harvard.edu Tue Nov 13 12:12:32 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 13 Nov 2007 12:12:32 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Message-ID: Thanks for the reply. I'm curious as to how BLAST decides to do this, but not curious enough to buy the BLAST book. If you want to see this, you could just tblastn the ENSP00000349467 sequence vs. the genome: MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG NGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDE EVDEMIREADIDGDGQVNYEEFVQMMTAK against the human genome at NCBI or locally. I've attached the tblastn report for that protein, which includes the results I quoted. (It was done as part of a blast of 150 proteins vs. the genome.) -Amir ________________________________ From: dave at davemessina.com [mailto:dave at davemessina.com] On Behalf Of Dave Messina Sent: Saturday, November 10, 2007 6:04 AM To: Amir Karger Cc: bioperl-l Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result? 
Hi Amir, I don't have my BLAST book handy, and my memory is a little fuzzy, but I think the Expect(2) you're seeing is the E-value based on both HSPs combined. And I think this is why you see the same Expect value for both -- because it is shared between them (which sounds like what you wanted). Again, this is just from memory, but I think this is an option that has to be turned on rather than something which Blast decides to do on its own. I don't know whether BioPerl reports this or not. Would you mind e-mailing me a entire BLAST report as a sample? When I have some time I'd like to play around with this a bit. Thanks, Dave -------------- next part -------------- A non-text attachment was scrubbed... Name: ENSP00000349467_tblastn.txt.gz Type: application/x-gzip Size: 9755 bytes Desc: ENSP00000349467_tblastn.txt.gz URL: From akarger at CGR.Harvard.edu Tue Nov 13 12:30:52 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 13 Nov 2007 12:30:52 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> Message-ID: > From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf > Of Steve Chervitz > > The Bioperl blast parser should extract that value and you can obtain > it from an HSP object, via the HSPI::n() method, documented here: > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B io/Search/HSP/HSPI.html#POD23 As I mentioned in my email: And does anyone know off-hand if Bioperl will tell me when situations like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine would help, but I just get a bunch of empty strings for that, whether or not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is undef.) 

And the docs for n() actually say, "This value is not defined with NCBI
Blast2 with gapping" although they don't say why. Which may explain why,
when I ran the following code on the blast result I included in my last
email, I got empty values for all of the n's. (Why is n() undefined for
gapped blast if I'm getting n's in my results from that blast?)

use warnings;
use strict;
use Bio::SearchIO;

my $blast_out = $ARGV[0];
my $in = new Bio::SearchIO(-format => 'blast',
                           -file => $blast_out,
                           -report_type => 'tblastn');

print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N Evalue)), "\n";
while(my $query = $in->next_result) {
    while(my $subject = $query->next_hit) {
        while (my $hsp = $subject->next_hsp) {
            print join("\t",
                $query->query_name,
                $hsp->start("query"),
                $hsp->end("query"),
                $hsp->strand("hit"),
                $subject->name,
                $hsp->start("hit"),
                $hsp->end("hit"),
                $subject->frame,
                $hsp->n,
                $hsp->evalue,
            ),"\n";
        }
    }
}

> Dave's basically correct in his explanation. It's a result of the
> application of sum statistics by the blast algorithm. You can read all
> about it in Korf et al's BLAST book. Here's the relevant section:

[snip]

Thanks,

-Amir

From cjfields at uiuc.edu Tue Nov 13 12:42:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Nov 2007 11:42:07 -0600
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To:
References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
Message-ID: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>

Amir,

Can you file this as a bug? Dave mentioned he would look into it but I
think it warrants tracking to make sure it gets fixed:

http://www.bioperl.org/wiki/Bugs

Attach the example BLAST report from your last post to the report.

BTW, I wonder how this appears in XML output?
chris

On Nov 13, 2007, at 11:30 AM, Amir Karger wrote:

> [snip]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From lskatz at gatech.edu Tue Nov 13 20:27:45 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Tue, 13 Nov 2007 20:27:45 -0500
Subject: [Bioperl-l] chromatogram
Message-ID: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>

Hi,
I would like to know how to draw a chromatogram file. Does anyone
have any sample code where you read in an scf file and create a jpeg
or other image file?
For that matter, I want to be able to customize these images with base
calls if possible. I really appreciate the help, so thanks!

--
Lee Katz

From mvrmakam at yahoo.com Wed Nov 14 04:52:13 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Wed, 14 Nov 2007 01:52:13 -0800 (PST)
Subject: [Bioperl-l] Installing Bioperl on Windows XP
Message-ID: <235423.72586.qm@web33703.mail.mud.yahoo.com>

Hi,

I am encountering a problem while installing Bioperl on Windows
XP. I have installed ActivePerl version 5.8.8.822. I am using
Perl Package Manager GUI. Also, I am following the instructions
outlined for installing Bioperl on Windows. I am getting an
error. The error is as follows:

Downloading ActiveState Package Repository packlist ... failed 500
Can't connect to ppm4.activestate.com:80 (Bad hostname
'ppm4.activestate.com')

I do not know how to overcome this problem. The other issue is
when I type bioperl in the search box I do not see any packages of
bioperl. I do not know what the problem is. If anyone of you
could guide me through the installation process I would appreciate it.

Thanks,

Roshan

From cjfields at uiuc.edu Wed Nov 14 09:02:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Nov 2007 08:02:05 -0600
Subject: [Bioperl-l] Installing Bioperl on Windows XP
In-Reply-To: <235423.72586.qm@web33703.mail.mud.yahoo.com>
References: <235423.72586.qm@web33703.mail.mud.yahoo.com>
Message-ID: <22873767-9CBD-4D38-BC9C-5267F1FFB04D@uiuc.edu>

The instructions are pretty specific:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Note the section on adding new repositories. As for the PPM
connection error, it's more than likely an error with the default
address but it isn't bioperl-related; maybe answers lie here:

http://aspn.activestate.com/ASPN/docs/ActivePerl/5.8/faq/ActivePerl-faq2.html#ppm_repositories

chris

On Nov 14, 2007, at 3:52 AM, Roshan Makam wrote:

> Hi,
>
> I am encountering a problem while installing Bioperl on Windows
> XP. I have installed ActivePerl version 5.8.8.822. I am using
> Perl Package Manager GUI. Also, I am following the instructions
> outlined for installing Bioperl on Windows. I am getting an
> error. The error is as follows:
>
> Downloading ActiveState Package Repository packlist ... failed 500
> Can't connect to ppm4.activestate.com:80 (Bad hostname
> 'ppm4.activestate.com')
>
> I do not know how to overcome this problem. The other issue is
> when I type bioperl in the search box I do not see any packages of
> bioperl. I do not know what the problem is. If anyone of you
> could guide me through the installation process I would appreciate it.
>
> Thanks,
>
> Roshan

From reshetovdenis at gmail.com Wed Nov 14 12:28:40 2007
From: reshetovdenis at gmail.com (Denis Reshetov)
Date: Wed, 14 Nov 2007 20:28:40 +0300
Subject: [Bioperl-l] how to load all genomes
Message-ID: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>

Dear BioPerl-db Creators,

I'm trying to load all genomes from NCBI ftp site
to my BioSql database using common script load_seqdatabase.pl

But it seems very slow. Let me know what is the better way to do it?

Thank you very much,

Denis.

From barry.moore at genetics.utah.edu Wed Nov 14 14:18:29 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 14 Nov 2007 12:18:29 -0700
Subject: [Bioperl-l] how to load all genomes
In-Reply-To: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
References: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com>
Message-ID: <66DEB322-7654-4E5E-9E96-BAE88262E3AC@genetics.utah.edu>

Denis,

You might be interested in this thread from a couple years ago. I
was having a similar problem, that I eventually resolved.
Unfortunately the reason for the problem and the solution weren't
entirely clear, but you may be able to glean some ideas from it.
Also, you may have already done this, but I suggest searching the
archives from this list because it seems like this comes up every now
and then, so there may be other postings similar to the one I'm
sending you that could help you.

http://www.bioperl.org/pipermail/bioperl-l/2005-January/018093.html

Finally, if you are still having problems, you'll want to include a
few more details about your situation. What DB are you using, have
you preloaded taxonomy data etc. How fast/slow are your sequences
loading?

Barry

On Nov 14, 2007, at 10:28 AM, Denis Reshetov wrote:

> Dear BioPerl-db Creators,
>
> I`m trying to load all genomes from NCBI ftp site
> to my BioSql database using common script load_seqdatabase.pl
>
> But it seems very slow. Let me know what is the better way to do it?
>
> Thank you very much,
>
> Denis.

From Russell.Smithies at agresearch.co.nz Wed Nov 14 14:57:49 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 08:57:49 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID:

Here's my trace viewer.
Please excuse my dodgy Perl and debugging code as it's still under
development :-)


Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz


------------------------------------------------------------------------

#!perl -w
use strict;
use ABI;

use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Data::Dumper;

use Getopt::Long;

use constant HEIGHT => 300;

my ($HEIGHT, $FILE, $OUTFILE, $LEFT_SEQ, $RIGHT_SEQ, $SIZE);

GetOptions ('h|height=i' => \$HEIGHT,
            'f|file=s'   => \$FILE,
            'o|out=s'    => \$OUTFILE,
            'l|left=s'   => \$LEFT_SEQ,
            'r|right=s'  => \$RIGHT_SEQ,
            's|size=i'   => \$SIZE,
           ) || die <<USAGE;
Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o test2.png -l actacgtacgta -r atgatcgtacgtac
   or  perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 --out test2.png --left actacgtacgta --right atgatcgtacgtac

Options:
    --height   Set height of image (${\HEIGHT} pixels default)
    --file     Filename for the ABI trace file
    --out      Filename for the generated .png image
    --left
    --right
    --size

Parse an ABI trace file and render a PNG image.
See http://search.cpan.org/dist/ABI/ABI.pm
or
http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
USAGE

my $height  = $HEIGHT || HEIGHT;
my $file    = $FILE;
my $outfile = $OUTFILE;

my $abi = ABI->new(-file => $file);

my @trace_a = $abi->get_trace("A");   # Get the raw traces for "A"
my @trace_c = $abi->get_trace("C");   # Get the raw traces for "C"
my @trace_g = $abi->get_trace("G");   # Get the raw traces for "G"
my @trace_t = $abi->get_trace("T");   # Get the raw traces for "T"

my @base_calls = $abi->get_base_calls();   # Get the base calls
my $sequence   = $abi->get_sequence();
my @bp         = split(//, $sequence);

# put each called base at its trace position, a blank everywhere else
my @bases;
my $size  = $abi->get_trace_length();
my $count = 0;
for my $i (0 .. $size - 1) {
    if (grep(/\b$i\b/, @base_calls)) {
        $bases[$i] = $bp[$count];
        $count++;
    } else {
        $bases[$i] = ' ';
    }
}

# create the data. see GD::Graph::Data for details of the format
my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t);

my $graph = GD::Graph::lines->new($abi->get_trace_length(), $height);
$graph->set(
    title         => $abi->get_sample_name(),
#   y_max_value   => $abi->get_max_trace() + 50,
    x_max_value   => $abi->get_trace_length(),
    l_margin      => 5,
    r_margin      => 5,
    x_ticks       => 0,
    text_space    => 0,
    line_width    => 1,
    transparent   => 0,
    b_margin      => 30,
    t_margin      => 35,
    x_plot_values => 0,
    interlaced    => 1,
);

# allocate some colors for drawing the bases
# use colors same as Chromas
$graph->set( dclrs => [ qw( green blue black red pink) ] );

# plot the data
my $gd = $graph->plot(\@data);

my $black   = $gd->colorAllocate(0,0,0);         # G
my $blue    = $gd->colorAllocate(0,0,255);       # C
my $red     = $gd->colorAllocate(255,0,0);       # T
my $green   = $gd->colorAllocate(0,255,0);       # A
my $magenta = $gd->colorAllocate(255,0,255);     # N
my $white   = $gd->colorAllocate(255,255,255);   # undefined aren't drawn
my $gray    = $gd->colorAllocate(210,210,210);
my %colors  = ("A", $green, "C", $blue, "G", $black, "T", $red,
               "N", $magenta, " ", $white);

#$start_base = index(lc($sequence),lc($LEFT_SEQ));
my $start_base = find_match($sequence, $LEFT_SEQ);

#if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
my $end_base = find_match($sequence, $RIGHT_SEQ, 1);
if ($end_base > 0) {    # find_match returns -1 when nothing matched
    $end_base += length($RIGHT_SEQ);
}

# get the coords of the features on the image
my @coords      = $graph->get_hotspot(1);
$size           = @coords;
my $printed_num = 1;
my $basecount   = 0;
my $numstoprint = $basecount - $start_base;
my ($start_base_coord, $end_base_coord, $end_base_coord_by_size,
    $top_right_corner, $clipped);

# draw the colored bases and scale at top and bottom of image
for my $i (0 .. $size - 1) {
    my $c = $coords[$i];
    my (undef, $xs) = @$c;
    my $base = $bases[$i];
    if ($base =~ /[ACGTN]/) {
        if ($start_base - 1 == $basecount) { $start_base_coord = $xs; }
        if ($end_base - 1 == $basecount)   { $end_base_coord = $xs; }
        if (defined($SIZE) && $start_base + $SIZE - 2 == $basecount) {
            $end_base_coord_by_size = $xs;
        }
        $basecount++;
        $numstoprint++;
        $printed_num = 0;
    }
    # print the bases top and bottom
    $gd->string(GD::Font->Small(), $xs, 20, $base, $colors{$base});
    $gd->string(GD::Font->Small(), $xs, $height - 30, $base, $colors{$base});

    # print scale (both branches of the original if/else were identical)
    if ($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0) {
        $gd->string(GD::Font->Small(), $xs, 5, $numstoprint, $black);
        $gd->string(GD::Font->Small(), $xs, $height - 15, $numstoprint, $black);
        $printed_num = 1;
    }
    $top_right_corner = $xs;
}

# only draw the clipped region if the calculated size is + or - 6bp
#if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){
# draw the clipped regions as gray
# if LEFT_SEQ supplied and a match found
if ($LEFT_SEQ && $start_base > 0) {
    $gd->filledRectangle(38, 35, $start_base_coord - 1, $height - 33, $red);
    $clipped = 1;
}
# if RIGHT_SEQ supplied and a match found
if ($RIGHT_SEQ && $end_base > 0) {
    print join("\t", ($end_base)), "\n";
    $gd->filledRectangle($end_base_coord, 35, $top_right_corner, $height - 33, $gray);
    $clipped = 1;
}
# if no RIGHT_SEQ supplied or no match found, use left match + seq length
# (only possible when --size gave us a coordinate to clip from)
if ((!$RIGHT_SEQ || $end_base < 0) && defined $end_base_coord_by_size) {
    $gd->filledRectangle($end_base_coord_by_size, 35, $top_right_corner, $height - 33, $blue);
    $clipped = 1;
}

# set height based on max trace within clipped region
$graph->set( y_max_value => 3000);#$abi->get_max_trace() + 50);

# need to re-plot the data over the grayed out area
$graph->plot(\@data) if $clipped;
$gd->filledRectangle(0, 0, $top_right_corner, 33, $white);

#}

# print the graph
open(my $out, '>', $outfile) or die "can't open output file $outfile: $!\n";
binmode $out;
print $out $gd->png;
close $out;


sub find_match {
    my ($sequence, $query, $last) = @_;
    return -1 if !defined($query) || length($query) < 6;
    my ($odds, $evens, $ones, $twos, $threes, $match_pos);

    # try exact match
    $match_pos = do_regex($query, $sequence, $last);
    return $match_pos if $match_pos > 0;

    # try matching every second base: first of each pair (C.T.C.),
    # then second of each pair (.C.T.C)
    for my $pair ($query =~ m/(\w\w)/g) {
        my ($first, $second) = split(//, $pair);
        $odds  .= "$first.";
        $evens .= ".$second";
    }
    $match_pos = do_regex($odds, $sequence, $last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($evens, $sequence, $last);
    return $match_pos if $match_pos > 0;

    # try matching every third base, e.g. C..T..G..T etc,
    # starting from the first, second and third base in turn
    for my $triplet ($query =~ m/(\w\w\w)/g) {
        my ($first, $second, $third) = split(//, $triplet);
        $ones   .= "$first..";
        $twos   .= ".$second.";
        $threes .= "..$third";
    }
    $match_pos = do_regex($ones, $sequence, $last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($twos, $sequence, $last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($threes, $sequence, $last);
    return $match_pos if $match_pos > 0;

    # not found
    return -1;
}

# returns the 1-based start of the first match of $query in $sequence
# (or of the last match when $last is true), -1 if there is none
sub do_regex {
    my ($query, $sequence, $last) = @_;
    #print "trying $query \n";
    my $pattern = $last ? qr/.*($query)/i : qr/.*?($query)/i;
    return -1 unless $sequence =~ m/$pattern/g;
    return pos($sequence) - length($query) + 1;
}

------------------------------------------------------------------------

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
>
> Hi,
> I would like to know how to draw a chromatogram file. Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible. I really appreciate the help, so thanks!
>
> --
> Lee Katz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From mbasu at mail.nih.gov Wed Nov 14 15:47:20 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 15:47:20 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To:
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID: <473B5ED8.1090201@mail.nih.gov>

I guess you need a chromatogram from SCF. I can't help with that. ABI.pm
is not in the Bioperl distribution. But to set the record straight, you
can draw a chromatogram in SVG from an ABI file in one step using my
BioSVG module, available at:

http://www.bioinformatics.org/~malay/biosvg/

Malay

Smithies, Russell wrote:
> Here's my trace viewer.
> Please excuse my dodgy Perl and debugging code as it's still under
> development :-)
>
> [snip]

--
Malay K Basu
www.malaybasu.net

From Russell.Smithies at agresearch.co.nz Wed Nov 14 15:58:19 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 09:58:19 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B5ED8.1090201@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<473B5ED8.1090201@mail.nih.gov>
Message-ID:

We try to avoid SVG at all costs, as installing plugins and viewers in a
locked down corporate environment can be more trouble than it's worth,
whereas generating .png images works for any browser with no extras
required.
We actually call this trace drawing code from Python which then
generates webpages with the embedded image.
It also means we don't need to licence, install and maintain a trace
viewer like Chromas.
:-)

Russell

> -----Original Message-----
> From: Malay [mailto:mbasu at mail.nih.gov]
> Sent: Thursday, 15 November 2007 9:47 a.m.
> To: Smithies, Russell
> Cc: Lee Katz; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] chromatogram
>
> I guess you need a chromatogram from SCF. I can't help with that. ABI.pm
> is not in the Bioperl distribution. But to set the record straight, you
> can draw a chromatogram in SVG from an ABI file in one step using my
> BioSVG module, available at:
>
> http://www.bioinformatics.org/~malay/biosvg/
>
> Malay
>
> [snip]

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From mbasu at mail.nih.gov Wed Nov 14 16:04:25 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 16:04:25 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To:
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov>
Message-ID: <473B62D9.8010004@mail.nih.gov>

You don't need any plugin. Firefox natively can show most of the SVG files.

-Malay

Smithies, Russell wrote:
> We try and avoid SVG at all costs as installing plugins and viewers in a
> locked down corporate environment can be more trouble than it's worth
> whereas generating .png images works for any browser with no extras
> required.
> We actually call this trace drawing code from Python which then
> generates webpages with the embedded image.
> It also means we don't need to licence, install and maintain a trace
> viewer like Chromas.
> :-)
>
> Russell
>
>> -----Original Message-----
>> From: Malay [mailto:mbasu at mail.nih.gov]
>> Sent: Thursday, 15 November 2007 9:47 a.m.
>> To: Smithies, Russell
>> Cc: Lee Katz; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] chromatogram
>>
>> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
>> not in Bioperl distribution. But to make the record straight, you can
>> use one step chromatogram drawing in SVG from ABI file using my BioSVG
>> module, available at:
>>
>> http://www.bioinformatics.org/~malay/biosvg/
>>
>> Malay
>>
>> Smithies, Russell wrote:
>>> Here's my trace viewer.
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================

--
Malay K Basu
www.malaybasu.net

From tomboy at cs.huji.ac.il Wed Nov 14 21:43:43 2007
From: tomboy at cs.huji.ac.il (Tomer Hertz)
Date: Wed, 14 Nov 2007 18:43:43 -0800
Subject: [Bioperl-l] problems in stalling bio perl
Message-ID:

Hi,
When I try to install BioPerl I get the following error message:

hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
$ perl Build.PL
Can't find file lib/Module/Build.pm to determine version at
/usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.

Can you please help? I have tried reinstalling the build command, and that
does not seem to help either.

many thanks
--Tomer

--
--------------------------------------------------------------------------------
Tomer Hertz
Postdoctoral Researcher
Machine Learning and Applied Statistics
Microsoft Research
One Microsoft Way, Redmond, WA, 98052, USA
Homepage: www.cs.huji.ac.il/~tomboy
Email: hertz at microsoft dot com
Tel: (425)-421-8313
Fax: (425) 936-7329
--------------------------------------------------------------------------------

From lskatz at gatech.edu Thu Nov 15 08:24:02 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Thu, 15 Nov 2007 08:24:02 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B62D9.8010004@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov>
Message-ID: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>

Thank you all.
Are you all sure that there is no way to go from an scf to an image
though? I do have abi files, but I am relying on Phred output for
base calls for other things and I want to stay consistent. This means
that if I use the fasta files that I get from Phred in another part of
my program, I need to use the scf files it produces.

If this is not possible, do you know if drawing an scf is in the works? Thanks.
--
Lee Katz
http://www.lskatz.com

From cain.cshl at gmail.com Thu Nov 15 09:21:26 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 15 Nov 2007 09:21:26 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov> <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <1195136486.2785.12.camel@localhost.localdomain>

Hi Lee,

Distributed with GBrowse is Bio::Graphics::Glyph::trace, which uses
Bio::SCF to draw trace files onto a Bio::Graphics::Panel. Bio::SCF is
not part of bioperl, so you have to get it from CPAN and it depends on
the Staden io-lib package, so you'll need that too. You can get GBrowse
from http://www.gmod.org/gbrowse , and you can look at the tutorial for
more information on configuring the trace glyph.

Scott

On Thu, 2007-11-15 at 08:24 -0500, Lee Katz wrote:
> Thank you all.
> Are you all sure in that there is no way to go from an scf to an image
> though? I do have abi files, but I am relying on Phred output for
> base calls for other things and I want to stay consistent. This means
> that if I use the fasta files that I get from Phred in another part of
> my program, I need to use the scf files it produces.
>
> If this is not possible, do you know if drawing an scf is in the works? Thanks.
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.
cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)
216-392-3087
Cold Spring Harbor Laboratory

From bosborne11 at verizon.net Thu Nov 15 09:18:05 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 09:18:05 -0500
Subject: [Bioperl-l] problems in stalling bio perl
In-Reply-To:
Message-ID:

Tomer,

Interesting.
When I used Cygwin I always worked entirely within the C: drive; it
looks like you're executing the script from the E: drive. Is Cygwin
installed in C:/cygwin? You can see what I'm getting at: it's possible
that you need to set $PERL5LIB to something like
/cygdrive/c/cygwin/usr/lib/perl5. What does 'echo $PERL5LIB' say?

Brian O.

On 11/14/07 9:43 PM, "Tomer Hertz" wrote:

> hi
> when I try to install bioperl I get the following error message:
>
> hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
> $ perl Build.PL
> Can't find file lib/Module/Build.pm to determine version at
> /usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.
> can you please help. I have tried reinstalling the build command and that
> does not seem to help as well.
>
> many thanks
> --Tomer

From bernd.web at gmail.com Thu Nov 15 10:26:42 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 16:26:42 +0100
Subject: [Bioperl-l] Graphics::Panel
Message-ID: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>

Hi,

Has someone been able to access '$description' for the production of
imagemaps with Graphics::Panel?
The map below does not print the "title" tag at all; '$description'
seems not to be available, although for the tracks ($panel->add_track)
it is available.

    $map = $panel->create_web_map($mapname, $linkrule, '$description');

Replacing '$description' with a coderef for the title tag does work, if
I use the code below:

    my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };

I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }

Regards,
Bernd

From luciap at sas.upenn.edu Thu Nov 15 10:44:21 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Thu, 15 Nov 2007 10:44:21 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
Message-ID: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>

Hi,
I was asked this question recently,
and it occurred to me I must be doing things inefficiently.
To produce a GFF file I was using SeqIO to parse the required fields, then
according to the conventions just printing out whatever was required,
tab delimited, which is easy.

But if I wanted to generate a GenBank file, extracting features from a GFF file
and a plain FASTA file, it was more complicated.
Is there support for GFF in BioPerl now?
Can anyone contribute a smart way to go from/to GFF, GenBank and EMBL?

thanks very much

Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From lstein at cshl.edu Thu Nov 15 12:38:04 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Nov 2007 12:38:04 -0500
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
Message-ID: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>

Depending on which Feature object you use, you may have to use a tag named
"note" instead of "description".

Lincoln

On Nov 15, 2007 10:26 AM, Bernd Web wrote:

> Hi,
>
> Has someone been able to access '$description' for the production of
> imagemaps with Graphics::Panel?
> The map below does not print the "title" tag at all, '$description'
> seems not available, although for the tracks ($panel->add_track) it is
> available.
> $map = $panel->create_web_map($mapname, $linkrule, '$description');
>
> Replacing '$description' with a coderef for the titletag does work, if
> I use the code below
> my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
>
>
> I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
>
>
> Regards,
> Bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From bernd.web at gmail.com Thu Nov 15 13:03:19 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 19:03:19 +0100
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com> <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
Message-ID: <716af09c0711151003w6b5965b6g967ae2391a460dcb@mail.gmail.com>

On Nov 15, 2007 6:38 PM, Lincoln Stein wrote:
> Depending on which Feature object you use, you may have to use a tag named
> "note" instead of "description".
>
> Lincoln
>
> On Nov 15, 2007 10:26 AM, Bernd Web < bernd.web at gmail.com> wrote:
> >
> > Hi,
> >
> > Has someone been able to access '$description' for the production of
> > imagemaps with Graphics::Panel?
> > The map below does not print the "title" tag at all, '$description'
> > seems not available, although for the tracks ($panel->add_track) it is
> > available.
> > $map = $panel->create_web_map($mapname, $linkrule, '$description');
> >
> > Replacing '$description' with a coderef for the titletag does work, if
> > I use the code below
> > my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
> >
> >
> > I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
> >
> >
> > Regards,
> > Bernd
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Thu Nov 15 13:43:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Nov 2007 12:43:02 -0600
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
In-Reply-To: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
References: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
Message-ID: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>

There are currently many ways to get what you want, but not all are
consistent (particularly re: GFF3). We are aiming for more
consistent, compliant GFF/GTF output in the next developer series
(1.7) of Bioperl.

You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
scripts directory); these are probably the most common way when
working directly from a seq record. Bio::Tools::GFF is the most
commonly used class, though I'm unsure of its status for GFF3
output. From within a Bio::SeqI you can call write_gff() (currently
not very flexible) or from the SeqFeature itself gff_string().
Bio::Graphics::Feature has the additional method gff3_string().
Bio::FeatureIO is also an option, though I would consider it very
experimental (it will likely undergo significant revision in the next
bioperl dev series).

Any others anyone can think of, maybe non-BioPerl related as well?

chris

On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:

> Hi
> I was asked this question recently
> and it occurred to me I must be doing things inefficiently
> To produce gff file I was using SeqIO to parse the required fields, then
> according to the conventions just printing out whatever was required tab
> delimited, which is easy
>
> but if I wanted to generate a genbank file, extracting features from a gff file
> and a plain fasta file it was more complicated
> is there support for gff in bioperl now?
> anyone can contribute with smart way to go from/to gff, genebank and embl?
>
> thanks very much
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bosborne11 at verizon.net Thu Nov 15 14:19:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 14:19:41 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
In-Reply-To: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>
Message-ID:

Chris,

There's also a genbank2gff3.PLS script in the GMOD package
(http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
However, it has not been modified for a couple of years, it may not be the
"preferred" script. See
http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information
on using Bioperl's bp_genbank2gff3 script.

Brian O.

On 11/15/07 1:43 PM, "Chris Fields" wrote:

> There are currently many ways to get what you want, but not all are
> consistent (particularly re: GFF3). We are aiming for more
> consistent, compliant GFF/GTF output in the next developer series
> (1.7) of Bioperl.
>
> You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> scripts directory); these are probably the most common way when
> working directly from a seq record. Bio::Tools::GFF is the most
> commonly used class though I'm unsure of it's status for GFF3
> output. From within a Bio::SeqI you can call write_gff() (currently
> not very flexible) or from the SeqFeature itself gff_string().
> Bio::Graphics::Feature has the additional method gff3_string().
> Bio::FeatureIO is also an option, though I would consider it very
> experimental (it will likely undergo significant revision in the next
> bioperl dev series).
>
> Any others anyone can think of, maybe non-BioPerl related as well?
>
> chris
>
> On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
>
>> Hi
>> I was asked this question recently
>> and it occurred to me I must be doing things inefficiently
>> To produce gff file I was using SeqIO to parse the required fields, then
>> according to the conventions just printing out whatever was required tab
>> delimited, which is easy
>>
>> but if I wanted to generate a genbank file, extracting features from a gff file
>> and a plain fasta file it was more complicated
>> is there support for gff in bioperl now?
>> anyone can contribute with smart way to go from/to gff, genebank and embl?
>>
>> thanks very much
>>
>> Lucia Peixoto
>> Department of Biology,SAS
>> University of Pennsylvania
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Thu Nov 15 17:31:28 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 16 Nov 2007 11:31:28 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID:

Just to add to this, does anyone have any code for reading .sff 'traces'
from 454 sequences?

Thanx,

Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
>
> Hi,
> I would like to know how to draw a chromatogram file. Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible. I really appreciate the help, so thanks!
>
> --
> Lee Katz
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From torsten.seemann at infotech.monash.edu.au Thu Nov 15 20:13:22 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 16 Nov 2007 12:13:22 +1100 Subject: [Bioperl-l] chromatogram In-Reply-To: References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> Message-ID: > Just to add to this, does anyone have any code for reading .sff 'traces' > from 454 sequences? The .SFF files can be manipulated using the SFF tools which 454 distribute with their result data. eg. "sffinfo 454AllContigs.sff" will list all the reads with the original flowgram values etc. However, the SFF tools are i386.Linux binaries, so not really a portable solution. -- --Torsten Seemann --Victorian Bioinformatics Consortium, Monash University From mvrmakam at yahoo.com Thu Nov 15 22:04:55 2007 From: mvrmakam at yahoo.com (Roshan Makam) Date: Thu, 15 Nov 2007 19:04:55 -0800 (PST) Subject: [Bioperl-l] Problem with installing bioperl on Windows Message-ID: <456881.59573.qm@web33712.mail.mud.yahoo.com> Hi, I have installed Perl Package Manager ver 5.8.8.822 on windows XP. I have included all the repositories outlined in Installing BioPerl for Windows and have selected all Packages in the View. However, I am not able to see any packages in the view box. Can anyone help me in this matter. Roshan ____________________________________________________________________________________ Get easy, one-click access to your favorites. Make Yahoo! your homepage. 

From David.Messina at sbc.su.se Fri Nov 16 03:33:04 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 16 Nov 2007 09:33:04 +0100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov> <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <628aabb70711160033na56be2an5bff905fdf13a0c0@mail.gmail.com>

> If this is not possible, do you know if drawing an scf is in the
> works? Thanks.
>

One non-BioPerl solution is 4peaks:

http://mekentosj.com/4peaks/

Mac only, but really great software. I'm also a fan of their Papers
journal article PDF library program.

Dave

From neetisomaiya at gmail.com Mon Nov 19 01:11:49 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 19 Nov 2007 11:41:49 +0530
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
Message-ID: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>

Hi,

I am using Bio::SeqIO for parsing KEGG gene ent files.

A part of my code is

    foreach my $key ( $ac->get_all_annotation_keys() )
    {
        if($key eq "dblink")
        {
            my %values = $ac->get_Annotations($key);
            foreach my $value ( keys(%values) )
            {
                print "\n*****VALUE $value*****\n";
            }
        }
    }

Here not all dblinks present in the actual file get parsed. For
example, in the data below,

    ENTRY       116064            CDS       H.sapiens
    NAME        LRRC58
    DEFINITION  leucine rich repeat containing 58
    POSITION    3q13.33
    MOTIF       Pfam: SdiA-regulated LRR_1
                PROSITE: LEU_RICH
    DBLINKS     NCBI-GI: 153792305
                NCBI-GeneID: 116064
                HGNC: 26968
                Ensembl: ENSG00000163428
                UniProt: Q96CX6

Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and
PROSITE, but doesn't give me HGNC and UniProt. For other entries it
gives me other combinations of dbs.

Can anyone help me with this? Why is this happening? I have no clue.

Thanks and Regards,
Neeti.
--
-Neeti
Even my blood says, B positive

From johnston at biochem.ucl.ac.uk Mon Nov 19 06:44:59 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 11:44:59 +0000 (GMT)
Subject: [Bioperl-l] blast database names
Message-ID:

Hello,

Is there a list of the possible database names for -data => $dbname in
RemoteBlast somewhere?

Cheers,
Cass

From cjfields at uiuc.edu Mon Nov 19 08:44:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 07:44:46 -0600
Subject: [Bioperl-l] blast database names
In-Reply-To:
References:
Message-ID:

Here's a recent list (don't know if it's up-to-date):

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Nov 19, 2007, at 5:44 AM, Caroline Johnston wrote:

> Hello,
>
> Is there a list of the possible database names for -data =>
> $dbname in RemoteBlast somewhere?
>
> Cheers,
> Cass

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Nov 19 09:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 08:33:46 -0600
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
In-Reply-To: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
References: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
Message-ID:

It makes sense in the light that you're (erroneously) using a hash:

    my %values = $ac->get_Annotations($key);

This assigns key-value pairs of DBLink => DBLink; you don't see an
error b/c the number of links happens to be even (I get 8) but you
would if the number of links returned is odd (missing value for key
error or something along those lines). So when you call:

    foreach my $value (keys(%values)) {....}

you only get half of the DBLinks.
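
The effect can be reproduced without BioPerl. This is a minimal sketch, assuming plain strings as stand-ins for the objects that get_Annotations() returns; the database names in @links are only illustrative, borrowed from the example entry:

```perl
use strict;
use warnings;

# Stand-ins for the annotation objects; plain strings are an assumption
# made so that the sketch runs without BioPerl installed.
my @links = ('NCBI-GI', 'NCBI-GeneID', 'HGNC', 'Ensembl', 'UniProt', 'Pfam');

# Assigning a list to a hash pairs the elements up:
# element 0 => element 1, element 2 => element 3, and so on.
my %values = @links;

# Only every other element ends up as a key.
my @seen = sort keys %values;

# Assigning the same list to an array keeps every element.
my @all = @links;

printf "hash keys: %d (%s)\n", scalar(@seen), join(', ', @seen);
printf "array:     %d (%s)\n", scalar(@all),  join(', ', @all);
```

With an odd number of elements and warnings enabled, Perl additionally reports "Odd number of elements in hash assignment".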
You should use an array:

    my @values = $ac->get_Annotations($key);
    foreach my $value (@values) {
        print $value->as_text,"\n";
    }

Note the loop change; Bio::Annotation objects are no longer operator
overloaded, so your print statement wouldn't work in a bioperl 1.6
world.

chris

On Nov 19, 2007, at 12:11 AM, neeti somaiya wrote:

> Hi,
>
> I am using Bio::SeqIO for parsing KEGG gene ent files.
>
> A part of my code is
>
> foreach my $key ( $ac->get_all_annotation_keys() )
> {
>     if($key eq "dblink")
>     {
>         my %values = $ac->get_Annotations($key);
>         foreach my $value ( keys(%values) )
>         {
>             print "\n*****VALUE $value*****\n";
>         }
>     }
> }
>
> Here not all dblinks present in the actual file get parsed. For eg,
> in the data below,
> ENTRY       116064            CDS       H.sapiens
> NAME        LRRC58
> DEFINITION  leucine rich repeat containing 58
> POSITION    3q13.33
> MOTIF       Pfam: SdiA-regulated LRR_1
>             PROSITE: LEU_RICH
> DBLINKS     NCBI-GI: 153792305
>             NCBI-GeneID: 116064
>             HGNC: 26968
>             Ensembl: ENSG00000163428
>             UniProt: Q96CX6
>
> Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and
> PROSITE, but doesnt give me HGNC and UniProt. For other entries it
> gives me other combinations of dbs.
>
> Can anyone help me with this. Why is this happenning? I have no clue.
>
> Thanks and Regards,
> Neeti.
> --
> -Neeti
> Even my blood says, B positive

From akarger at CGR.Harvard.edu Mon Nov 19 10:38:26 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 19 Nov 2007 10:38:26 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu> References: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu> Message-ID: > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, November 13, 2007 12:42 PM > To: Amir Karger > Cc: Steve Chervitz; Dave Messina; bioperl-l > Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result? > > Amir, > > Can you file this as a bug? Done. http://bugzilla.open-bio.org/show_bug.cgi?id=2399 > Dave mentioned he would look > into it but > I think it warrants tracking to make sure it gets fixed: > > http://www.bioperl.org/wiki/Bugs > > Attach the example BLAST report from your last post to the report. > BTW, I wonder how this appears in XML output? > > chris > > On Nov 13, 2007, at 11:30 AM, Amir Karger wrote: > > >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf > >> Of Steve Chervitz > >> > >> The Bioperl blast parser should extract that value and you > can obtain > >> it from an HSP object, via the HSPI::n() method, documented here: > >> > >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B > > io/Search/HSP/HSPI.html#POD23 > > > > As I mentioned in my email: > > > > And does anyone know off-hand if Bioperl will tell me when > situations > > like this happen? I thought the Bio::Search::HSP::BlastHSP::n > > subroutine > > would help, but I just get a bunch of empty strings for that, > > whether or > > not there's a (2) in the Expect string. (hsp->n is empty, hsp-> > > {"_n"} is > > undef.) > > > > And the docs for n() actually say, "This value is not defined with > > NCBI > > Blast2 with gapping" although they don't say why. Which may > explain > > why, > > when I ran the following code on the blast result I included in my > > last > > email, I got empty values for all of the n's. (Why is n() > undefined > > for > > gapped blast if I'm getting n's in my results from that blast?) 
> > > > use warnings; > > use strict; > > use Bio::SearchIO; > > > > my $blast_out = $ARGV[0]; > > my $in = new Bio::SearchIO(-format => 'blast', > > -file => $blast_out, > > -report_type => 'tblastn'); > > > > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart > Send Frame N > > Evalue)), "\n"; > > while(my $query = $in->next_result) { > > while(my $subject = $query->next_hit) { > > while (my $hsp = $subject->next_hsp) { > > print join("\t", > > $query->query_name, > > $hsp->start("query"), > > $hsp->end("query"), > > $hsp->strand("hit"), > > $subject->name, > > $hsp->start("hit"), > > $hsp->end("hit"), > > $subject->frame, > > $hsp->n, > > $hsp->evalue, > > ),"\n"; > > } > > } > > } > > > >> Dave's basically correct in his explanation. It's a result of the > >> application of sum statistics by the blast algorithm. You > can read > >> all > >> about it in Korf et al's BLAST book. Here's the relevant section: > > > > [snip] > > > > Thanks, > > > > -Amir > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From aaron.j.mackey at gsk.com Mon Nov 19 11:50:53 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Mon, 19 Nov 2007 11:50:53 -0500 Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats? In-Reply-To: Message-ID: While Lucia's subject line asked for genbank2gff, her message actually asked the reverse (gff + fasta -> genbank). e.g. pretend you had to prepare a genome annotation for submission to GenBank ... and no, I don't know of any generalized gff2genbank script out there ... Lucia, the SeqIO::genbank module will write GenBank format, but you have to get all the bits and bobs together in the right way, i.e. 
construct the various AnnotationCollections and SeqFeatures (with SplitLocations for exons, CDS, etc.) that a GenBank record expects. One way to do this is to start with a template GenBank file that you'd like to mimic, strip it down to only two gene models, use SeqIO::genbank to read it into memory, and then step through the object with the Perl debugger to see how it is composed. -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 11/15/2007 02:19:41 PM: > Chris, > > There's also a genbank2gff3.PLS script in the GMOD package ( > http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS? > revision=1.5&view=markup). However, it has not been modified for a couple of > years, it may not be the "preferred" script. > > See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and > http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information > on using Bioperl's bp_genbank2gff3 script. > > Brian O. > > > On 11/15/07 1:43 PM, "Chris Fields" wrote: > > > There are currently many ways to get what you want, but not all are > > consistent (particularly re: GFF3). We are aiming for more > > consistent, compliant GFF/GTF output in the next developer series > > (1.7) of Bioperl. > > > > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the > > scripts directory); these are probably the most common way when > > working directly from a seq record. Bio::Tools::GFF is the most > > commonly used class though I'm unsure of it's status for GFF3 > > output. From within a Bio::SeqI you can call write_gff() (currently > > not very flexible) or from the SeqFeature itself gff_string(). > > Bio::Graphics::Feature has the additional method gff3_string(). > > Bio::FeatureIO is also an option, though I would consider it very > > experimental (it will likely undergo significant revision in the next > > bioperl dev series). > > > > Any others anyone can think of, maybe non-BioPerl related as well? 
> > > > chris > > > > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote: > > > >> Hi > >> I was asked this question recently > >> and it occurred to me I must be doing things inefficiently > >> To produce gff file I was using SeqIO to parse the required fields, > >> then > >> according to the conventions just printing out whatever was > >> required tab > >> delimited, which is easy > >> > >> but if I wanted to generate a genbank file, extracting features > >> from a gff file > >> and a plain fasta file it was more complicated > >> is there support for gff in bioperl now? > >> anyone can contribute with smart way to go from/to gff, genebank > >> and embl? > >> > >> thanks very much > >> > >> Lucia Peixoto > >> Department of Biology,SAS > >> University of Pennsylvania > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From johnston at biochem.ucl.ac.uk Mon Nov 19 09:46:03 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Mon, 19 Nov 2007 14:46:03 +0000 (GMT) Subject: [Bioperl-l] blast database names In-Reply-To: References: Message-ID: On Mon, 19 Nov 2007, Chris Fields wrote: > Here's a recent list (don't know if it's up-to-date): > > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html Thanks. Perhaps I missed something in the docs, but I don't think I've quite understood how this is supposed to work. I'm trying to blast primer sequences against the ref genome sequence. Should I be using ref_contig? How can I limit the blast to a single species? cheers, Cass. 

From Kevin.M.Brown at asu.edu Mon Nov 19 13:31:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 19 Nov 2007 11:31:38 -0700
Subject: [Bioperl-l] pSW vs dpAlign
Message-ID: <1A4207F8295607498283FE9E93B775B404042E1D@EX02.asurite.ad.asu.edu>

I was able to get the Ext package installed; I just had to copy the
Align.pm file up one directory from where it was being put by the
installer.

Now I have a technician trying to use pSW (Bio::Tools::pSW). It appears
to have been last updated back in '99 and seems to lack certain methods
to get things out of the alignment, like the score. The test.pl script
that Bio::Ext comes with actually uses Bio::Tools::dpAlign. Is dpAlign
the replacement for pSW?

From bernd.web at gmail.com Wed Nov 21 11:42:40 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 17:42:40 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To:
References: <4701AEE6.6070506@web.de> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
Message-ID: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>

Hi Russell,

I came across your question. At first I thought all was well on my
system, but indeed I also have these colouring problems.
I noted that score in the bgcolor callback gets a different value!
Printing the score during hit parsing ($hit->raw_score) gives the same
score as the one the -description callback gets from
my $score = $feature->score; However, printing score in the bgcolor
sub gives 2573!
All scores in the bgcolor routine are different from, and higher than,
the real scores. Were you able to solve this colouring issue?
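
To separate the colour rule from the question of which score the callback receives, the thresholds can be checked on plain numbers. This is a small sketch, assuming the thresholds from the -bgcolor callback quoted below; colour_for_score is a made-up helper name, not a Bio::Graphics function:

```perl
use strict;
use warnings;

# Hypothetical helper: the colour thresholds from the quoted -bgcolor
# callback, pulled out so they can be exercised without Bio::Graphics.
sub colour_for_score {
    my ($score) = @_;
    return 'red'     if $score >= 200;
    return 'fuchsia' if $score >= 80;
    return 'lime'    if $score >= 50;
    return 'blue'    if $score >= 40;
    return 'black';
}

# Show the colour chosen for a few scores, including the 2573 that was
# printed from inside the callback.
for my $score (30, 45, 60, 150, 2573) {
    printf "%6s => %s\n", $score, colour_for_score($score);
}
```

Printing $feature->score next to the chosen colour inside the real callback would then show directly which value is being compared against these thresholds.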
Regards, Bernd > Hi all, > I'm using a modified version of Lincoln's tutorial > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) > and I'm colouring the HSPs by setting the -bgcolor by score with a sub > to give a similar image to that from NCBI but for some reason, my > colours are coming out wrong (see attached example) > They seem to be off by one but I can't see why. > > Any ideas? > > I can't be certain but I think it's only started doing this since our > BLAST upgrade to 2.2.17 a few weeks ago. > > Here's the colouring code: > ------------------------------------------------------------------------ > ------- > my $track = $panel->add_track( > -glyph => 'segments', > -label => 1, > -connector => 'dashed', > -bgcolor => sub { > my $feature = shift; > my $score = $feature->score; > return 'red' if $score >= 200; > return 'fuchsia' if $score >= 80; > return 'lime' if $score >= 50; > return 'blue' if $score >= 40; > return 'black'; > }, > -font2color => 'gray', > -sort_order => 'high_score', > -description => sub { > my $feature = shift; > return unless > $feature->has_tag('description'); > my ($description) = > $feature->each_tag_value('description'); > my $score = $feature->score; > "$description, score=$score"; > }, > ); > ------------------------------------------------------------------------ > --------- > > > Thanx, > > Russell Smithies > > > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>

From bernd.web at gmail.com Wed Nov 21 12:38:30 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 18:38:30 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
Message-ID: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>

Hi,

I now found that bgcolor is using a $feature->score that is coming
directly from the blast report; it is not the bit score.

    -bgcolor => sub {my $feature = shift; my $score = $feature->score; print "$score\n"; }

always prints the score, even if the score is not set in the
Bio::SeqFeature::Generic object. -description callbacks are somehow
using the score from the SeqFeature object. Does anyone have an idea
why?

Further, is it possible to get the raw_score of a hit? $hit->raw_score
actually gets the bitscore (w/o decimal point).

Bernd

On Nov 21, 2007 5:42 PM, Bernd Web wrote:
> Hi Russell,
>
> I came across your question. At first I thought all was well on my
> system, but indeed I also have these colouring problems.
> I noted that score in the bgcolor callback gets a different value!
> Printing score during hit parsing($hit->raw_score) gives the same
> score as -description
> my $score = $feature->score; However, printing score in the bgcolor
> sub gives 2573!
> All scores in the bgcolor routine all different and higher than the
> real scores.
Were you able to solve this colouring issue? > > Regards, > Bernd > > > > Hi all, > > I'm using a modified version of Lincoln's tutorial > > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) > > and I'm colouring the HSPs by setting the -bgcolor by score with a sub > > to give a similar image to that from NCBI but for some reason, my > > colours are coming out wrong (see attached example) > > They seem to be off by one but I can't see why. > > > > Any ideas? > > > > I can't be certain but I think it's only started doing this since our > > BLAST upgrade to 2.2.17 a few weeks ago. > > > > Here's the colouring code: > > ------------------------------------------------------------------------ > > ------- > > my $track = $panel->add_track( > > -glyph => 'segments', > > -label => 1, > > -connector => 'dashed', > > -bgcolor => sub { > > my $feature = shift; > > my $score = $feature->score; > > return 'red' if $score >= 200; > > return 'fuchsia' if $score >= 80; > > return 'lime' if $score >= 50; > > return 'blue' if $score >= 40; > > return 'black'; > > }, > > -font2color => 'gray', > > -sort_order => 'high_score', > > -description => sub { > > my $feature = shift; > > return unless > > $feature->has_tag('description'); > > my ($description) = > > $feature->each_tag_value('description'); > > my $score = $feature->score; > > "$description, score=$score"; > > }, > > ); > > ------------------------------------------------------------------------ > > --------- > > > > > > Thanx, > > > > Russell Smithies > > > > > > > > > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. 
Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From sac at bioperl.org Wed Nov 21 13:43:54 2007 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 21 Nov 2007 10:43:54 -0800 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> Message-ID: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> On Nov 21, 2007 9:38 AM, Bernd Web wrote: > [snip] > > Further is is possible to get the raw_score of a hit. $hit->raw_score > actually gets the bitscore (w/o decimal point). Hmmm. raw_score should not be the same as bit score. So given an example blast hit line such as: Score = 60.0 bits (30), Expect = 1e-06 $hit->raw_score() should return 30, not 60, as you seem to be getting. Could you submit a bug report for this? http://www.bioperl.org/wiki/Bugs Thanks, Steve > > On Nov 21, 2007 5:42 PM, Bernd Web wrote: > > Hi Russell, > > > > I came across your question. At first I thought all was well on my > > system, but indeed I also have these colouring problems. > > I noted that scrore in the bgcolor callback gets a different value! 
> > Printing score during hit parsing($hit->raw_score) gives the same > > score as -description > > my $score = $feature->score; However, printing score in the bgcolor > > sub gives 2573! > > All scores in the bgcolor routine all different and higher than the > > real scores. Were you able to solve this colouring issue? > > > > Regards, > > Bernd > > > > > > > Hi all, > > > I'm using a modified version of Lincoln's tutorial > > > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) > > > and I'm colouring the HSPs by setting the -bgcolor by score with a sub > > > to give a similar image to that from NCBI but for some reason, my > > > colours are coming out wrong (see attached example) > > > They seem to be off by one but I can't see why. > > > > > > Any ideas? > > > > > > I can't be certain but I think it's only started doing this since our > > > BLAST upgrade to 2.2.17 a few weeks ago. > > > > > > Here's the colouring code: > > > ------------------------------------------------------------------------ > > > ------- > > > my $track = $panel->add_track( > > > -glyph => 'segments', > > > -label => 1, > > > -connector => 'dashed', > > > -bgcolor => sub { > > > my $feature = shift; > > > my $score = $feature->score; > > > return 'red' if $score >= 200; > > > return 'fuchsia' if $score >= 80; > > > return 'lime' if $score >= 50; > > > return 'blue' if $score >= 40; > > > return 'black'; > > > }, > > > -font2color => 'gray', > > > -sort_order => 'high_score', > > > -description => sub { > > > my $feature = shift; > > > return unless > > > $feature->has_tag('description'); > > > my ($description) = > > > $feature->each_tag_value('description'); > > > my $score = $feature->score; > > > "$description, score=$score"; > > > }, > > > ); > > > ------------------------------------------------------------------------ > > > --------- > > > > > > > > > Thanx, > > > > > > Russell Smithies > > > > > > > > > > > > > > > 

From binkley at genome.stanford.edu Wed Nov 21 19:35:02 2007
From: binkley at genome.stanford.edu (Jonathan Binkley)
Date: Wed, 21 Nov 2007 16:35:02 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
Message-ID: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>

Hi,

I installed bioperl on a Mac (OS 10.4, Intel) via fink, which put it
here:

/sw/lib/perl5/5.8.6/Bio/

It seems to work fine, but I need bioperl-ext for Smith-Waterman
alignments. So, into which directory should I download bioperl-ext and
run the Makefile?

Thanks.

From dcj at sanger.ac.uk Thu Nov 22 09:47:09 2007
From: dcj at sanger.ac.uk (Daniel Jeffares)
Date: Thu, 22 Nov 2007 14:47:09 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
Message-ID:

Hi all,

Bio::Tools::Run::Phylo::PAML::Baseml from bioperl-run 1.5.2 seems to
be a little 'broken', at least in my hands.
First,

    $bml->set_parameter('runmode', 0);

does not work (it sets runmode to -2). Setting runmode to 1 is OK.
Also,

    $bml->no_param_checks(1);

doesn't seem to work.

The result is that the baseml.ctl file created under /tmp is not
runnable by baseml with runmode 0. The phylip file created is run OK
by baseml (with another .ctl file).

My script & baseml.ctl below. Hope it can be fixed,

cheers,

Dan

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file   => 'test.phy');
my $aln = $alignio->next_aln;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();

# set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();

# show the control file that was actually written
system "more $tempdir/baseml.ctl";

while( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

The baseml.ctl file produced:

seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
fix_rho = 1
verbose = 0
noisy = 0
RateAncestor = 1
kappa = 2.5
model = 0
ndata = 5
Small_Diff = 1e-6
runmode = -2
alpha = 0
fix_kappa = 0
rho = 0
nhomo = 0
getSE = 0
cleandata = 1
fix_alpha = 1
clock = 0
Malpha = 0
ncatG = 5
fix_blength = -1
nparK = 0

Regards,

Daniel Jeffares
______________________________
Population and Comparative Genomics
Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Phone: +44(0)1223 834244 x 7297
Fax: +44 (0)1223 494919
www.sanger.ac.uk
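
A quick way to confirm what ended up in a generated control file is to read its "name = value" lines into a hash and compare them with what was requested. The following is only a sketch under that layout assumption; parse_ctl is a hypothetical helper, not part of bioperl-run, and the control text is pasted inline rather than read from the temp directory:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-run): turn "name = value"
# lines, as in the baseml.ctl listing above, into a hash.
sub parse_ctl {
    my ($text) = @_;
    my %param;
    for my $line (split /\n/, $text) {
        $line =~ s/\*.*$//;    # PAML control files use '*' to start a comment
        next unless $line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/;
        $param{$1} = $2;
    }
    return %param;
}

# A few lines copied from the control file shown in the report.
my $ctl = <<'END_CTL';
seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
model = 0
runmode = -2
END_CTL

my %param  = parse_ctl($ctl);
my $wanted = 0;    # the runmode that was requested via set_parameter

if ($param{runmode} != $wanted) {
    print "runmode is $param{runmode}, expected $wanted\n";
}
```

In a real check the text would presumably come from the baseml.ctl file in the saved temp directory instead of a here-document.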
From David.Messina at sbc.su.se Thu Nov 22 11:06:16 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 22 Nov 2007 17:06:16 +0100 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml In-Reply-To: References: Message-ID: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> Daniel, I don't have bioperl-run or PAML installed on my system to test it myself, but have you tried the latest version of bioperl-run from CVS? It looks like that code has been worked on since 1.5.2 was released. If that still doesn't work, could you file this as a bug to make sure it gets followed up? Dave You can grab the tarball here: http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl and if necessary file the bug here: BioPerl Bugzilla tracking system From arareko at campus.iztacala.unam.mx Thu Nov 22 11:37:24 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 22 Nov 2007 10:37:24 -0600 Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref table In-Reply-To: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> Message-ID: <4745B044.5090102@campus.iztacala.unam.mx> Hi Peter, In BioPerl, there's no such mapping for db_xref's that I'm aware of. Each parser handles db_xref records on its own. Take a look at the Bio::SeqIO::genbank code, inside the next_seq() method for example: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup Regards, Mauricio. Peter wrote: > Dear all, > > I'm one of the Biopython developers. I've recently got going with > BioSQL and have been getting to grips with the Biopython BioSQL > interface. I'm aware that we need to try and be consistent with > BioPerl and BioJava, so I'd like to pose my first question related to > that. 
> > When loading GenBank records, many features have db_xref qualifiers, > e.g. from a random CDS feature in E. coli K12: > > /db_xref="ASAP:1309" > /db_xref="GI:16128366" > /db_xref="ECOCYC:EG10213" > /db_xref="GeneID:945313" > > Bioython attempts to translate the strings "ASAP", "GI", "ECOCYC", > "GeneID" before using recording these entries in the seqfeature_dbxref > and dbxref tables. For example, "GI" becomes "GeneIndex". > Biopython's current mapping is as follows: > > # Dictionary of database types, keyed by GenBank db_xref abbreviation > db_dict = {'GeneID': 'Entrez', > 'GI': 'GeneIndex', > 'COG': 'COG', > 'CDD': 'CDD', > 'DDBJ': 'DNA Databank of Japan', > 'Entrez': 'Entrez', > 'GeneIndex': 'GeneIndex', > 'PUBMED': 'PubMed', > 'taxon': 'Taxon', > 'ATCC': 'ATCC', > 'ISFinder': 'ISFinder', > 'GOA': 'Gene Ontology Annotation', > 'ASAP': 'ASAP', > 'PSEUDO': 'PSEUDO', > 'InterPro': 'InterPro', > 'GEO': 'Gene Expression Omnibus', > 'EMBL': 'EMBL', > 'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot', > 'ECOCYC': 'EcoCyc', > 'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL' > } > > In my testing, I've found several GenBank db_xref abbreviation for > which we don't have a mapping defined, such as "LocusID", "dbSNP", > "MGD", "MIM", or from an EMBL file, "REMTREMBL". > > I'd like to know if BioPerl and/or BioJava and/or BioRuby define a > similar mapping in their BioSQL code (or GenBank parser), so that > Biopython can follow your example. > > Thank you, > > Peter > > P.S. 
See also Biopython bug 2405
> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From avilella at gmail.com Thu Nov 22 16:55:10 2007
From: avilella at gmail.com (Albert Vilella)
Date: Thu, 22 Nov 2007 21:55:10 +0000
Subject: [Bioperl-l] proposed change -- symbols SimpleAlign
Message-ID: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com>

Hi,

Am I right in thinking that the '_symbols' hash in SimpleAlign is only used if one calls the symbol_chars method?

When I comment out this line:

map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if $seq->seq; # line 257

I get a nice speed boost on loading alignments.

Can I comment this line out in the CVS HEAD?

Cheers,

Albert.

[init] 5.96046447753906e-06 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta] 0.0022270679473877 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta] 2.14348912239075 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta] 6.91910791397095 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta] 15.8402290344238 secs...

avilella at magneto:~$ perl /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ancestral_alleles.pl -dir /home/avilella/ensembl/exoseq/test -verbose
[init] 1.21593475341797e-05 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta] 0.00294303894042969 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta] 0.510555982589722 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta] 1.6192569732666 secs...
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta] 3.86473417282104 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000203717.chr1.fasta] 6.99602198600769 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000196188.chr1.fasta] 7.26704716682434 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000025800.chr1.fasta] 8.44332504272461 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000117475.chr1.fasta] 12.103296995163 secs... From cjfields at uiuc.edu Thu Nov 22 19:30:51 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Nov 2007 18:30:51 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> Message-ID: <99440C6C-74C1-4DCC-8C7D-EAABB7CA6B91@uiuc.edu> How are tests affected? It might be worth going through the revision history to see if there was a specific reason this was implemented, but if it passes tests I don't see why we need it. chris On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > Hi, > > Am I right in thinking that the '_symbols' hash in SimpleAlign is only > used if one calls the symbol_chars method? > > When I comment out this line: > > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > $seq->seq; # line 257 > > I get a nice speed boost on loading alignments. > > Can I comment this line out in the CVS HEAD? > > Cheers, > > Albert. > > [init] 5.96046447753906e-06 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.0022270679473877 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 2.14348912239075 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 6.91910791397095 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 15.8402290344238 secs... 
> > avilella at magneto:~$ perl > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > ancestral_alleles.pl > -dir /home/avilella/ensembl/exoseq/test -verbose > [init] 1.21593475341797e-05 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.00294303894042969 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 0.510555982589722 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 1.6192569732666 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 3.86473417282104 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000203717.chr1.fasta] > 6.99602198600769 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000196188.chr1.fasta] > 7.26704716682434 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000025800.chr1.fasta] > 8.44332504272461 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000117475.chr1.fasta] > 12.103296995163 secs... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Nov 22 19:42:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Nov 2007 18:42:12 -0600 Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref table In-Reply-To: <4745B044.5090102@campus.iztacala.unam.mx> References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> <4745B044.5090102@campus.iztacala.unam.mx> Message-ID: <47D0EC6F-C34A-4AA8-97EE-478F2A5ADF62@uiuc.edu> I think SeqIO checks the name for parsing reasons only, in cases where the format changes based on the source (such as GenPept DBSOURCE data). 
I don't think we go beyond that in Bioperl, probably b/c modifying or expanding names for data persistence would lead to volatile coding issues (i.e. consistency between parsers, constant updating to cover new crossrefs, etc). I would definitely suggest retaining the original DB as it appears in the dbxref for consistency/sanity; if needed return expanded names using a different method if they are designated. chris On Nov 22, 2007, at 10:37 AM, Mauricio Herrera Cuadra wrote: > Hi Peter, > > In BioPerl, there's no such mapping for db_xref's that I'm aware of. > Each parser handles db_xref records on its own. Take a look at the > Bio::SeqIO::genbank code, inside the next_seq() method for example: > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ > Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup > > Regards, > Mauricio. > > Peter wrote: >> Dear all, >> >> I'm one of the Biopython developers. I've recently got going with >> BioSQL and have been getting to grips with the Biopython BioSQL >> interface. I'm aware that we need to try and be consistent with >> BioPerl and BioJava, so I'd like to pose my first question related to >> that. >> >> When loading GenBank records, many features have db_xref qualifiers, >> e.g. from a random CDS feature in E. coli K12: >> >> /db_xref="ASAP:1309" >> /db_xref="GI:16128366" >> /db_xref="ECOCYC:EG10213" >> /db_xref="GeneID:945313" >> >> Bioython attempts to translate the strings "ASAP", "GI", "ECOCYC", >> "GeneID" before using recording these entries in the >> seqfeature_dbxref >> and dbxref tables. For example, "GI" becomes "GeneIndex". 
>> Biopython's current mapping is as follows:
>>
>> # Dictionary of database types, keyed by GenBank db_xref abbreviation
>> db_dict = {'GeneID': 'Entrez',
>>            'GI': 'GeneIndex',
>>            'COG': 'COG',
>>            'CDD': 'CDD',
>>            'DDBJ': 'DNA Databank of Japan',
>>            'Entrez': 'Entrez',
>>            'GeneIndex': 'GeneIndex',
>>            'PUBMED': 'PubMed',
>>            'taxon': 'Taxon',
>>            'ATCC': 'ATCC',
>>            'ISFinder': 'ISFinder',
>>            'GOA': 'Gene Ontology Annotation',
>>            'ASAP': 'ASAP',
>>            'PSEUDO': 'PSEUDO',
>>            'InterPro': 'InterPro',
>>            'GEO': 'Gene Expression Omnibus',
>>            'EMBL': 'EMBL',
>>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>>            'ECOCYC': 'EcoCyc',
>>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>>            }
>>
>> In my testing, I've found several GenBank db_xref abbreviation for
>> which we don't have a mapping defined, such as "LocusID", "dbSNP",
>> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
>>
>> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
>> similar mapping in their BioSQL code (or GenBank parser), so that
>> Biopython can follow your example.
>>
>> Thank you,
>>
>> Peter
>>
>> P.S. See also Biopython bug 2405
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Nov 22 19:49:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Nov 2007 18:49:15 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> Message-ID: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> Albert, Found it: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ SimpleAlign.pm.diff?r1=1.36&r2=1.37 If it slows performance that dramatically, maybe we can move this to a separate AlignUtils method instead. Maybe something to ask Jason about? chris On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > Hi, > > Am I right in thinking that the '_symbols' hash in SimpleAlign is only > used if one calls the symbol_chars method? > > When I comment out this line: > > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > $seq->seq; # line 257 > > I get a nice speed boost on loading alignments. > > Can I comment this line out in the CVS HEAD? > > Cheers, > > Albert. > > [init] 5.96046447753906e-06 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.0022270679473877 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 2.14348912239075 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 6.91910791397095 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 15.8402290344238 secs... > > avilella at magneto:~$ perl > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > ancestral_alleles.pl > -dir /home/avilella/ensembl/exoseq/test -verbose > [init] 1.21593475341797e-05 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.00294303894042969 secs... 
> [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 0.510555982589722 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 1.6192569732666 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 3.86473417282104 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000203717.chr1.fasta] > 6.99602198600769 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000196188.chr1.fasta] > 7.26704716682434 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000025800.chr1.fasta] > 8.44332504272461 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000117475.chr1.fasta] > 12.103296995163 secs... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Nov 23 07:29:37 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 23 Nov 2007 12:29:37 +0000 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml In-Reply-To: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> Message-ID: <4746C7B1.1010002@sendu.me.uk> Dave Messina wrote: > Daniel, > > I don't have bioperl-run or PAML installed on my system to test it myself, > but have you tried the latest version of bioperl-run from CVS? It looks like > that code has been worked on since 1.5.2 was released. Yes, I fixed it in CVS so it should at least /run/. I don't know about the parsing side of things, though that may also have been fixed recently by someone else. 
From avilella at gmail.com Fri Nov 23 08:08:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Nov 2007 13:08:59 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <4746C7B1.1010002@sendu.me.uk>
References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> <4746C7B1.1010002@sendu.me.uk>
Message-ID: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>

Just to mention that the new paml4 has a "basemlg" instead of a "baseml" binary. AFAIK, Jason fixed codeml to make it work for both paml3.xx and paml4, but I am not sure about baseml.

Also, I think if you set runmode 0, you have to provide a tree:

#!/usr/bin/perl
use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;
use Bio::TreeIO;

# read the alignment (phylip) and the tree (newick)
my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file   => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.phy');
my $treeio = Bio::TreeIO->new(-format => 'newick',
                              -file   => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.tree');
my $aln  = $alignio->next_aln;
my $tree = $treeio->next_tree;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->tree($tree);
$bml->executable("/home/avilella/9_opl/paml/paml3.14/src/baseml");
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();

# set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();
# show the control file that was written for baseml
system "more $tempdir/baseml.ctl";
while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    $DB::single=1;1;   # debugger breakpoint
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

 4 50
Homo_sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
Pan_panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
Gorilla_go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
Pongo_pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

ACAUUUU-CCUUGCAAAG
ACAUCAU-CCUUGCAAAG
ACAUCAUCCCUCGCAGAG
ACAUCAUCCCUUGCAGAG

(((Homo_sapie,Pan_panisc),Gorilla_go),Pongo_pigm);

On Nov 23, 2007 12:29 PM, Sendu Bala wrote:
> Dave
Messina wrote: > > Daniel, > > > > I don't have bioperl-run or PAML installed on my system to test it myself, > > but have you tried the latest version of bioperl-run from CVS? It looks like > > that code has been worked on since 1.5.2 was released. > > Yes, I fixed it in CVS so it should at least /run/. I don't know about > the parsing side of things, though that may also have been fixed > recently by someone else. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Fri Nov 23 11:24:59 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 23 Nov 2007 10:24:59 -0600 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml In-Reply-To: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com> References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> <4746C7B1.1010002@sendu.me.uk> <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com> Message-ID: <6D4B909E-4B4E-45D4-B9BA-F99431B0EC65@uiuc.edu> I have both 'baseml' and 'basemlg' with paml4 on Mac OS X (not just 'basemlg'), so it would need to work with both. Do we want to put a PAML parser/wrapper overhaul on the TODO list for 1.6? chris On Nov 23, 2007, at 7:08 AM, Albert Vilella wrote: > Just to mention that the new paml4 has a "basemlg" instead of a > "baseml" binary. AFAIK, Jason fixed codeml to make it work both for > paml3.xx a paml4, but I am not sure about baseml. ... From arvindvanam at gmail.com Fri Nov 23 16:26:06 2007 From: arvindvanam at gmail.com (vanam) Date: Fri, 23 Nov 2007 13:26:06 -0800 (PST) Subject: [Bioperl-l] run RNAfold in perl Message-ID: <13918981.post@talk.nabble.com> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? 

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
my $rnafold = $factory->program('rnafold');
my $job = $rnafold->run(-rnafold => 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');

I installed the Vienna package and then tried using Pise to create an object for the program, but it gives the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bio::Tools::Run::PiseJob terminated: URL missing
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::PiseJob::terminated /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
STACK: Bio::Tools::Run::PiseApplication::submit /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
STACK: Bio::Tools::Run::PiseApplication::run /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
STACK: evaluate.pl:12

How can I make the program RNAfold run in Perl? Is there any need to specify where my rnafold program is?

Please reply soon.
--
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13918981
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at uiuc.edu Fri Nov 23 17:49:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 16:49:43 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13918981.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
Message-ID: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>

The Pise wrappers run the programs remotely; see Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a local RNAfold wrapper, I had planned on making Bioperl-based Vienna/mfold wrappers but haven't done so yet. The Vienna tools do have a Perl-based (non-BioPerl-based) module included which uses libRNA, and is well worth a look.
Try 'perldoc RNA' if you have installed the tools locally, or look here for other Perl-based tools: http://www.tbi.univie.ac.at/~ivo/RNA/utils.html chris On Nov 23, 2007, at 3:26 PM, vanam wrote: > > how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? > > my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); > my $rnafold = $factory->program('rnafold'); > my $job=$rnafold->run(-rnafold => > 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); > > I installed Vienna package and then i tried using Pise to create an > object > for the program but its giving the following error > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bio::Tools::Run::PiseJob terminated: URL missing > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::Tools::Run::PiseJob::terminated > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 > STACK: Bio::Tools::Run::PiseApplication::submit > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 > STACK: Bio::Tools::Run::PiseApplication::run > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 > STACK: evaluate.pl:12 > > > how to make the program RNAfold run in perl... > IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? > > plz reply soon > -- > View this message in context: http://www.nabble.com/run-RNAfold-in- > perl-tf4863835.html#a13918981 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From arvindvanam at gmail.com Sat Nov 24 02:29:11 2007 From: arvindvanam at gmail.com (vanam) Date: Fri, 23 Nov 2007 23:29:11 -0800 (PST) Subject: [Bioperl-l] run RNAfold in perl In-Reply-To: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> Message-ID: <13922740.post@talk.nabble.com> i have seen the documentation for Bio::Tools::Run::AnalysisFactory::Pise and i used it exactly as it was mentioned in it. i just want that instead of running its perl version "RNAfold.pl" I can use the functions associated with RNAfold with a perl program without having to call the program using system() command. if you can just tell me how to use these wrapper modules it would b of gr8 help...like while using clustalw or clustalx we define the environment variable for it ..do we have to do the same for RNAfold or Mfold Chris Fields wrote: > > The Pise wrappers run the programs remotely; see > Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a > local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ > mfold wrappers but haven't done so yet. The Vienna tools do have a > Perl-based (non-BioPerl-based) module included which uses libRNA, and > is well worth a look. Try 'perldoc RNA' if you have installed the > tools locally, or look here for other Perl-based tools: > > http://www.tbi.univie.ac.at/~ivo/RNA/utils.html > > chris > > On Nov 23, 2007, at 3:26 PM, vanam wrote: > >> >> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? 
>> >> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); >> my $rnafold = $factory->program('rnafold'); >> my $job=$rnafold->run(-rnafold => >> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); >> >> I installed Vienna package and then i tried using Pise to create an >> object >> for the program but its giving the following error >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Bio::Tools::Run::PiseJob terminated: URL missing >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 >> STACK: Bio::Tools::Run::PiseJob::terminated >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 >> STACK: Bio::Tools::Run::PiseApplication::submit >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 >> STACK: Bio::Tools::Run::PiseApplication::run >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 >> STACK: evaluate.pl:12 >> >> >> how to make the program RNAfold run in perl... >> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? >> >> plz reply soon >> -- >> View this message in context: http://www.nabble.com/run-RNAfold-in- >> perl-tf4863835.html#a13918981 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13922740 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
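
[Editorial sketch] The libRNA-based Perl module mentioned earlier in this thread can be called in-process, which avoids system() altogether. The following is a minimal sketch, not a BioPerl wrapper. It assumes the "RNA" module shipped with the Vienna package is installed where perl can find it; the helper name fold_sequence is hypothetical and exists only for this illustration. If the module cannot be loaded, the helper returns an empty list instead of dying.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or the Vienna package):
# folds an RNA sequence in-process through the Vienna "RNA" module,
# so no external RNAfold process is started.
sub fold_sequence {
    my ($seq) = @_;

    # Load the Vienna bindings at run time; return an empty list
    # if they are not installed on this machine (assumption: the
    # module is named "RNA", as in 'perldoc RNA').
    return unless eval { require RNA; 1 };

    # RNA::fold returns the dot-bracket structure and the minimum
    # free energy for the given sequence.
    my ($structure, $mfe) = RNA::fold($seq);
    return ($structure, $mfe);
}

my $seq    = 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC';
my @result = fold_sequence($seq);

if (@result) {
    # Print the sequence, then the structure with its energy.
    printf "%s\n%s (%.2f)\n", $seq, @result;
} else {
    print "Vienna RNA Perl module not found; see 'perldoc RNA' after installing it.\n";
}
```

Loading the module inside eval keeps the script usable on machines without the Vienna bindings; no environment variable pointing at an RNAfold binary is involved, since the folding happens inside the Perl process.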
From avilella at gmail.com Sun Nov 25 06:50:42 2007 From: avilella at gmail.com (Albert Vilella) Date: Sun, 25 Nov 2007 11:50:42 +0000 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> Message-ID: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> cvs commited now. it is calculated anyway when calling symbol_chars so... On Nov 23, 2007 12:49 AM, Chris Fields wrote: > Albert, > > Found it: > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ > SimpleAlign.pm.diff?r1=1.36&r2=1.37 > > If it slows performance that dramatically, maybe we can move this to > a separate AlignUtils method instead. Maybe something to ask Jason > about? > > chris > > On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > > > > Hi, > > > > Am I right in thinking that the '_symbols' hash in SimpleAlign is only > > used if one calls the symbol_chars method? > > > > When I comment out this line: > > > > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > > $seq->seq; # line 257 > > > > I get a nice speed boost on loading alignments. > > > > Can I comment this line out in the CVS HEAD? > > > > Cheers, > > > > Albert. > > > > [init] 5.96046447753906e-06 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162399.chr1.fasta] > > 0.0022270679473877 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000158022.chr1.fasta] > > 2.14348912239075 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162585.chr1.fasta] > > 6.91910791397095 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000121957.chr1.fasta] > > 15.8402290344238 secs... 
> > > > avilella at magneto:~$ perl > > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > > ancestral_alleles.pl > > -dir /home/avilella/ensembl/exoseq/test -verbose > > [init] 1.21593475341797e-05 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162399.chr1.fasta] > > 0.00294303894042969 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000158022.chr1.fasta] > > 0.510555982589722 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162585.chr1.fasta] > > 1.6192569732666 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000121957.chr1.fasta] > > 3.86473417282104 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000203717.chr1.fasta] > > 6.99602198600769 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000196188.chr1.fasta] > > 7.26704716682434 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000025800.chr1.fasta] > > 8.44332504272461 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000117475.chr1.fasta] > > 12.103296995163 secs... > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From cjfields at uiuc.edu Sun Nov 25 10:05:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 09:05:27 -0600 Subject: [Bioperl-l] run RNAfold in perl In-Reply-To: <13922740.post@talk.nabble.com> References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> Message-ID: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu> Again, these wrappers are for submitting data to a Pise server for the corresponding programs (run on a remote server). 
There are no wrappers for running RNAfold on your computer (i.e. locally), with or w/o a set env. variable. You can try instaling Pise locally and setting the location() as shown in POD to localhost, however I don't know how stable these modules are with newer versions of Pise. These haven't been updated in a few years, apart from getting tests to work. Another option is installing EMBOSS along with the EMBASSY version of RNAFold; this could conceivably be run through Bio::Factory::EMBOSS. chris On Nov 24, 2007, at 1:29 AM, vanam wrote: > > i have seen the documentation for > Bio::Tools::Run::AnalysisFactory::Pise and > i used it exactly as it was mentioned in it. > > i just want that instead of running its perl version "RNAfold.pl" I > can use > the functions associated with RNAfold with a perl program without > having to > call the program using system() command. > > if you can just tell me how to use these wrapper modules it would b > of gr8 > help...like while using clustalw or clustalx we define the environment > variable for it ..do we have to do the same for RNAfold or Mfold > > > > > Chris Fields wrote: >> >> The Pise wrappers run the programs remotely; see >> Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a >> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ >> mfold wrappers but haven't done so yet. The Vienna tools do have a >> Perl-based (non-BioPerl-based) module included which uses libRNA, and >> is well worth a look. Try 'perldoc RNA' if you have installed the >> tools locally, or look here for other Perl-based tools: >> >> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html >> >> chris >> >> On Nov 23, 2007, at 3:26 PM, vanam wrote: >> >>> >>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? 
>>> >>> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); >>> my $rnafold = $factory->program('rnafold'); >>> my $job=$rnafold->run(-rnafold => >>> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); >>> >>> I installed Vienna package and then i tried using Pise to create an >>> object >>> for the program but its giving the following error >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: Bio::Tools::Run::PiseJob terminated: URL missing >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >>> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 >>> STACK: Bio::Tools::Run::PiseJob::terminated >>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 >>> STACK: Bio::Tools::Run::PiseApplication::submit >>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 >>> STACK: Bio::Tools::Run::PiseApplication::run >>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 >>> STACK: evaluate.pl:12 >>> >>> >>> how to make the program RNAfold run in perl... >>> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? >>> >>> plz reply soon >>> -- >>> View this message in context: http://www.nabble.com/run-RNAfold-in- >>> perl-tf4863835.html#a13918981 >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/run-RNAfold-in- > perl-tf4863835.html#a13922740 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Nov 25 10:38:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 09:38:40 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> Message-ID: Albert, I was getting a single AlignIO.t fail which appeared to be related to this: ... ok 122 - The object isa Bio::Align::AlignI ok 123 - consensus_string on metafasta not ok 124 - symbol_chars() using metafasta # Failed test 'symbol_chars() using metafasta' # in t/AlignIO.t at line 346. # got: '0' # expected: '23' It was b/c the symbol hash was initialized in the constructor (so it was present, just empty). I have changed that in CVS; all tests pass now. chris On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: > cvs commited now. it is calculated anyway when calling symbol_chars > so... > > On Nov 23, 2007 12:49 AM, Chris Fields wrote: >> Albert, >> >> Found it: >> >> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ >> Bio/ >> SimpleAlign.pm.diff?r1=1.36&r2=1.37 >> >> If it slows performance that dramatically, maybe we can move this to >> a separate AlignUtils method instead. Maybe something to ask Jason >> about? >> >> chris >> >> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: >> >> >>> Hi, >>> >>> Am I right in thinking that the '_symbols' hash in SimpleAlign is >>> only >>> used if one calls the symbol_chars method? 
>>> >>> When I comment out this line: >>> >>> map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if >>> $seq->seq; # line 257 >>> >>> I get a nice speed boost on loading alignments. >>> >>> Can I comment this line out in the CVS HEAD? >>> >>> Cheers, >>> >>> Albert. >>> >>> [init] 5.96046447753906e-06 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000162399.chr1.fasta] >>> 0.0022270679473877 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000158022.chr1.fasta] >>> 2.14348912239075 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000162585.chr1.fasta] >>> 6.91910791397095 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000121957.chr1.fasta] >>> 15.8402290344238 secs... >>> >>> avilella at magneto:~$ perl >>> /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ >>> ancestral_alleles.pl >>> -dir /home/avilella/ensembl/exoseq/test -verbose >>> [init] 1.21593475341797e-05 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000162399.chr1.fasta] >>> 0.00294303894042969 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000158022.chr1.fasta] >>> 0.510555982589722 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000162585.chr1.fasta] >>> 1.6192569732666 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000121957.chr1.fasta] >>> 3.86473417282104 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000203717.chr1.fasta] >>> 6.99602198600769 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000196188.chr1.fasta] >>> 7.26704716682434 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000025800.chr1.fasta] >>> 8.44332504272461 secs... >>> [loading aln /home/avilella/ensembl/exoseq/test/ >>> ENSG00000117475.chr1.fasta] >>> 12.103296995163 secs... 
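A minimal sketch of the idea discussed in this thread: skip the per-residue bookkeeping while sequences are loaded and build the symbol table only when it is first asked for. The LazyAlign class and its symbol_count method are hypothetical toy names, not Bio::SimpleAlign or its API, and the toy counts every distinct character including gaps. The cache uses undef as the "not computed yet" marker, so an empty-but-present hash is never mistaken for a computed result.

```perl
use strict;
use warnings;

package LazyAlign;    # hypothetical toy class, not Bio::SimpleAlign

sub new {
    my ($class) = @_;
    # symbols stays undef until somebody actually asks for it
    return bless { seqs => [], symbols => undef }, $class;
}

sub add_seq {
    my ($self, $seq) = @_;
    push @{ $self->{seqs} }, $seq;
    $self->{symbols} = undef;    # invalidate the cache, no per-character work here
    return $self;
}

sub symbol_count {
    my ($self) = @_;
    unless (defined $self->{symbols}) {
        my %seen;
        for my $seq (@{ $self->{seqs} }) {
            $seen{$_} = 1 for split //, $seq;
        }
        $self->{symbols} = \%seen;
    }
    return scalar keys %{ $self->{symbols} };
}

package main;

my $aln = LazyAlign->new;
$aln->add_seq('ACGT-ACGT');
$aln->add_seq('ACGTNACGT');
print $aln->symbol_count, "\n";    # distinct characters: A C G T - N
```

The cost of splitting each sequence is then paid once per query rather than once per added sequence, which is the trade-off the timings above are measuring.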
>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bernd.web at gmail.com Sun Nov 25 11:13:44 2007 From: bernd.web at gmail.com (Bernd Web) Date: Sun, 25 Nov 2007 17:13:44 +0100 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> Message-ID: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> Hi, I am not sure if this is related, but I remember SimpleAlign was adapted to cope with more gap symbols that can occur in alignments/FastA sequences, as: . _ - = Previous versions would throw an error on 'illegal' gap characters, Regards, Bernd On Nov 25, 2007 4:38 PM, Chris Fields wrote: > Albert, > > I was getting a single AlignIO.t fail which appeared to be related to > this: > > ... > ok 122 - The object isa Bio::Align::AlignI > ok 123 - consensus_string on metafasta > > not ok 124 - symbol_chars() using metafasta > # Failed test 'symbol_chars() using metafasta' > # in t/AlignIO.t at line 346. > # got: '0' > # expected: '23' > > It was b/c the symbol hash was initialized in the constructor (so it > was present, just empty). I have changed that in CVS; all tests pass > now. > > chris > > > On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: > > > cvs commited now. it is calculated anyway when calling symbol_chars > > so... 
> > [snip]
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr.
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Sun Nov 25 11:39:01 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 10:39:01 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> Message-ID: Bernd, That would be when generating Bio::LocatableSeq instances for building a Bio::SimpleAlign object. Judging by test suite results that doesn't appear to be affected. chris On Nov 25, 2007, at 10:13 AM, Bernd Web wrote: > Hi, > > I am not sure if this is related, but I remember SimpleAlign was > adapted to cope with more gap symbols that can occur in > alignments/FastA sequences, as: . _ - = > Previous versions would throw an error on 'illegal' gap characters, > > Regards, > Bernd > > On Nov 25, 2007 4:38 PM, Chris Fields wrote: >> Albert, >> >> I was getting a single AlignIO.t fail which appeared to be related to >> this: >> >> ... >> ok 122 - The object isa Bio::Align::AlignI >> ok 123 - consensus_string on metafasta >> >> not ok 124 - symbol_chars() using metafasta >> # Failed test 'symbol_chars() using metafasta' >> # in t/AlignIO.t at line 346. >> # got: '0' >> # expected: '23' >> >> It was b/c the symbol hash was initialized in the constructor (so it >> was present, just empty). I have changed that in CVS; all tests pass >> now. >> >> chris >> >> >> On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: >> >>> cvs commited now. it is calculated anyway when calling symbol_chars >>> so... 
>>> [snip]

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Sun Nov 25 13:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 12:51:42 -0600
Subject: [Bioperl-l] [ANNOUNCE] bioperl-ext updates and bioperl-live
Message-ID: <32B25A3B-0F04-43CB-8A66-1019EFD3BEB0@uiuc.edu>

I have been making some significant changes to
Bio::SeqIO::staden::read over the last few months which incorporate
code from Bugzilla (bugs 2074 and 2329, very kindly donated by Chris
Bailey and Joel Martin, cheers!).

Significant Changes:

* All Inline code in staden::read is now XS-based
* A new method has been added to Bio::SeqIO::staden::read for
  optionally getting trace data (i.e. for drawing graphs). The method
  is now implemented in Bio::SeqIO::abi, with example code in
  examples/quality/svgtrace.pl.

These changes should allow newer versions of Staden io_lib to be used
as well (the code is tested with io_lib 1.9.2), though they haven't
been tested extensively as I am having problems compiling newer io_lib
versions on my MacBook.

It's very likely more changes will need to be made along the way;
some issues were found with XS compilation which appear harmless but
need to be investigated, and trace data from other formats needs to be
evaluated. The possibility exists that many of these changes break
backward compatibility with older bioperl releases, though tests
passed with bioperl 1.5.2. Any feedback re: platform issues, test
results using newer io_lib versions, older bioperl versions, etc.
would be greatly appreciated.

I'm hoping this will stimulate more interest in getting other
bioperl-ext modules up-to-date with bioperl-live.

chris

From cjfields at uiuc.edu Mon Nov 26 13:59:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 12:59:23 -0600
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de>
 <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de>
 <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
 <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
 <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
 <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID:

Steve, Bernd, (and Jason, since you may have some input on this as well),

I am now looking into the bug Bernd submitted and it seems there is a
serious discrepancy in the way the hit raw_score, bits, and
significance are determined for Hit objects. Unless I am mistaken,
these should always come from the best HSP when HSPs are present,
falling back to the hit table data only when no HSP alignments are
present. Under the latter conditions a minimal Hit object is made
from data in the hit table, which reports the rounded bit score, not
the raw score, so in those cases the raw score would be undefined
(and you probably should get a nasty warning indicating there are no
HSPs present to get the data from).

What is occurring now, though, is that the raw_score and significance
are explicitly set from the hit table in the BLAST parser for the Hit
object at all times, while the bits are correctly derived from the
best HSP (no fallback to the hit table). Changing to the behavior
above results in several tests failing via SearchIO.t, with each
failed test reporting the expected (read: correct) raw score.

I'll look through the tests just in case, but I am planning on
committing changes to the BLAST parsers, GenericHit, and SearchIO.t
(to reflect the correct expected data) in the next day or two unless
there are any objections.
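
The intended behaviour described above (take scores from the best HSP when there is one, otherwise fall back to the hit table and leave the raw score undefined) can be sketched roughly as follows. This is only an illustration on plain hashes: summarize_hit and every field name in it are hypothetical and are not the BioPerl parser, GenericHit, or their API, and "best HSP" is assumed here to mean the one with the highest bit score. The sample numbers come from the example line "Score = 60.0 bits (30), Expect = 1e-06" quoted in this thread.

```perl
use strict;
use warnings;

# Toy stand-in for a parsed hit: plain hashes, NOT BioPerl objects.
# All field names (hsps, table_bits, table_evalue, ...) are hypothetical.
sub summarize_hit {
    my ($hit) = @_;
    my @hsps = @{ $hit->{hsps} || [] };

    if (@hsps) {
        # Assumption: the "best" HSP is the one with the highest bit score.
        my ($best) = sort { $b->{bits} <=> $a->{bits} } @hsps;
        return {
            raw_score    => $best->{raw_score},
            bits         => $best->{bits},
            significance => $best->{evalue},
            source       => 'best HSP',
        };
    }

    # No alignments parsed: only the hit table is available. For NCBI BLAST
    # that table carries the rounded bit score, so the raw score stays undef.
    warn "No HSPs for '$hit->{name}'; raw score is not available\n";
    return {
        raw_score    => undef,
        bits         => $hit->{table_bits},
        significance => $hit->{table_evalue},
        source       => 'hit table',
    };
}

my $with_hsp = summarize_hit({
    name         => 'hit1',
    table_bits   => 60,
    table_evalue => 1e-06,
    hsps         => [ { bits => 60.0, raw_score => 30, evalue => 1e-06 } ],
});
print "raw_score=$with_hsp->{raw_score} from $with_hsp->{source}\n";
```

Keeping the source of each value explicit makes it easy to see when a caller is reading hit-table data rather than HSP data.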
chris On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote: > On Nov 21, 2007 9:38 AM, Bernd Web wrote: >> [snip] >> >> Further is is possible to get the raw_score of a hit. $hit->raw_score >> actually gets the bitscore (w/o decimal point). > > Hmmm. raw_score should not be the same as bit score. So given an > example blast hit line such as: > > Score = 60.0 bits (30), Expect = 1e-06 > > $hit->raw_score() should return 30, not 60, as you seem to be getting. > > Could you submit a bug report for this? http://www.bioperl.org/ > wiki/Bugs > > Thanks, > Steve > >> >> On Nov 21, 2007 5:42 PM, Bernd Web wrote: >>> Hi Russell, >>> >>> I came across your question. At first I thought all was well on my >>> system, but indeed I also have these colouring problems. >>> I noted that scrore in the bgcolor callback gets a different value! >>> Printing score during hit parsing($hit->raw_score) gives the same >>> score as -description >>> my $score = $feature->score; However, printing score in the bgcolor >>> sub gives 2573! >>> All scores in the bgcolor routine all different and higher than the >>> real scores. Were you able to solve this colouring issue? >>> >>> Regards, >>> Bernd >>> >>> >>>> Hi all, >>>> I'm using a modified version of Lincoln's tutorial >>>> (http://www.bioperl.org/wiki/ >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output) >>>> and I'm colouring the HSPs by setting the -bgcolor by score with >>>> a sub >>>> to give a similar image to that from NCBI but for some reason, my >>>> colours are coming out wrong (see attached example) >>>> They seem to be off by one but I can't see why. >>>> >>>> Any ideas? >>>> >>>> I can't be certain but I think it's only started doing this >>>> since our >>>> BLAST upgrade to 2.2.17 a few weeks ago. 
>>>> >>>> Here's the colouring code: >>>> ------------------------------------------------------------------- >>>> ----- >>>> ------- >>>> my $track = $panel->add_track( >>>> -glyph => 'segments', >>>> -label => 1, >>>> -connector => 'dashed', >>>> -bgcolor => sub { >>>> my $feature = shift; >>>> my $score = $feature->score; >>>> return 'red' if $score >= 200; >>>> return 'fuchsia' if $score >>>> >= 80; >>>> return 'lime' if $score >>>> >= 50; >>>> return 'blue' if $score >= 40; >>>> return 'black'; >>>> }, >>>> -font2color => 'gray', >>>> -sort_order => 'high_score', >>>> -description => sub { >>>> my $feature = shift; >>>> return unless >>>> $feature->has_tag('description'); >>>> my ($description) = >>>> $feature->each_tag_value('description'); >>>> my $score = $feature->score; >>>> "$description, score=$score"; >>>> }, >>>> ); >>>> ------------------------------------------------------------------- >>>> ----- >>>> --------- >>>> >>>> >>>> Thanx, >>>> >>>> Russell Smithies >>>> >>>> >>>> >>>> >>>> =================================================================== >>>> ==== >>>> Attention: The information contained in this message and/or >>>> attachments >>>> from AgResearch Limited is intended only for the persons or >>>> entities >>>> to which it is addressed and may contain confidential and/or >>>> privileged >>>> material. Any review, retransmission, dissemination or other use >>>> of, or >>>> taking of any action in reliance upon, this information by >>>> persons or >>>> entities other than the intended recipients is prohibited by >>>> AgResearch >>>> Limited. If you have received this message in error, please >>>> notify the >>>> sender immediately. 
>>>> =======================================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From arvindvanam at gmail.com Mon Nov 26 14:08:41 2007
From: arvindvanam at gmail.com (vanam)
Date: Mon, 26 Nov 2007 11:08:41 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
References: <13918981.post@talk.nabble.com>
 <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
 <13922740.post@talk.nabble.com>
 <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
Message-ID: <13955209.post@talk.nabble.com>

I searched for the EMBASSY version of RNAfold (I guess it is vrnafold), but I
am unable to find a downloadable version; there is only a web interface for it.
Can you tell me where to download it?

Or can you just tell me how to set the location in PiseApplication to
localhost, and what to enter in the $email variable?
--
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13955209
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
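
For readers who only want to call a locally installed Vienna RNAfold from Perl, here is a rough sketch that bypasses Pise and BioPerl entirely and simply pipes a sequence to the binary. It assumes an executable named RNAfold is on the PATH, that it reads sequences from standard input, and that it prints the sequence followed by a "structure (energy)" line; the helper names parse_rnafold_output and fold_locally are made up for this illustration. Depending on the RNAfold version, a PostScript drawing may also be written to the current directory.

```perl
use strict;
use warnings;
use IPC::Open2;

# Parse the text RNAfold prints for one sequence:
# first line echoes the sequence, second line is "structure (energy)".
sub parse_rnafold_output {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;
    return unless @lines >= 2;
    my ($seq) = $lines[0] =~ /^\s*(\S+)/;
    my ($structure, $energy) =
        $lines[1] =~ /^\s*([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\s*\)/;
    return unless defined $structure;
    return { seq => $seq, structure => $structure, energy => $energy + 0 };
}

# Feed one sequence to a local RNAfold on stdin and parse its stdout.
# open2 dies if the program cannot be started.
sub fold_locally {
    my ($seq, $binary) = @_;
    $binary ||= 'RNAfold';    # assumption: the Vienna binary is on the PATH
    my $pid = open2(my $out, my $in, $binary);
    print {$in} "$seq\n";
    close $in;
    my $text = do { local $/; <$out> };
    close $out;
    waitpid $pid, 0;
    return parse_rnafold_output($text);
}

# Only runs when a sequence is given on the command line.
if (@ARGV) {
    my $result = fold_locally($ARGV[0])
        or die "Could not parse RNAfold output\n";
    print "$result->{structure}\t$result->{energy}\n";
}
```

Run it as, for example, `perl fold.pl UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC`, where fold.pl is whatever name the script is saved under.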

From cjfields at uiuc.edu Mon Nov 26 15:08:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 14:08:24 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13955209.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
 <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>
 <13922740.post@talk.nabble.com>
 <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
 <13955209.post@talk.nabble.com>
Message-ID: <8F0B3E56-BC46-4794-9A30-12688A358CAD@uiuc.edu>

On Nov 26, 2007, at 1:08 PM, vanam wrote:

> I searched for the EMBASSY version of RNAfold (I guess it is vrnafold), but I
> am unable to find a downloadable version; there is only a web interface for it.
> Can you tell me where to download it?

You will need to install EMBOSS as well as the EMBASSY version of
VIENNA (this is documented in the docs that come with the
distributions, so I will not go into detail on it):

ftp://emboss.open-bio.org/pub/EMBOSS/

This would be your best bet. Understand that there is no specific
class framework for dealing with RNA secondary structure in BioPerl,
so you will be on your own for now.

My suggestion for using Pise had the very important caveats that (1)
it very well may not work, (2) I have no experience with Pise, let
alone setting it up locally, therefore (3) I haven't tested it (and
don't intend to, as I don't have the time).

> Or can you just tell me how to set the location in PiseApplication to
> localhost, and what to enter in the $email variable?

I have previously pointed out the documentation which comes with the
modules in question. Remember perldoc is your friend; consulting it
saves me (and everyone else) time.

From 'perldoc Bio::Tools::Run::AnalysisFactory::Pise':

----------------------------------------------
DESCRIPTION
       Bio::Tools::Run::AnalysisFactory::Pise is a class to create Pise
       application objects, that let you submit jobs on a Pise server.

         my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                                 -email => 'me at myhome');

       The email is optional (there is default one). It can be useful,
       though. Your program might enter infinite loops, or just run many
       jobs: the Pise server maintainer needs a contact (s/he could of
       course cancel any requests from your address...). And if you plan
       to run a lot of heavy jobs, or to do a course with many students,
       please ask the maintainer before.

       The location parameter stands for the actual CGI location, except
       when set at the factory creation step, where it is rather the root
       of all CGI. There are default values for most of Pise programs.

       You can either set location at:

       1 factory creation:
             my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                      -location => 'http://somewhere/Pise/cgi-bin',
                                      -email => 'me at myhome');

       2 program creation:
             my $program = $factory->program('water',
                                      -location => 'http://somewhere/Pise/cgi-bin/water.pl'
                                              );

       3 any time before running:
             $program->location('http://somewhere/Pise/cgi-bin/water.pl');
             $job = $program->run();

       4 when running:
             $job = $program->run(-location => 'http://somewhere/Pise/cgi-bin/water.pl');

       You can also retrieve a previous job results by providing its url:

         $job = $factory->job($url);

       You get the url of a job by:

         $job->jobid;
----------------------------------------------

chris

From sac at bioperl.org Mon Nov 26 20:41:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 17:41:59 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To:
References: <4701AEE6.6070506@web.de> <4702BC5B.7040407@web.de>
 <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
 <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
 <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>
 <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>

Chris,

Good catch.
You're on track here with one exception: WU blast and NCBI blast
behave differently in what they report in the hit table. WU blast
puts the raw score in the table, not the bit score as NCBI blast does
(see the examples below for reference). WU blast also swaps their
location in the HSP header relative to how NCBI reports it.

It would be good to verify that the blast parser isn't befuddled by
this. From a quick look at SearchIO::blast, it appears that data from
the hit table is always getting stored as score, not bits, for WU
blast. I'm not sure if the HSP section data are parsed correctly. I'd
recommend looking into these things when you do your fixes.

So in the end, WU blast HSPs that are built from the hit table should
report a value for raw_score and punt on bits, but NCBI HSPs so
constructed should do the opposite.

The downside to this arrangement is that code that works for NCBI
blast hits will need modification to work for WU blast hits, but that
is just the nature of the data. It shouldn't be an issue for the
majority of users that stick with one flavor of blast and don't switch
back and forth, or for users that get their HSP scoring data from HSP
sections rather than relying on the hit table.

Ideally, the HSP object would know whether it was NCBI or WU-based and
issue an informative warning when attempting to access data it doesn't
have. One solution might be for the parser to put a 'WU-' in front of
the algorithm name for WU blast reports, so it would then be available
for the contained hit/hsp objects. This could break anything dependent
on algorithm name, so it would need some testing.

Steve

Example WU blast table and HSP header:

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

gb|AAC73113.1| (AE000111) aspartokinase I, homoserine deh...  4141  0.0       1
gb|AAC76922.1| (AE000468) aspartokinase II and homoserine...   844  3.1e-86   1
gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensi...   483  2.8e-47   1
gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia c...    97  0.0010    1

>gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I
            [Escherichia coli]
        Length = 820

 Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0
 Identities = 820/820 (100%), Positives = 820/820 (100%)

Example NCBI blast table and HSP header:

                                                                   Score    E
Sequences producing significant alignments:                        (bits) Value

ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189...   115   8e-26

>ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1
            gene:ENSG00000137397 transcript:ENST00000357569
          Length = 425

 Score = 120 bits (301), Expect = 3e-27
 Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261 (8%)

On Nov 26, 2007 10:59 AM, Chris Fields wrote:
> Steve, Bernd, (and Jason, since you may have some input on this as
> well),
>
> I am now looking into the bug Bernd submitted and it seems there is a
> serious discrepancy with the way the hit raw_score, bits, and
> significance is determined for Hit objects. Unless I am mistaken
> these should always come from the best HSP when they are present,
> falling back to the hit table data only when no HSP alignments are
> present. Under the latter conditions a minimal Hit object is made
> from data in the hit table, which reports the rounded bit score, not
> the raw score, so in those cases the raw score would be undefined
> (and you probably should get a nasty warning indicating there are no
> HSPs present to get the data from).
>
> What is occurring now, though, is the raw_score and significance is
> explicitly set from the hit table in the BLAST parser for the Hit
> object at all times, while the bits are correctly derived from the
> best HSP (no fallback to the hit table).
Changing to the behavior > above results in several tests failing via SearchIO.t, with each > failed test reporting the expected (read:correct) raw score. > > I'll look through the tests just in case, but I am planning on > committing changes to the BLAST parsers, GenericHit, and SearchIO.t > (to reflect the correct expected data) in the next day or two unless > there are any objections. > > chris > > > On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote: > > > On Nov 21, 2007 9:38 AM, Bernd Web wrote: > >> [snip] > >> > >> Further is is possible to get the raw_score of a hit. $hit->raw_score > >> actually gets the bitscore (w/o decimal point). > > > > Hmmm. raw_score should not be the same as bit score. So given an > > example blast hit line such as: > > > > Score = 60.0 bits (30), Expect = 1e-06 > > > > $hit->raw_score() should return 30, not 60, as you seem to be getting. > > > > Could you submit a bug report for this? http://www.bioperl.org/ > > wiki/Bugs > > > > Thanks, > > Steve > > > >> > >> On Nov 21, 2007 5:42 PM, Bernd Web wrote: > >>> Hi Russell, > >>> > >>> I came across your question. At first I thought all was well on my > >>> system, but indeed I also have these colouring problems. > >>> I noted that scrore in the bgcolor callback gets a different value! > >>> Printing score during hit parsing($hit->raw_score) gives the same > >>> score as -description > >>> my $score = $feature->score; However, printing score in the bgcolor > >>> sub gives 2573! > >>> All scores in the bgcolor routine all different and higher than the > >>> real scores. Were you able to solve this colouring issue? 
> >>> > >>> Regards, > >>> Bernd > >>> > >>> > >>>> Hi all, > >>>> I'm using a modified version of Lincoln's tutorial > >>>> (http://www.bioperl.org/wiki/ > >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output) > >>>> and I'm colouring the HSPs by setting the -bgcolor by score with > >>>> a sub > >>>> to give a similar image to that from NCBI but for some reason, my > >>>> colours are coming out wrong (see attached example) > >>>> They seem to be off by one but I can't see why. > >>>> > >>>> Any ideas? > >>>> > >>>> I can't be certain but I think it's only started doing this > >>>> since our > >>>> BLAST upgrade to 2.2.17 a few weeks ago. > >>>> > >>>> Here's the colouring code: > >>>> ------------------------------------------------------------------- > >>>> ----- > >>>> ------- > >>>> my $track = $panel->add_track( > >>>> -glyph => 'segments', > >>>> -label => 1, > >>>> -connector => 'dashed', > >>>> -bgcolor => sub { > >>>> my $feature = shift; > >>>> my $score = $feature->score; > >>>> return 'red' if $score >= 200; > >>>> return 'fuchsia' if $score > >>>> >= 80; > >>>> return 'lime' if $score > >>>> >= 50; > >>>> return 'blue' if $score >= 40; > >>>> return 'black'; > >>>> }, > >>>> -font2color => 'gray', > >>>> -sort_order => 'high_score', > >>>> -description => sub { > >>>> my $feature = shift; > >>>> return unless > >>>> $feature->has_tag('description'); > >>>> my ($description) = > >>>> $feature->each_tag_value('description'); > >>>> my $score = $feature->score; > >>>> "$description, score=$score"; > >>>> }, > >>>> ); > >>>> ------------------------------------------------------------------- > >>>> ----- > >>>> --------- > >>>> > >>>> > >>>> Thanx, > >>>> > >>>> Russell Smithies > >>>> > >>>> > >>>> > >>>> > >>>> =================================================================== > >>>> ==== > >>>> Attention: The information contained in this message and/or > >>>> attachments > >>>> from AgResearch Limited is intended only for the persons or > >>>> entities > 
>>>> to which it is addressed and may contain confidential and/or > >>>> privileged > >>>> material. Any review, retransmission, dissemination or other use > >>>> of, or > >>>> taking of any action in reliance upon, this information by > >>>> persons or > >>>> entities other than the intended recipients is prohibited by > >>>> AgResearch > >>>> Limited. If you have received this message in error, please > >>>> notify the > >>>> sender immediately. > >>>> =================================================================== > >>>> ==== > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From sac at bioperl.org Mon Nov 26 22:27:09 2007 From: sac at bioperl.org (Steve Chervitz) Date: Mon, 26 Nov 2007 19:27:09 -0800 Subject: [Bioperl-l] Installing bioperl-ext on Mac In-Reply-To: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu> References: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu> Message-ID: <8f200b4c0711261927h7ed8887ay8ab788f4f70fa197@mail.gmail.com> Hi Jon, I'd recommend downloading it into a separate location of your choosing (~/lib/bioperl-ext for example) and running the installer as specified in the docs that come with the download. Then you can include the location you installed it into via a "use lib '~/lib/bioperl-ext'" statement at the top of your script. 
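One caveat on that "use lib" line: as far as I know, Perl does not expand "~" inside a string, so the path is safer built from the HOME environment variable. A minimal sketch, assuming the example location ~/lib/bioperl-ext and that HOME is set in your shell:

```perl
use strict;
use warnings;

# Perl treats '~' as a literal character, so build the path from $HOME.
# The directory name is only the example location suggested above.
use lib "$ENV{HOME}/lib/bioperl-ext";

# lib.pm puts the new directory at the front of the module search path
print "Searching first in: $INC[0]\n";
```

If the modules were installed under a different prefix, the same pattern applies with that directory in place of the example one.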
It may be handy to install it as a non-root user so that you don't alter the main perl installation. This way your ext install will stay separate from your main bioperl and perl installations. There are some docs about the ext packages you might want to check out at http://www.bioperl.org/wiki/Ext_package. Steve On Nov 21, 2007 4:35 PM, Jonathan Binkley wrote: > Hi, > > I installed bioperl on a Mac (OS 10.4, Intel) via fink, > which put it here: > > /sw/lib/perl5/5.8.6/Bio/ > > It seems to work fine, but I need bioperl-ext for > Smith-Waterman alignments. > > So, into which directory should I download bioperl-ext and > run the Makefile? > > Thanks. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From a_arya2000 at yahoo.com Tue Nov 27 09:51:41 2007 From: a_arya2000 at yahoo.com (a_arya2000) Date: Tue, 27 Nov 2007 06:51:41 -0800 (PST) Subject: [Bioperl-l] Bioperl-ext test fails Message-ID: <615478.1036.qm@web60113.mail.yahoo.com> Hello, I downloaded latest bioperl-ext from bioperl website, and I have io_lib v1.8.11 installed, and I was trying to install Bio::SeqIO::staden::read (of bioperl-ext). It compiled fine without any error but when I run make test I got following output. 
ERL_DL_NONLAZY=1 perl-5.8.8/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/staden_read....ok 3/94# Test 7 got: "0" (t/staden_read.t at line 110 *TODO*) # Expected: (We don't have the ability to write files for abi format) # t/staden_read.t line 110 is: ok(0, undef, "We don't have the ability to write files for $format format") for 1..7; # Test 8 got: "0" (t/staden_read.t at line 110 fail #2 *TODO*) # Expected: (We don't have the ability to write files for abi format) # Test 9 got: "0" (t/staden_read.t at line 110 fail #3 *TODO*) # Expected: (We don't have the ability to write files for abi format) # Test 10 got: "0" (t/staden_read.t at line 110 fail #4 *TODO*) # Expected: (We don't have the ability to write files for abi format) # Test 11 got: "0" (t/staden_read.t at line 110 fail #5 *TODO*) # Expected: (We don't have the ability to write files for abi format) # Test 12 got: "0" (t/staden_read.t at line 110 fail #6 *TODO*) # Expected: (We don't have the ability to write files for abi format) # Test 13 got: "0" (t/staden_read.t at line 110 fail #7 *TODO*) # Expected: (We don't have the ability to write files for abi format) # Test 14 got: "0" (t/staden_read.t at line 62 *TODO*) # Expected: (Still missing test files for alf format) # t/staden_read.t line 62 is: ok(0, undef, "Still missing test files for $format format") for (1..$formatlooptests); # Test 15 got: "0" (t/staden_read.t at line 62 fail #2 *TODO*) # Expected: (Still missing test files for alf format) # Test 16 got: "0" (t/staden_read.t at line 62 fail #3 *TODO*) # Expected: (Still missing test files for alf format) # Test 17 got: "0" (t/staden_read.t at line 62 fail #4 *TODO*) # Expected: (Still missing test files for alf format) # Test 18 got: "0" (t/staden_read.t at line 62 fail #5 *TODO*) # Expected: (Still missing test files for alf format) # Test 19 got: "0" (t/staden_read.t at line 62 fail #6 *TODO*) # Expected: (Still missing test files for alf format) # 
Test 20 got: "0" (t/staden_read.t at line 62 fail #7 *TODO*) # Expected: (Still missing test files for alf format) # Test 21 got: "0" (t/staden_read.t at line 62 fail #8 *TODO*) # Expected: (Still missing test files for alf format) # Test 22 got: "0" (t/staden_read.t at line 62 fail #9 *TODO*) # Expected: (Still missing test files for alf format) # Test 23 got: "0" (t/staden_read.t at line 62 fail #10 *TODO*) # Expected: (Still missing test files for alf format) # Test 24 got: "0" (t/staden_read.t at line 62 fail #11 *TODO*) # Expected: (Still missing test files for alf format) # Test 25 got: "0" (t/staden_read.t at line 62 fail #12 *TODO*) # Expected: (Still missing test files for alf format) # Test 31 got: "0" (t/staden_read.t at line 107 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # t/staden_read.t line 107 is: ok(0, undef, "Can't write valid ctf files until we have a trace object") for 1..7; # Test 32 got: "0" (t/staden_read.t at line 107 fail #2 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 33 got: "0" (t/staden_read.t at line 107 fail #3 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 34 got: "0" (t/staden_read.t at line 107 fail #4 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 35 got: "0" (t/staden_read.t at line 107 fail #5 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 36 got: "0" (t/staden_read.t at line 107 fail #6 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) # Test 37 got: "0" (t/staden_read.t at line 107 fail #7 *TODO*) # Expected: (Can't write valid ctf files until we have a trace object) t/staden_read....ok All tests successful. Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr + 0.15 csys = 1.71 CPU) Anyone has any idea what might be going wrong here? By the way, my OS is Linux. Thank you very much. 
Arya

From bix at sendu.me.uk Tue Nov 27 10:41:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 15:41:38 +0000
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <615478.1036.qm@web60113.mail.yahoo.com>
References: <615478.1036.qm@web60113.mail.yahoo.com>
Message-ID: <474C3AB2.5050208@sendu.me.uk>

a_arya2000 wrote:
> Hello,
> I downloaded latest bioperl-ext from bioperl website,
> and I have io_lib v1.8.11 installed, and I was trying
> to install Bio::SeqIO::staden::read (of bioperl-ext).
> It compiled fine without any error but when I run make
> test I got following output.
[...]
> All tests successful.
> Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr +
> 0.15 csys = 1.71 CPU)
>
>
> Anyone has any idea what might be going wrong here? By
> the way, my OS is Linux. Thank you very much.

Not being familiar with the test script or ext, I can at least say that
nothing actually went wrong: 'All tests successful'. Apparently there
are some things in the test script that are known by the author to not
work quite right, so he marked them as 'todo'. The problems seem
harmless in any case, with things returning 0 instead of undef.

So, unless you've reason to believe there is something significant going
on, all is well.

From alison.waller at utoronto.ca Mon Nov 26 16:06:35 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Mon, 26 Nov 2007 16:06:35 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
Message-ID: <005a01c83070$3a814580$d81efea9@AWALL>

Hello all,

It's the usual story, I'm an engineer turned biologist who now needs
help with bioinformatics so I can analyze huge amounts of data to
finish my thesis.
I am trying to write a script that will parse large blast files (usually blastx) I also want to be able to specify how many hits I want to report information on. Most of the time I will only want information on the top hit, but I want to have the flexibility to obtain information on say the top5. I am pretty sure I have done this wrong, any advice on how to correct my script to do this, would be great. Thanks so much, Alison #!/usr/local/bin/perl -w # Parsing BLAST reports with BioPerl's Bio::SearchIO module # alison waller November 2007 use strict; use warnings; use Bio::SearchIO; # to run type: blast_parse_aw.pl input.txt #of hits my $infile =shift(@ARGV); my $outfile ="$ARGV[0].parsed"; my $tophit = $ARGV[1]; # I want to specify in the command line how many hits to report for each query open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n"; open (OUT,">$outfile"); $report = new Bio::SearchIO( -file=>"$inFile", -format => "blast"); print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\t Qstrand\tHstrand\n"; # Go through BLAST reports one by one while($result = $report->next_result) { if ($top_hit=$result->next_hit) # this might be wrong - I want to specify how many hits to print results for # Print some tab-delimited data about this hit { print $result->query_name, "\t"; print $hit->description, "\t"; print $hit->significance, "\t"; print $hit->bits,"\t"; print $hsp->evalue, "\t"; print $hsp->percent_identity, "\t"; print $hsp->length('total'),"\t"; print $hsp->num_identical,"\t"; print $hsp->gaps,"\t"; print $hsp->strand('query'),"\t"; print $hsp->strand('hit'), "\n"; } else print "no hits\n"; } ****************************************** Alison S. Waller M.A.Sc. Doctoral Candidate awaller at chem-eng.utoronto.ca 416-978-4222 (lab) Department of Chemical Engineering Wallberg Building 200 College st. 
Toronto, ON M5S 3E5 From bix at sendu.me.uk Tue Nov 27 12:01:36 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 27 Nov 2007 17:01:36 +0000 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL> References: <005a01c83070$3a814580$d81efea9@AWALL> Message-ID: <474C4D70.2010206@sendu.me.uk> alison waller wrote: > I am trying to write a script that will parse large blast files (usually > blastx) I also want to be able to specify how many hits I want to report > information on. > > Most of the time I will only want information on the top hit, but I want to > have the flexibility to obtain information on say the top5. I am pretty > sure I have done this wrong, any advice on how to correct my script to do > this, would be great. [snip] > if ($top_hit=$result->next_hit) # this might be wrong - I want to > specify how many hits to print results for I didn't really pay attention to the rest of your code, but assuming it all works except for only ever giving you info for the top hit, you just need to change this 'if' to a loop of some kind. # ... my $hits = 0; while (my $hit = $result->next_hit) { $hits++; last if $hits > $tophit; # ... } From David.Messina at sbc.su.se Tue Nov 27 12:55:44 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 27 Nov 2007 18:55:44 +0100 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <474C4D70.2010206@sendu.me.uk> References: <005a01c83070$3a814580$d81efea9@AWALL> <474C4D70.2010206@sendu.me.uk> Message-ID: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com> Hi Alison, As Sendu mentioned, the key bit is adding a condition to the hit loop to limit the number of hits that are printed. I didn't test the below extensively, but give it a try... 
Dave

#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <blast report> <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1];    # how many hits to report for each query

# Bio::SearchIO opens $infile itself, so only the output handle is needed
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = Bio::SearchIO->new(
    -file   => $infile,
    -format => "blast"
);

# Column headers, tab-delimited
print OUT join( "\t",
    qw(Query HitName HitSignif HitBits Evalue %id
       AlignLen NumIdent gaps Qstrand Hstrand) ), "\n";

# Go through BLAST reports one by one
while ( my $result = $report->next_result ) {
    my $hits_seen = 0;    # hits actually returned for this query
    while ( ( $hits_seen < $tophit ) && ( my $hit = $result->next_hit ) ) {
        $hits_seen++;
        while ( my $hsp = $hit->next_hsp ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name,    "\t";
            print OUT $hit->name,             "\t";
            print OUT $hit->significance,     "\t";
            print OUT $hit->bits,             "\t";
            print OUT $hsp->evalue,           "\t";
            print OUT $hsp->percent_identity, "\t";
            print OUT $hsp->length('total'),  "\t";
            print OUT $hsp->num_identical,    "\t";
            print OUT $hsp->gaps,             "\t";
            print OUT $hsp->strand('query'),  "\t";
            print OUT $hsp->strand('hit'),    "\n";
        }
    }

    # The counter only advances when a hit was returned, so 0 really
    # means the query had no hits
    if ( $hits_seen == 0 ) { print OUT "no hits\n"; }
}

close(OUT);

From Russell.Smithies at agresearch.co.nz Tue Nov 27 14:31:29 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 28 Nov 2007 08:31:29 +1300
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk> <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID:

Do the hits need to be sorted first or is this done automagically?
I ask this as I know Megablast doesn't provide sorted output for most of it's formats. Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open- > bio.org] On Behalf Of Dave Messina > Sent: Wednesday, 28 November 2007 6:56 a.m. > To: alison waller > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results > > Hi Alison, > As Sendu mentioned, the key bit is adding a condition to the hit loop to > limit the number of hits that are printed. I didn't test the below > extensively, but give it a try... > > > Dave > > > > #!/usr/local/bin/perl -w > > # Parsing BLAST reports with BioPerl's Bio::SearchIO module > # alison waller November 2007 > > use strict; > use warnings; > use Bio::SearchIO; > > my $usage = "to run type: blast_parse_aw.pl <# of hits>\n"; > if (@ARGV != 2) { die $usage; } > > my $infile = $ARGV[0]; > my $outfile = $infile . '.parsed'; > my $tophit = $ARGV[1]; # to specify in the command line how many hits > # to report for each query > > #open( IN, $infile ) || die "Can't open inputfile $infile! $!\n"; > open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
$!\n"; > > my $report = new Bio::SearchIO( > -file => "$infile", > -format => "blast" > ); > > print OUT > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tga ps\t > Qstrand\tHstrand\n"; > > # Go through BLAST reports one by one > while ( my $result = $report->next_result ) { > my $i = 0; > while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) { > while ( my $hsp = $hit->next_hsp ) { > > # Print some tab-delimited data about this hit > print OUT $result->query_name, "\t"; > print OUT $hit->name, "\t"; > print OUT $hit->significance, "\t"; > print OUT $hit->bits, "\t"; > print OUT $hsp->evalue, "\t"; > print OUT $hsp->percent_identity, "\t"; > print OUT $hsp->length('total'), "\t"; > print OUT $hsp->num_identical, "\t"; > print OUT $hsp->gaps, "\t"; > print OUT $hsp->strand('query'), "\t"; > print OUT $hsp->strand('hit'), "\n"; > } > } > > if ($i == 0) { print OUT "no hits\n"; } > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From cjfields at uiuc.edu Tue Nov 27 16:09:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:09:43 -0600
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <474C3AB2.5050208@sendu.me.uk>
References: <615478.1036.qm@web60113.mail.yahoo.com> <474C3AB2.5050208@sendu.me.uk>
Message-ID: <3B8DD37B-F856-4365-86F0-038A00E26766@uiuc.edu>

You can always test it within the bioperl suite after it's installed;
several tests (abi.t, ztr.t) use Bio::SeqIO::staden::read. In general,
though, if it's passing tests it should be fine.

chris

On Nov 27, 2007, at 9:41 AM, Sendu Bala wrote:

> a_arya2000 wrote:
>> Hello,
>> I downloaded latest bioperl-ext from bioperl website,
>> and I have io_lib v1.8.11 installed, and I was trying
>> to install Bio::SeqIO::staden::read (of bioperl-ext).
>> It compiled fine without any error but when I run make
>> test I got following output.
> [...]
>> All tests successful.
>> Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr +
>> 0.15 csys = 1.71 CPU)
>>
>>
>> Anyone has any idea what might be going wrong here? By
>> the way, my OS is Linux. Thank you very much.
>
> Not being familiar with the test script or ext, I can at least say
> that nothing actually went wrong: 'All tests successful'. Apparently
> there are some things in the test script that are known by the author
> to not work quite right, so he marked them as 'todo'. The problems
> seem harmless in any case, with things returning 0 instead of undef.
>
> So, unless you've reason to believe there is something significant
> going on, all is well.
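To illustrate why "todo" failures do not count against the run, here is a small sketch using Test::More. This is an analogous example only: staden_read.t itself appears to use the older Test module, and the test names below are made up.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# An ordinary test that is expected to pass
ok( 1, 'reading works' );

TODO: {
    # Everything in this block is reported as a known, expected failure
    local $TODO = "writing abi files is not implemented yet";

    # Fails today, but is flagged as TODO in the output and does not
    # make the test script as a whole fail
    ok( 0, 'can write abi format' );
}
```

Run under "make test" or prove, a script like this should still end with "All tests successful", which matches what was reported for staden_read.t.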
From cjfields at uiuc.edu Tue Nov 27 16:00:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:00:33 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To:
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk> <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>

The hits/HSPs are generally in the order they appear in the report. If
you are looking for the best/worst HSP after parsing you can use the
$hit->hsp() method:

    # best and worst
    my $best  = $hit->hsp('best');     # also 'first'
    my $worst = $hit->hsp('worst');    # also 'last'

The SearchIO text BLAST parser also has several options implemented
for finer control:

  -inclusion_threshold => e-value threshold for inclusion in the
                          PSI-BLAST score matrix model (blastpgp)
  -signif      => float or scientific notation number to be used
                  as a P- or Expect value cutoff
  -score       => integer or scientific notation number to be used
                  as a blast score value cutoff
  -bits        => integer or scientific notation number to be used
                  as a bit score value cutoff
  -hit_filter  => reference to a function to be used for
                  filtering hits based on arbitrary criteria.
                  All hits of each BLAST report must satisfy
                  these criteria to be retained.
                  If a hit fails this test, it is ignored.
                  This function should take a
                  Bio::Search::Hit::BlastHit.pm object as its first
                  argument and return true
                  if the hit should be retained.
                  Sample filter function:
                     -hit_filter => sub { my $hit = shift;
                                          $hit->gaps == 0; },
                  (Note: -filt_func is synonymous with -hit_filter)
  -overlap     => integer. The amount of overlap to permit between
                  adjacent HSPs when tiling HSPs. A reasonable value is 2.
                  Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagicly?
> I ask this as I know Megablast doesn't provide sorted output for
> most of it's formats.
> > Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of Dave Messina >> Sent: Wednesday, 28 November 2007 6:56 a.m. >> To: alison waller >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results >> >> Hi Alison, >> As Sendu mentioned, the key bit is adding a condition to the hit loop > to >> limit the number of hits that are printed. I didn't test the below >> extensively, but give it a try... >> >> >> Dave >> >> >> >> #!/usr/local/bin/perl -w >> >> # Parsing BLAST reports with BioPerl's Bio::SearchIO module >> # alison waller November 2007 >> >> use strict; >> use warnings; >> use Bio::SearchIO; >> >> my $usage = "to run type: blast_parse_aw.pl <# of > hits>\n"; >> if (@ARGV != 2) { die $usage; } >> >> my $infile = $ARGV[0]; >> my $outfile = $infile . '.parsed'; >> my $tophit = $ARGV[1]; # to specify in the command line how many >> hits >> # to report for each query >> >> #open( IN, $infile ) || die "Can't open inputfile $infile! $! >> \n"; >> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
> $!\n"; >> >> my $report = new Bio::SearchIO( >> -file => "$infile", >> -format => "blast" >> ); >> >> print OUT >> > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent > \tga > ps\t >> Qstrand\tHstrand\n"; >> >> # Go through BLAST reports one by one >> while ( my $result = $report->next_result ) { >> my $i = 0; >> while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) { >> while ( my $hsp = $hit->next_hsp ) { >> >> # Print some tab-delimited data about this hit >> print OUT $result->query_name, "\t"; >> print OUT $hit->name, "\t"; >> print OUT $hit->significance, "\t"; >> print OUT $hit->bits, "\t"; >> print OUT $hsp->evalue, "\t"; >> print OUT $hsp->percent_identity, "\t"; >> print OUT $hsp->length('total'), "\t"; >> print OUT $hsp->num_identical, "\t"; >> print OUT $hsp->gaps, "\t"; >> print OUT $hsp->strand('query'), "\t"; >> print OUT $hsp->strand('hit'), "\n"; >> } >> } >> >> if ($i == 0) { print OUT "no hits\n"; } >> } >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From johnston at biochem.ucl.ac.uk Tue Nov 27 20:06:30 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Nov 2007 01:06:30 +0000 (GMT)
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
Message-ID:

Hello,

I was playing around with Primer3, and I hit a problem. Not sure if
it's a bug or if I was doing something I wasn't supposed to, but if
it's the latter, I thought it might save someone else half an hour of
banging their head off a keyboard if I mentioned it:

What I was doing was roughly:

# create a primer3 obj
my $p3 = ...Primer3->new();

# loop through some sequences generating primers for
# each of them using the same primer3 obj
while (@some_bio_seqs){
    my $res = $p3->run;
    ...
}

This worked fine for a while, but broke when I tried to set
PRIMER_MIN_GC, at which point it worked for a few sequences and then I
got a "can't place primer on sequence" error.

After a bit of faffing about, I think the problem occurs when no
primers are found. In that case $p3 still has the primers from the
previous run, which don't come from the current sequence, so they
can't be placed on it. I tried calling $p3->cleanup in the loop, but
that didn't work either. Creating a new $p3 every time works fine.

Are you supposed to create a new Primer3 object for every sequence?
(Apologies if I missed the relevant bit of the docs).
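For anyone hitting the same thing, the "new object per sequence" workaround can be sketched roughly as below. This is only a sketch under assumptions: primers_per_sequence and $make_runner are made-up names for illustration, not part of BioPerl, and the commented-out factory follows the Bio::Tools::Run::Primer3 synopsis but has not been tested here.

```perl
use strict;
use warnings;

# Run primer design once per sequence, building a fresh runner each time
# so that no results can carry over from the previous sequence.
# $make_runner is a code ref (hypothetical helper) that takes one
# sequence and returns an object with a run() method.
sub primers_per_sequence {
    my ( $seqs, $make_runner ) = @_;
    my @results;
    for my $seq (@$seqs) {
        my $runner = $make_runner->($seq);    # new object every iteration
        push @results, scalar $runner->run;
    }
    return \@results;
}

# With bioperl-run installed, the factory might look like this
# (assumption, untested; the GC value is an arbitrary example):
#
#   my $make_runner = sub {
#       my ($seq) = @_;
#       my $p3 = Bio::Tools::Run::Primer3->new( -seq => $seq );
#       $p3->add_targets( 'PRIMER_MIN_GC' => 40 );
#       return $p3;
#   };
#   my $all = primers_per_sequence( \@some_bio_seqs, $make_runner );
```

Passing the constructor in as a code ref keeps the loop itself independent of Primer3, so the same pattern can be tried with a stand-in object before running the real program.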
Cheers, Cass xx From alison.waller at utoronto.ca Tue Nov 27 16:32:07 2007 From: alison.waller at utoronto.ca (alison waller) Date: Tue, 27 Nov 2007 16:32:07 -0500 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu> Message-ID: <003f01c8313c$f69b22a0$6f00a8c0@AWALL> Thanks Everyone, Your edits worked Dave, however after looking at the output I realized that I only want information on the top hsp per query returned. For example some of the querys the top hit has two hsps so it returned both. I tried to further edit it, but after 3 attempts they are all failing, I think due to me using the loops wrong. I also have another problem, I also want to retrieve the gi, this doesn't seem to be straight forward as it should. I found another method _get_seq_identifiers, but this looks awkward, isn't there and object for the gi? I've pasted my non-working script below if there are any suggestions on how to get it to print out just the first hsp per hit, that would be great. Thanks, #!/usr/local/bin/perl -w # Parsing BLAST reports with BioPerl's Bio::SearchIO module # alison waller November 2007 use strict; use warnings; use Bio::SearchIO; my $usage = "to run type: blast_parse_aw.pl <# of hits>\n"; if (@ARGV != 2) { die $usage; } my $infile = $ARGV[0]; my $outfile = $infile . '.parsed'; my $tophit = $ARGV[1]; # to specify in the command line how many hits # to report for each query #open( IN, $infile ) || die "Can't open inputfile $infile! $!\n"; open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
$!\n"; my $report = new Bio::SearchIO( -file => "$infile", -format => "blast" ); print OUT "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\t strand\tHstrand\n"; # Go through BLAST reports one by one while (my $result = $report->next_result) { my $i=0; while( ( $i++<$tophit) && (my $hit = $result->next_hit)){ while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) { # Print some tab-delimited data about this hit print OUT $result->query_name, "\t"; print OUT $hit->name, "\t"; print OUT $hit->significance, "\t"; print OUT $hit->bits, "\t"; print OUT $hsp->evalue, "\t"; print OUT $hsp->percent_identity, "\t"; print OUT $hsp->length('total'), "\t"; print OUT $hsp->num_identical, "\t"; print OUT $hsp->gaps, "\t"; print OUT $hsp->strand('query'), "\t"; print OUT $hsp->strand('hit'), "\n"; } } if ($i == 0) { print OUT "no hits\n"; } } -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Tuesday, November 27, 2007 4:01 PM To: Smithies, Russell Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results The hits/HSPs are generally in the order they appear in the report. If you are looking for best/worst HSP after parsing you can use the $hit->hsp() method: # best and worst my $best = $hit->hsp('best'); # also 'first' my $worst = $hit->hsp('worst'); # also last The SearchIO text BLAST parser also has several options implemented for finer control: -inclusion_threshold => e-value threshold for inclusion in the PSI-BLAST score matrix model (blastpgp) -signif => float or scientific notation number to be used as a P- or Expect value cutoff -score => integer or scientific notation number to be used as a blast score value cutoff -bits => integer or scientific notation number to be used as a bit score value cutoff -hit_filter => reference to a function to be used for filtering hits based on arbitrary criteria. 
All hits of each BLAST report must satisfy this criteria to be retained. If a hit fails this test, it is ignored. This function should take a Bio::Search::Hit::BlastHit.pm object as its first argument and return true if the hit should be retained. Sample filter function: -hit_filter => sub { $hit = shift; $hit->gaps == 0; }, (Note: -filt_func is synonymous with -hit_filter) -overlap => integer. The amount of overlap to permit between adjacent HSPs when tiling HSPs. A reasonable value is 2. Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP. chris On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote: > Do the hits need to be sorted first or is this done automagicly? > I ask this as I know Megablast doesn't provide sorted output for > most of > it's formats. > > Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of Dave Messina >> Sent: Wednesday, 28 November 2007 6:56 a.m. >> To: alison waller >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results >> >> Hi Alison, >> As Sendu mentioned, the key bit is adding a condition to the hit loop > to >> limit the number of hits that are printed. I didn't test the below >> extensively, but give it a try... >> >> >> Dave >> >> >> >> #!/usr/local/bin/perl -w >> >> # Parsing BLAST reports with BioPerl's Bio::SearchIO module >> # alison waller November 2007 >> >> use strict; >> use warnings; >> use Bio::SearchIO; >> >> my $usage = "to run type: blast_parse_aw.pl <# of > hits>\n"; >> if (@ARGV != 2) { die $usage; } >> >> my $infile = $ARGV[0]; >> my $outfile = $infile . '.parsed'; >> my $tophit = $ARGV[1]; # to specify in the command line how many >> hits >> # to report for each query >> >> #open( IN, $infile ) || die "Can't open inputfile $infile! $! >> \n"; >> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
> $!\n"; >> >> my $report = new Bio::SearchIO( >> -file => "$infile", >> -format => "blast" >> ); >> >> print OUT >> > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent > \tga > ps\t >> Qstrand\tHstrand\n"; >> >> # Go through BLAST reports one by one >> while ( my $result = $report->next_result ) { >> my $i = 0; >> while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) { >> while ( my $hsp = $hit->next_hsp ) { >> >> # Print some tab-delimited data about this hit >> print OUT $result->query_name, "\t"; >> print OUT $hit->name, "\t"; >> print OUT $hit->significance, "\t"; >> print OUT $hit->bits, "\t"; >> print OUT $hsp->evalue, "\t"; >> print OUT $hsp->percent_identity, "\t"; >> print OUT $hsp->length('total'), "\t"; >> print OUT $hsp->num_identical, "\t"; >> print OUT $hsp->gaps, "\t"; >> print OUT $hsp->strand('query'), "\t"; >> print OUT $hsp->strand('hit'), "\n"; >> } >> } >> >> if ($i == 0) { print OUT "no hits\n"; } >> } >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dennis.prickett at bbsrc.ac.uk Wed Nov 28 05:18:26 2007 From: dennis.prickett at bbsrc.ac.uk (dennis prickett (IAH-C)) Date: Wed, 28 Nov 2007 10:18:26 -0000 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL> References: <005a01c83070$3a814580$d81efea9@AWALL> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9504751EF0@iahce2ksrv1.iah.bbsrc.ac.uk> Dear Alison Or, if you are absolutely only interested in the top hit you could limit it to that in the blast command by adding the parameters " -b 1 ". This will truncate the report to 1 hsp per query (or -b 5 for 5 hsps, etc). Your blasts run faster and then you won't have to worry about how to parse out the top blast hit(s). However, if there are any caveats for using this parameter that I am not aware of please let us know. Dennis Prickett Institute of Animal Health Compton, nr Newbury RG2 9FS United Kingdom -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of alison waller Sent: 26 November 2007 21:07 To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] help using SEARCH IO to parse blast results Hello all, It's the usual story, I'm an engineer turned biologist who now needs help with bioinformatics so I can analyze huge amounts of data to finish my thesis. I am trying to write a script that will parse large blast files (usually blastx) I also want to be able to specify how many hits I want to report information on. 
Most of the time I will only want information on the top hit, but I want to have the flexibility to obtain information on say the top5. I am pretty sure I have done this wrong, any advice on how to correct my script to do this, would be great. Thanks so much, Alison #!/usr/local/bin/perl -w # Parsing BLAST reports with BioPerl's Bio::SearchIO module # alison waller November 2007 use strict; use warnings; use Bio::SearchIO; # to run type: blast_parse_aw.pl input.txt #of hits my $infile =shift(@ARGV); my $outfile ="$ARGV[0].parsed"; my $tophit = $ARGV[1]; # I want to specify in the command line how many hits to report for each query open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n"; open (OUT,">$outfile"); $report = new Bio::SearchIO( -file=>"$inFile", -format => "blast"); print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tga ps\t Qstrand\tHstrand\n"; # Go through BLAST reports one by one while($result = $report->next_result) { if ($top_hit=$result->next_hit) # this might be wrong - I want to specify how many hits to print results for # Print some tab-delimited data about this hit { print $result->query_name, "\t"; print $hit->description, "\t"; print $hit->significance, "\t"; print $hit->bits,"\t"; print $hsp->evalue, "\t"; print $hsp->percent_identity, "\t"; print $hsp->length('total'),"\t"; print $hsp->num_identical,"\t"; print $hsp->gaps,"\t"; print $hsp->strand('query'),"\t"; print $hsp->strand('hit'), "\n"; } else print "no hits\n"; } ****************************************** Alison S. Waller M.A.Sc. Doctoral Candidate awaller at chem-eng.utoronto.ca 416-978-4222 (lab) Department of Chemical Engineering Wallberg Building 200 College st. 
Toronto, ON M5S 3E5 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From t.nugent at cs.ucl.ac.uk Wed Nov 28 08:10:41 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Wed, 28 Nov 2007 13:10:41 +0000 Subject: [Bioperl-l] Helical Wheel module Message-ID: <474D68D1.3080602@cs.ucl.ac.uk> Hi everyone, I've been drawing a lot of helical wheels recently so put all my code into a module. I don't think there's anything in bioperl to do this yet though there are a few programs written in perl and flash on the web to do the same thing. I was thinking this could fit into biographics. Has lots of options to adjust labels, colours, ttf fonts and can output to png & svg. Tim ... Here's the output, converted to jpg from svg: http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg Module: http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz Works like this: use DrawHelicalWheel; my $im = DrawHelicalWheel->new(-title=>$title, -sequence=>$sequence, -helices=>\@helices, -ttf_font=>$font); open(OUTPUT, ">$svg"); binmode OUTPUT; print OUTPUT $im->svg; close OUTPUT; -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk http://www.cs.ucl.ac.uk/staff/T.Nugent From tristan.lefebure at gmail.com Wed Nov 28 10:46:11 2007 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 28 Nov 2007 10:46:11 -0500 Subject: [Bioperl-l] Remove sites of an alignment Message-ID: <200711281046.11146.tnl7@cornell.edu> Hello! I was wondering if there was a function to remove sites/columns of an alignment. Something like: $aln->remove_sites(@sites_to_remove) I looked around Bio::SimpleAlign but did not find exactly that. There is remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. 
I could recycle the '_remove_col' sub-function of 'remove_columns' to do so (it splits the alignment into sequence objects, removes the sites, and then regenerates an alignment object), but I would be surprised if there was nothing already doing the job... Thanks -Tristan From bix at sendu.me.uk Wed Nov 28 11:19:36 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Nov 2007 16:19:36 +0000 Subject: [Bioperl-l] Remove sites of an alignment In-Reply-To: <200711281046.11146.tnl7@cornell.edu> References: <200711281046.11146.tnl7@cornell.edu> Message-ID: <474D9518.7010201@sendu.me.uk> Tristan Lefebure wrote: > Hello! > > I was wondering if there was a function to remove sites/columns of an > alignment. Something like: $aln->remove_sites(@sites_to_remove) > I looked around Bio::SimpleAlign but did not find exactly that. There is > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. You might want to take a second look at the docs. You can supply column number ranges to remove_columns(), so it does exactly what you want. From tnl7 at cornell.edu Wed Nov 28 10:44:17 2007 From: tnl7 at cornell.edu (Tristan Lefebure) Date: Wed, 28 Nov 2007 10:44:17 -0500 Subject: [Bioperl-l] Remove sites of an alignment Message-ID: <200711281044.17770.tnl7@cornell.edu> Hello! I was wondering if there was a function to remove sites/columns of an alignment. Something like: $aln->remove_sites(@sites_to_remove) I looked around Bio::SimpleAlign but did not find exactly that. There is remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. I could recycle the '_remove_col' sub-function of 'remove_columns' to do so (it splits the alignment into sequence objects, removes the sites, and then regenerates an alignment object), but I would be surprised if there was nothing already doing the job... 
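For reference, the splice-and-rebuild idea described above (split the alignment into sequences, cut the unwanted sites out of each one, reassemble) can be sketched in plain Perl on gapped strings. This is only an illustration and not BioPerl's `_remove_col`: the helper name `remove_sites`, the 0-based inclusive [start, end] ranges, and the requirement that ranges do not overlap are all assumptions made for this example. In BioPerl itself, as the reply in this thread points out, passing column number ranges to remove_columns() already covers this.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop alignment columns from
# every row. Rows are equal-length gapped strings; each range is a
# 0-based, inclusive [start, end] pair. Ranges are assumed not to overlap.
sub remove_sites {
    my ($rows, @ranges) = @_;

    # Cut from the right-most range first so earlier offsets stay valid.
    my @sorted = sort { $b->[0] <=> $a->[0] } @ranges;

    my @trimmed;
    for my $row (@$rows) {
        my $copy = $row;    # work on a copy, leave the input untouched
        for my $range (@sorted) {
            my ($start, $end) = @$range;
            # Four-argument substr replaces the span with an empty string.
            substr($copy, $start, $end - $start + 1, '');
        }
        push @trimmed, $copy;
    }
    return \@trimmed;
}

# Two aligned rows; remove column 0 and columns 6 through 8.
my $aligned = [ 'ACG-TACGTA', 'ACGGTAC-TA' ];
my $result  = remove_sites($aligned, [0, 0], [6, 8]);
print "$_\n" for @$result;
```

Sorting the ranges from right to left is the one design choice worth noting: removing a left-hand column first would shift every later coordinate.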
Thanks -Tristan From cjfields at uiuc.edu Wed Nov 28 08:57:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 07:57:27 -0600 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <003f01c8313c$f69b22a0$6f00a8c0@AWALL> References: <003f01c8313c$f69b22a0$6f00a8c0@AWALL> Message-ID: I had some code which does this which I committed yesterday to CVS; it catches the GI for the query and the hits: $result->query_gi; $hit->ncbi_gi; I am in the midst of fixing additional problems with WU-BLAST parsing but you are more than welcome to try it. chris On Nov 27, 2007, at 3:32 PM, alison waller wrote: > Thanks Everyone, > > Your edits worked Dave, however after looking at the output I > realized that > I only want information on the top hsp per query returned. For > example some > of the querys the top hit has two hsps so it returned both. > > I tried to further edit it, but after 3 attempts they are all > failing, I > think due to me using the loops wrong. > > I also have another problem, I also want to retrieve the gi, this > doesn't > seem to be straight forward as it should. I found another method > _get_seq_identifiers, but this looks awkward, isn't there and object > for the > gi? > > I've pasted my non-working script below if there are any suggestions > on how > to get it to print out just the first hsp per hit, that would be > great. > > Thanks, > > > #!/usr/local/bin/perl -w > > > # Parsing BLAST reports with BioPerl's Bio::SearchIO module > # alison waller November 2007 > > > use strict; > use warnings; > use Bio::SearchIO; > > > my $usage = "to run type: blast_parse_aw.pl <# of > hits>\n"; > if (@ARGV != 2) { die $usage; } > > > my $infile = $ARGV[0]; > my $outfile = $infile . '.parsed'; > my $tophit = $ARGV[1]; # to specify in the command line how many hits > # to report for each query > > > #open( IN, $infile ) || die "Can't open inputfile $infile! 
$!\n"; > open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! > \n"; > > > my $report = new Bio::SearchIO( > -file => "$infile", > -format => "blast" > ); > > > print OUT > > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent > \tgaps\t > strand\tHstrand\n"; > > > # Go through BLAST reports one by one > while (my $result = $report->next_result) { > my $i=0; > while( ( $i++<$tophit) && (my $hit = $result->next_hit)){ > while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) { > > > # Print some tab-delimited data about this hit > print OUT $result->query_name, "\t"; > print OUT $hit->name, "\t"; > print OUT $hit->significance, "\t"; > print OUT $hit->bits, "\t"; > print OUT $hsp->evalue, "\t"; > print OUT $hsp->percent_identity, "\t"; > print OUT $hsp->length('total'), "\t"; > print OUT $hsp->num_identical, "\t"; > print OUT $hsp->gaps, "\t"; > print OUT $hsp->strand('query'), "\t"; > print OUT $hsp->strand('hit'), "\n"; > } > } > if ($i == 0) { print OUT "no hits\n"; } > > } > > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, November 27, 2007 4:01 PM > To: Smithies, Russell > Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results > > The hits/HSPs are generally in the order they appear in the report. 
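One possible reading of why the script quoted above prints nothing useful: the same counter `$i` is incremented in both the hit loop and the HSP loop, so it no longer counts hits. A minimal sketch of an alternative follows, assuming the objects behave like the Bio::SearchIO result and hit objects used in this thread (only `next_hit` and `next_hsp` are called). The helper name `top_hits_first_hsp` is made up for illustration. It counts hits in the outer loop only and takes a single HSP per hit, which, given that HSPs come in report order, is the first one listed.

```perl
use strict;
use warnings;

# Hypothetical helper: return [hit, first_hsp] pairs for at most $max_hits
# hits of one result. $result only needs a next_hit() method and each hit
# only needs a next_hsp() method, as in the scripts quoted in this thread.
sub top_hits_first_hsp {
    my ($result, $max_hits) = @_;
    my @pairs;
    my $seen = 0;    # counts hits only, never HSPs
    while ($seen < $max_hits and my $hit = $result->next_hit) {
        $seen++;
        # A single call yields the first HSP in report order.
        my $hsp = $hit->next_hsp;
        push @pairs, [ $hit, $hsp ] if defined $hsp;
    }
    return @pairs;
}
```

In the quoted script this would stand in for the two nested while loops: call it once per `$result`, print one line per returned pair, and print "no hits" when the returned list is empty.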
> > If you are looking for best/worst HSP after parsing you can use the > $hit->hsp() method: > > # best and worst > my $best = $hit->hsp('best'); # also 'first' > my $worst = $hit->hsp('worst'); # also last > > The SearchIO text BLAST parser also has several options implemented > for finer control: > > -inclusion_threshold => e-value threshold for inclusion in the > PSI-BLAST score matrix model (blastpgp) > -signif => float or scientific notation number to be used > as a P- or Expect value cutoff > -score => integer or scientific notation number to be used > as a blast score value cutoff > -bits => integer or scientific notation number to be used > as a bit score value cutoff > -hit_filter => reference to a function to be used for > filtering hits based on arbitrary criteria. > All hits of each BLAST report must satisfy > this criteria to be retained. > If a hit fails this test, it is ignored. > This function should take a > Bio::Search::Hit::BlastHit.pm object as its first > argument and return true > if the hit should be retained. > Sample filter function: > -hit_filter => sub { $hit = shift; > $hit->gaps == 0; }, > (Note: -filt_func is synonymous with -hit_filter) > -overlap => integer. The amount of overlap to permit between > adjacent HSPs when tiling HSPs. A reasonable > value is 2. > Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP. > > chris > > On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote: > >> Do the hits need to be sorted first or is this done automagicly? >> I ask this as I know Megablast doesn't provide sorted output for >> most of >> it's formats. >> >> Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open- >>> bio.org] On Behalf Of Dave Messina >>> Sent: Wednesday, 28 November 2007 6:56 a.m. 
>>> To: alison waller >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results >>> >>> Hi Alison, >>> As Sendu mentioned, the key bit is adding a condition to the hit >>> loop >> to >>> limit the number of hits that are printed. I didn't test the below >>> extensively, but give it a try... >>> >>> >>> Dave >>> >>> >>> >>> #!/usr/local/bin/perl -w >>> >>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module >>> # alison waller November 2007 >>> >>> use strict; >>> use warnings; >>> use Bio::SearchIO; >>> >>> my $usage = "to run type: blast_parse_aw.pl <# of >> hits>\n"; >>> if (@ARGV != 2) { die $usage; } >>> >>> my $infile = $ARGV[0]; >>> my $outfile = $infile . '.parsed'; >>> my $tophit = $ARGV[1]; # to specify in the command line how many >>> hits >>> # to report for each query >>> >>> #open( IN, $infile ) || die "Can't open inputfile $infile! $! >>> \n"; >>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
>> $!\n"; >>> >>> my $report = new Bio::SearchIO( >>> -file => "$infile", >>> -format => "blast" >>> ); >>> >>> print OUT >>> >> "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent >> \tga >> ps\t >>> Qstrand\tHstrand\n"; >>> >>> # Go through BLAST reports one by one >>> while ( my $result = $report->next_result ) { >>> my $i = 0; >>> while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) { >>> while ( my $hsp = $hit->next_hsp ) { >>> >>> # Print some tab-delimited data about this hit >>> print OUT $result->query_name, "\t"; >>> print OUT $hit->name, "\t"; >>> print OUT $hit->significance, "\t"; >>> print OUT $hit->bits, "\t"; >>> print OUT $hsp->evalue, "\t"; >>> print OUT $hsp->percent_identity, "\t"; >>> print OUT $hsp->length('total'), "\t"; >>> print OUT $hsp->num_identical, "\t"; >>> print OUT $hsp->gaps, "\t"; >>> print OUT $hsp->strand('query'), "\t"; >>> print OUT $hsp->strand('hit'), "\n"; >>> } >>> } >>> >>> if ($i == 0) { print OUT "no hits\n"; } >>> } >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use of, >> or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. 
>> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 28 08:54:39 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 07:54:39 -0600 Subject: [Bioperl-l] Helical Wheel module In-Reply-To: <474D68D1.3080602@cs.ucl.ac.uk> References: <474D68D1.3080602@cs.ucl.ac.uk> Message-ID: <053F7A0E-E0C3-4E86-AF7A-8F6F7A57DA37@uiuc.edu> Looks good! We recently added in your transmembrane module, so we could definitely add this in. chris On Nov 28, 2007, at 7:10 AM, Tim Nugent wrote: > Hi everyone, > > I've been drawing a lot of helical wheels recently so put all my code > into a module. I don't think there's anything in bioperl to do this > yet > though there are a few programs written in perl and flash on the web > to > do the same thing. I was thinking this could fit into biographics. Has > lots of options to adjust labels, colours, ttf fonts and can output to > png & svg. > > Tim > > ... 
> > Here's the output, converted to jpg from svg: > http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg > > Module: > http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz > > Works like this: > > use DrawHelicalWheel; > > my $im = DrawHelicalWheel->new(-title=>$title, > -sequence=>$sequence, > -helices=>\@helices, > -ttf_font=>$font); > open(OUTPUT, ">$svg"); > binmode OUTPUT; > print OUTPUT $im->svg; > close OUTPUT; > > -- > Tim Nugent (MRes) > Research Student > Bioinformatics Unit > Department of Computer Science > University College London > Gower Street > London WC1E 6BT > Tel: 020-7679-0410 > t.nugent at ucl.ac.uk > http://www.cs.ucl.ac.uk/staff/T.Nugent > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 28 13:43:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 12:43:58 -0600 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com> References: <4701AEE6.6070506@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com> Message-ID: <55479E91-59AF-42B2-B15F-C4939531BC4D@uiuc.edu> On Nov 26, 2007, at 7:41 PM, Steve Chervitz wrote: > Chris, > > Cood catch. 
You're on track here with one exception: WU blast and NCBI > blast behave differently in what they report in the hit table: WU > blast puts the raw score in the table not the bit score as NCBI blast > does (see example below for reference). WU blast also swaps their > location in the HSP header relative to how NCBI reports it. It would > be good to verify that the blast parser isn't befuddled by this. A > quick look at SearchIO::blast and it appears that data from the hit > table is always getting stored as score, not bits for WU blast. Not > sure if the HSP section data are parsed correctly. I'd recommend > looking into these things when you do your fixes. What I have now after commits is: GenericHit - use the best HSP when possible for bits, score/raw_score, significance. When there is no HSP, construct a minimal Hit object using hit table data (WU-BLAST maps the score to raw_score, NCBI BLAST maps to bits(), both map evalue/pvalue to significance). HSP mapping seems to be correct. One issue that has popped up is that GenericHit::significance preferentially uses the best HSP. However, GenericHSP::significance uses evalues preferentially over pvalues; both Expect and P appear to be parsed for WU-BLAST HSPs now (so the evalue is reported); this apparently wasn't always the case if I read the GenericHit docs correctly. As NCBI BLAST doesn't report pvalues we could change that so it preferentially returns a pvalue if present, falling back to an evalue. This would match what is found in the hit table more closely and resembles what is documented for the method (for significance(), WU-BLAST gets pvalues, NCBI BLAST gets evalues). > So in the end, WU blast HSPs that are built from the hit table should > report a value for raw_score and punt on bits, but NCBI HSPs so > constructed should do the opposite. The downside to this arrangement > is that code that works for NCBI blast hits will need modification to > work for WU blast hits, but that is just the nature of the data.
It > shouldn't be an issue for the majority of users that stick with one > flavor of blast and don't switch back and forth, or for users that get > their HSP scoring data from HSP sections rather than relying on the > hit table. In general I get my data from the HSPs, so this shouldn't be a significant issue (bad pun). I did find that changing it so that Hit objects use HSP data pointed out issues with test data; hit table raw/bit scores were rounded from the HSP score data or vice versa since all data came from the hit table, so tests flunked. I think changing the way minimal hit objects report data (particularly for NCBI BLAST) will lead to a lot of confusion unless we clarify warnings when one or the other is missing (as you also indicated). I'm working on that now. > Ideally, the HSP object would know whether it was NCBI or WU-based and > issue an informative warning when attempting to access data it doesn't > have. One solution might be for the parser to put a 'WU-' in front of > the algorithm name for WU blast reports, so it would then be available > for the contained hit/hsp objects. This could break anything dependent > on algorithm name, so it would need some testing. > > Steve I can probably work around that as noted above, unless you think it's warranted to add a 'WU' designation (the version info in the Result object has 'WashU' attached, so one could feasibly use that for distinguishing the two report types). Anyway, I'm committing my first batch of fixes; the significance test will fail for at least a day until I can look into it more. chris From tristan.lefebure at gmail.com Wed Nov 28 14:03:44 2007 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 28 Nov 2007 14:03:44 -0500 Subject: [Bioperl-l] Remove sites of an alignment In-Reply-To: <474D9518.7010201@sendu.me.uk> References: <200711281046.11146.tnl7@cornell.edu> <474D9518.7010201@sendu.me.uk> Message-ID: Oops. I was reading the BioPerl 1.4 documentation.
Actually, http://bioperl.org/wiki/Module:Bio::SimpleAlign sends you to http://search.cpan.org/perldoc?Bio::SimpleAlign, which turns out to be the 1.4 documentation... Thank you, it works great. On Nov 28, 2007 11:19 AM, Sendu Bala wrote: > Tristan Lefebure wrote: > > Hello! > > > > I was wondering if there was a function to remove sites/columns of an > > alignment. Something like: $aln->remove_sites(@sites_to_remove) > > I looked around Bio::SimpleAlign but did not find exactly that. There is > > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' > criteria. > > You might want to take a second look at the docs. You can supply column > number ranges to remove_columns(), so it does exactly what you want. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Wed Nov 28 16:57:14 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 29 Nov 2007 10:57:14 +1300 Subject: [Bioperl-l] Parsing Entrez Gene ASN.1 In-Reply-To: References: <200711281046.11146.tnl7@cornell.edu><474D9518.7010201@sendu.me.uk> Message-ID: Has anyone got a good example of parsing ASN.1 with Bio::SeqIO::entrezgene? I'm trying to get GO ids and KEGG terms out but it's quite deeply nested and my Perl isn't that good :-( Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
======================================================================= From stefan.kirov at bms.com Wed Nov 28 17:16:18 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 28 Nov 2007 17:16:18 -0500 (Eastern Standard Time) Subject: [Bioperl-l] Parsing Entrez Gene ASN.1 In-Reply-To: References: <200711281046.11146.tnl7@cornell.edu> <474D9518.7010201@sendu.me.uk> Message-ID: Here is an example for GO, will send the one for KEGG later:

my $eio=new Bio::SeqIO(-file=>$file,-format=>'entrezgene', -service_record=>'yes');#, -locuslink=>'convert');
while (my $seq=$eio->next_seq) {
    my $gid=$seq->accession_number;
    my $ann=$seq->annotation; # assumed: $ann was not defined in the posted snippet
    foreach my $ot ($ann->get_Annotations('OntologyTerm')) {
        next if ($ot->term->authority eq 'STS marker'); #Do not need STS markers
        my $evid=$ot->comment;
        $evid=~s/evidence: //i;
        my @ref=$ot->term->get_references; #Really there should be just one?
        my $id=$ot->identifier;
        my $fid='GO:' . sprintf("%07u",$id);
        print join("\t",$gid,$ot->ontology->name,$ot->name,$evid,$fid,@ref?$ref[0]->medline:''),"\n";
    }
}

Please note there is a bug in the parser that makes it use a lot of RAM. I am fixing this and will probably commit by the week's end; you will have to update at that point. If you work with few records this should not matter. Stefan On Thu, 29 Nov 2007, Smithies, Russell wrote: > Has anyone got a good example of parsing ASN.1 with > Bio::SeqIO::entrezgene? > I'm trying to get GO ids and KEGG terms out but it's quite deeply nested > and my Perl isn't that good :-( > > Russell > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material.
Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Nov 29 18:06:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 17:06:42 -0600 Subject: [Bioperl-l] PSIBLAST parsing added to SearchIO::blastxml Message-ID: <159ABF90-080B-4F98-BF63-7FCEE5D05F10@uiuc.edu> For anyone using PSI-BLAST: I have implemented experimental PSI-BLAST parsing in Bio::SearchIO::blastxml (though it appears to be pretty stable!). Since recent changes to BLAST at NCBI leave no easy way to distinguish between normal BLAST and PSI-BLAST reports, you have to indicate how the report is to be parsed by passing in a '-blasttype' parameter: $searchio = Bio::SearchIO->new('-tempfile' => 1, '-format' => 'blastxml', '-file' => 'psiblast.xml', '-blasttype' => 'psiblast'); Otherwise the parser splits the individual iterations out and parses each one as a separate BLAST report. Tests have also been added to SearchIO.t. I will update the HOWTO and blastxml docs soon. chris From cjfields at uiuc.edu Thu Nov 29 21:41:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 20:41:49 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Primer3 In-Reply-To: References: Message-ID: <866C501B-EBFD-4E55-939E-AA97182C9EC4@uiuc.edu> It's probably safer to create a new instance each time but it really shouldn't be necessary for a wrapper module; this sounds like a bug to me. Could you file it in Bugzilla?
On Nov 27, 2007, at 7:06 PM, Caroline Johnston wrote: > Hello, > > I was playing around with Primer3, and I hit a problem. Not sure if > it's a > bug or if I was doing something I wasn't supposed to, but if it's the > latter, I thought it might save someone else half an hour of banging > their > head of a keyboard if I mentioned it: > > What I was doing was roughly: > > # create a primer3 obj > my $p3 = ...Primer3->new(); > > # loop through some sequences generating primers for > # each of them using the same primer3 obj > while (@some_bio_seqs){ > my $res = $p3->run; > ... > } > > This worked fine for a while, but broke when I tried to set > PRIMER_MIN_GC, > at which point it worked for a few sequences then I got a "can't place > primer on sequence" error. > > After a bit of faffing about, I think the problem occurs when no > primers > are found. In which case $p3 still has the primers from the previous > run, > which don't come from the current sequence, so can't be placed on > it. I > tried calling $p3->cleanup in the loop, but that didn't work either. > Creating a new $p3 every time works fine. > > Are you supposed to create a new Primer3 object for every sequence? > (Apologies if I missed the relevant bit of the docs). > > Cheers, > Cass xx > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From paulhengen at coh.org Wed Nov 28 20:20:42 2007 From: paulhengen at coh.org (Paul N. Hengen) Date: Wed, 28 Nov 2007 17:20:42 -0800 (PST) Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs Message-ID: <14017289.post@talk.nabble.com> Hi. I have a number of gene IDs from Entrez and I want to find the start and end locations in the human genome. 
This seemed simple enough, so I started working through some of the examples for using the EntrezGene module at www.bioperl.org. Of course this did not work because the core installation does not include this module. So, I think I have two choices: (1) install the module (how?), or (2) find an easier way to get the locations in the human genome. I want to use the locations to grab sequences out of the genome. Can anyone offer advice on this? Thanks. -Paul. -- Paul N. Hengen, Ph.D. Hematopoietic Stem Cell and Leukemia Research City of Hope National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 USA mailto:paulhengen at coh.org -- View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From Viktor.Martyanov at Dartmouth.EDU Thu Nov 29 15:20:19 2007 From: Viktor.Martyanov at Dartmouth.EDU (Viktor Martyanov) Date: 29 Nov 2007 15:20:19 -0500 Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases Message-ID: <193573097@newdonner.Dartmouth.EDU> A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 445 bytes Desc: not available URL: From alison.waller at utoronto.ca Thu Nov 29 11:20:59 2007 From: alison.waller at utoronto.ca (alison waller) Date: Thu, 29 Nov 2007 11:20:59 -0500 Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS) Message-ID: <002501c832a3$d3e09cf0$d81efea9@AWALL> Hi all, I would like to install the CVS version of bioperl as I know of some code changes that will be useful to me. However, I am having problems installing it. I am trying to install bioperl in my home directory on a linux cluster. I used > cd bioperl-live * perl Build.PL -install /home/awaller However, after the build command I got a lot of errors. Do I also have to have perl installed in my home directory? There is perl installed on the cluster in /usr/bin.
Do I need to point to this or does Build.PL automatically look there? I noticed a few errors about not having permission and a few about not being able to connect. I've copied a portion of the messages after my Build.pl command. Any help would be appreciated, alison Issuing "/usr/bin/ftp -n" ftp: mirror.isurf.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz. Please check, if the URLs I found in your configuration file (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/, ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are valid. The urllist can be edited. E.g. with 'o conf urllist push ftp://myurl/' Could not fetch modules/02packages.details.txt.gz Trying to get away with old file: 3604718 584 -rw-r--r-- 1 0 0 592967 Nov 12 22:53 /root/.cpan/sources/modules/02packages.details.txt.gz Going to read /root/.cpan/sources/modules/02packages.details.txt.gz Database was generated on Sat, 10 Nov 2007 22:36:34 GMT There's a new CPAN.pm version (v1.9204) available! [Current version is v1.7601] You might want to try install Bundle::CPAN reload cpan without quitting the current session. It should be a seamless upgrade while we are running... Warning: You are not allowed to write into directory "/root/.cpan/sources/modules". I'll continue, but if you encounter problems, they may be due to insufficient permissions. 
Fetching with LWP: ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[Cannot write to '/root/.cpan/sources/modules/03modlist.data.gz-25787': Permission denied] Fetching with Net::FTP: ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2265 Couldn't fetch 03modlist.data.gz from ftp.nrc.ca Fetching with LWP: ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[FTP close response: 500 Network seems to have barfed - Let's all phone our ISP and go postal! Unknown command. ] Fetching with Net::FTP: ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2265 Couldn't fetch 03modlist.data.gz from cpan.sunsite.ualberta.ca Fetching with LWP: ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname 'cpan.mirror.cygnal.ca'] Fetching with Net::FTP: ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz Fetching with LWP: ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname 'mirror.isurf.ca'] Fetching with Net::FTP: ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz Trying with "/usr/bin/lynx -source" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle 
file URLs. System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Issuing "/usr/bin/ftp -n" Local directory now /root/.cpan/sources/modules local: 03modlist.data.gz: Permission denied Bad luck... Still failed! 
Can't access URL ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz. Issuing "/usr/bin/ftp -n" Local directory now /root/.cpan/sources/modules local: 03modlist.data.gz: Permission denied Bad luck... Still failed! Can't access URL ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz. Issuing "/usr/bin/ftp -n" ftp: cpan.mirror.cygnal.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz. Issuing "/usr/bin/ftp -n" ftp: mirror.isurf.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz. Please check, if the URLs I found in your configuration file (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/, ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are valid. The urllist can be edited. E.g. with 'o conf urllist push ftp://myurl/' Could not fetch modules/03modlist.data.gz Trying to get away with old file: 3604719 144 -rw-r--r-- 1 0 0 141973 Nov 12 22:53 /root/.cpan/sources/modules/03modlist.data.gz Going to read /root/.cpan/sources/modules/03modlist.data.gz Going to write /root/.cpan/Metadata can't create /root/.cpan/Metadata: Permission denied at /usr/share/perl/5.8/CPAN.pm line 3432 Running install for module Test::Harness Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2342 ****************************************** Alison S. Waller M.A.Sc. Doctoral Candidate awaller at chem-eng.utoronto.ca 416-978-4222 (lab) Department of Chemical Engineering Wallberg Building 200 College st. 
Toronto, ON
M5S 3E5

From cjfields at uiuc.edu Thu Nov 29 23:53:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:53:09 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID:

Alison,

There are directions on how to do this here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

(TinyURL link)
http://tinyurl.com/3263dd

Note the additional configuration for CPAN in that section; you'll need to set up CPAN so it installs everything locally.

chris

On Nov 29, 2007, at 10:20 AM, alison waller wrote:

> Hi all,
>
> I would like to install the CVS version of bioperl as I know of some code
> changes that will be useful to me. However, I am having problems
> installing it.
>
> I am trying to install bioperl in my home directory on a linux cluster.
>
> I used
>
>> cd bioperl-live
>> perl Build.PL -install /home/awaller
>
> However after the build command I got a lot of errors. Do I have to also
> have perl installed in my home directory?? There is perl installed on the
> cluster in /usr/bin. Do I need to point to this or does Build.PL
> automatically look there? I noticed a few errors about not having
> permission and a few about not being able to connect. I've copied a
> portion of the messages after my Build.pl command.
>
> Any help would be appreciated,
>
> alison
>
> Issuing "/usr/bin/ftp -n"
> ftp: mirror.isurf.ca: Unknown host
> Not connected.
> Local directory now /root/.cpan/sources/modules
> Not connected.
> Not connected.
> Not connected.
> Not connected.
> Not connected.
> Not connected.
> Bad luck... Still failed!
> Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz.
>
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'
>
> Could not fetch modules/03modlist.data.gz
> Trying to get away with old file:
> 3604719 144 -rw-r--r-- 1 0 0 141973 Nov 12 22:53 /root/.cpan/sources/modules/03modlist.data.gz
> Going to read /root/.cpan/sources/modules/03modlist.data.gz
> Going to write /root/.cpan/Metadata
> can't create /root/.cpan/Metadata: Permission denied at /usr/share/perl/5.8/CPAN.pm line 3432
> Running install for module Test::Harness
> Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz
> mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2342
>
> ******************************************
> Alison S. Waller M.A.Sc.
> Doctoral Candidate
> awaller at chem-eng.utoronto.ca
> 416-978-4222 (lab)
> Department of Chemical Engineering
> Wallberg Building
> 200 College st.
> Toronto, ON
> M5S 3E5
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Thu Nov 29 23:57:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Nov 2007 22:57:36 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID:

Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl-core (I think they were added prior to the 1.5.1 release, but I'm not positive).
If possible you should try installing bioperl 1.5.2 or the latest code from CVS.

For directions on installing Bioperl for most OS's go here:

http://www.bioperl.org/wiki/Installing_BioPerl

From CVS:

http://www.bioperl.org/wiki/Using_CVS

chris

On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote:

>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org
>
> --
> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Fri Nov 30 03:45:57 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 30 Nov 2007 08:45:57 +0000
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS)
In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL>
References: <002501c832a3$d3e09cf0$d81efea9@AWALL>
Message-ID: <474FCDC5.5020100@sendu.me.uk>

alison waller wrote:
> I would like to install the CVS version of bioperl as I know of some code
> changes that will be useful to me. However, I am having problems installing
> it.
>
> I am trying to install bioperl in my home directory on a linux cluster.
[...]
> Please check, if the URLs I found in your configuration file
> (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/,
> ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are
> valid. The urllist can be edited. E.g. with 'o conf urllist push
> ftp://myurl/'

Either these URLs are invalid as suggested (try setting the urllist to nothing), or your linux cluster doesn't have internet access.

You can't do a 'proper' install of BioPerl and its dependencies without internet access. However, for most purposes simply downloading the BioPerl modules (i.e. from a different machine with internet access) and pointing your PERL5LIB to their location is sufficient.

You can download CVS modules from the BioPerl website individually, or as a tarball of everything.

From MEC at stowers-institute.org Fri Nov 30 09:12:09 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 30 Nov 2007 08:12:09 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
References: <14017289.post@talk.nabble.com>
Message-ID:

How many, how often?

Use ensembl biomart!

First time interactively.
Then if you want to pipeline it, take the Perl code it generates for you and run it - of course you'll have to install the Ensembl Perl API....

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Paul N. Hengen
> Sent: Wednesday, November 28, 2007 7:21 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
>
>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find
> the start and end locations in the human genome. This seemed
> simple enough, so I started working through some of the
> examples for using the EntrezGene module at www.bioperl.org
> Of course this did not work because the core installation
> does not include this module. So, I think I have two choices
> (1) install the module (how?), or (2) find an easier way to
> get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org
>
> --
> View this message in context:
> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From bosborne11 at verizon.net Fri Nov 30 09:38:58 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 30 Nov 2007 09:38:58 -0500
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <14017289.post@talk.nabble.com>
Message-ID:

Paul,

Have you taken a look at this page?

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

There's code there that looks similar to what you're proposing.

Brian O.

On 11/28/07 8:20 PM, "Paul N. Hengen" wrote:

>
> Hi.
>
> I have a number of gene IDs from Entrez and I want to find the
> start and end locations in the human genome. This seemed simple
> enough, so I started working through some of the examples for
> using the EntrezGene module at www.bioperl.org Of course this
> did not work because the core installation does not include this
> module. So, I think I have two choices (1) install the module (how?),
> or (2) find an easier way to get the locations in the human genome.
> I want to use the locations to grab sequences out of the genome.
> Can anyone offer advice on this? Thanks.
>
> -Paul.
>
> --
> Paul N. Hengen, Ph.D.
> Hematopoietic Stem Cell and Leukemia Research
> City of Hope National Medical Center
> 1500 E. Duarte Road, Duarte, CA 91010 USA
> mailto:paulhengen at coh.org

From cjfields at uiuc.edu Fri Nov 30 10:47:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 09:47:32 -0600
Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs
In-Reply-To: <47502C75.60809@bms.com>
References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com>
Message-ID: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu>

My bad. I always forget about Bio::ASN1::Entrezgene.
We should ask Mingyi Liu if he would like to include this parser with BioPerl (since it requires it, makes sense to me, and it avoids the circular dependency that has plagued these modules). chris On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote: > Chris Fields wrote: > Chris, > Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the > low-level parser and is not part of bioperl. There is a circular > dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).... > Paul, you can get it from CPAN and this should make > Bio::SeqIO::entrezgene functional for you. > Stefan > > >> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- >> core (I think they were added prior to the 1.5.1 release, but I'm not >> positive). If possible you should try installing bioperl 1.5.2 or >> the >> latest code from CVS. >> >> For directions on installing Bioperl for most OS's go here: >> >> http://www.bioperl.org/wiki/Installing_BioPerl >> >> From CVS: >> >> http://www.bioperl.org/wiki/Using_CVS >> >> chris >> >> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: >> >> >>> Hi. >>> >>> I have a number of gene IDs from Entrez and I want to find the >>> start and end locations in the human genome. This seemed simple >>> enough, so I started working through some of the examples for >>> using the EntrezGene module at www.bioperl.org Of course this >>> did not work because the core installation does not include this >>> module. So, I think I have two choices (1) install the module >>> (how?), >>> or (2) find an easier way to get the locations in the human genome. >>> I want to use the locations to grab sequences out of the genome. >>> Can anyone offer advice on this? Thanks. >>> >>> -Paul. >>> >>> -- >>> Paul N. Hengen, Ph.D. >>> Hematopoietic Stem Cell and Leukemia Research >>> City of Hope National Medical Center >>> 1500 E. 
Duarte Road, Duarte, CA 91010 USA >>> mailto:paulhengen at coh.org >>> >>> -- >>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From stefan.kirov at bms.com Fri Nov 30 11:12:22 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 30 Nov 2007 11:12:22 -0500 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu> References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com> <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu> Message-ID: <47503666.8090004@bms.com> Chris Fields wrote: > My bad. I always forget about Bio::ASN1::Entrezgene. We should ask > Mingyi Liu if he would like to include this parser with BioPerl (since > it requires it, makes sense to me, and it avoids the circular > dependency that has plagued these modules). > Yes, I think this would be a good step. Stefan > chris > > On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote: > > >> Chris Fields wrote: >> Chris, >> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the >> low-level parser and is not part of bioperl. There is a circular >> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).... 
>> Paul, you can get it from CPAN and this should make >> Bio::SeqIO::entrezgene functional for you. >> Stefan >> >> >> >>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- >>> core (I think they were added prior to the 1.5.1 release, but I'm not >>> positive). If possible you should try installing bioperl 1.5.2 or >>> the >>> latest code from CVS. >>> >>> For directions on installing Bioperl for most OS's go here: >>> >>> http://www.bioperl.org/wiki/Installing_BioPerl >>> >>> From CVS: >>> >>> http://www.bioperl.org/wiki/Using_CVS >>> >>> chris >>> >>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: >>> >>> >>> >>>> Hi. >>>> >>>> I have a number of gene IDs from Entrez and I want to find the >>>> start and end locations in the human genome. This seemed simple >>>> enough, so I started working through some of the examples for >>>> using the EntrezGene module at www.bioperl.org Of course this >>>> did not work because the core installation does not include this >>>> module. So, I think I have two choices (1) install the module >>>> (how?), >>>> or (2) find an easier way to get the locations in the human genome. >>>> I want to use the locations to grab sequences out of the genome. >>>> Can anyone offer advice on this? Thanks. >>>> >>>> -Paul. >>>> >>>> -- >>>> Paul N. Hengen, Ph.D. >>>> Hematopoietic Stem Cell and Leukemia Research >>>> City of Hope National Medical Center >>>> 1500 E. Duarte Road, Duarte, CA 91010 USA >>>> mailto:paulhengen at coh.org >>>> >>>> -- >>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 >>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From stefan.kirov at bms.com Fri Nov 30 10:29:57 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 30 Nov 2007 10:29:57 -0500 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: References: <14017289.post@talk.nabble.com> Message-ID: <47502C75.60809@bms.com> Chris Fields wrote: Chris, Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the low-level parser and is not part of bioperl. There is a circular dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).... Paul, you can get it from CPAN and this should make Bio::SeqIO::entrezgene functional for you. Stefan > Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- > core (I think they were added prior to the 1.5.1 release, but I'm not > positive). If possible you should try installing bioperl 1.5.2 or the > latest code from CVS. > > For directions on installing Bioperl for most OS's go here: > > http://www.bioperl.org/wiki/Installing_BioPerl > > From CVS: > > http://www.bioperl.org/wiki/Using_CVS > > chris > > On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: > > >> Hi. >> >> I have a number of gene IDs from Entrez and I want to find the >> start and end locations in the human genome. 
This seemed simple >> enough, so I started working through some of the examples for >> using the EntrezGene module at www.bioperl.org Of course this >> did not work because the core installation does not include this >> module. So, I think I have two choices (1) install the module (how?), >> or (2) find an easier way to get the locations in the human genome. >> I want to use the locations to grab sequences out of the genome. >> Can anyone offer advice on this? Thanks. >> >> -Paul. >> >> -- >> Paul N. Hengen, Ph.D. >> Hematopoietic Stem Cell and Leukemia Research >> City of Hope National Medical Center >> 1500 E. Duarte Road, Duarte, CA 91010 USA >> mailto:paulhengen at coh.org >> >> -- >> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From arareko at campus.iztacala.unam.mx Fri Nov 30 12:01:29 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Fri, 30 Nov 2007 11:01:29 -0600 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: <47503666.8090004@bms.com> References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com> <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu> <47503666.8090004@bms.com> Message-ID: <475041E9.8050909@campus.iztacala.unam.mx> I'm Cc'ing Mingyi Liu in this so he can know about your proposal (in the past, he mentioned he doesn't track the list closely). 
Mauricio. Stefan Kirov wrote: > Chris Fields wrote: >> My bad. I always forget about Bio::ASN1::Entrezgene. We should ask >> Mingyi Liu if he would like to include this parser with BioPerl (since >> it requires it, makes sense to me, and it avoids the circular >> dependency that has plagued these modules). >> > Yes, I think this would be a good step. > Stefan >> chris >> >> On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote: >> >> >>> Chris Fields wrote: >>> Chris, >>> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the >>> low-level parser and is not part of bioperl. There is a circular >>> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).... >>> Paul, you can get it from CPAN and this should make >>> Bio::SeqIO::entrezgene functional for you. >>> Stefan >>> >>> >>> >>>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- >>>> core (I think they were added prior to the 1.5.1 release, but I'm not >>>> positive). If possible you should try installing bioperl 1.5.2 or >>>> the >>>> latest code from CVS. >>>> >>>> For directions on installing Bioperl for most OS's go here: >>>> >>>> http://www.bioperl.org/wiki/Installing_BioPerl >>>> >>>> From CVS: >>>> >>>> http://www.bioperl.org/wiki/Using_CVS >>>> >>>> chris >>>> >>>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: >>>> >>>> >>>> >>>>> Hi. >>>>> >>>>> I have a number of gene IDs from Entrez and I want to find the >>>>> start and end locations in the human genome. This seemed simple >>>>> enough, so I started working through some of the examples for >>>>> using the EntrezGene module at www.bioperl.org Of course this >>>>> did not work because the core installation does not include this >>>>> module. So, I think I have two choices (1) install the module >>>>> (how?), >>>>> or (2) find an easier way to get the locations in the human genome. >>>>> I want to use the locations to grab sequences out of the genome. >>>>> Can anyone offer advice on this? Thanks. 
>>>>> >>>>> -Paul. >>>>> >>>>> -- >>>>> Paul N. Hengen, Ph.D. >>>>> Hematopoietic Stem Cell and Leukemia Research >>>>> City of Hope National Medical Center >>>>> 1500 E. Duarte Road, Duarte, CA 91010 USA >>>>> mailto:paulhengen at coh.org >>>>> >>>>> -- >>>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 >>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From jason at bioperl.org Fri Nov 30 15:21:13 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 30 Nov 2007 12:21:13 -0800 Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases In-Reply-To: <193573097@newdonner.Dartmouth.EDU> References: <193573097@newdonner.Dartmouth.EDU> Message-ID: <631A0D08-4135-4A26-962A-4D1DB31F7F05@bioperl.org> Viktor - Bio::SearchIO helps you parse BLAST reports, but don't underestimate the power of going as low-tech as possible and outputting scores with the -m 8 option in NCBI-BLAST or -mformat 3 (WU-BLAST), either of which gives you a tabular format that is parseable with the 'split' function in Perl. See the wiki http://bioperl.org/wiki for HOWTOs and examples of using the parsers. You might also consider already-written tools like OrthoMCL, InParanoid, and others that help you define relationships like orthologs and paralogs among species. There also exist a few published web resources that have pre-computed homologs for you; you might take a look around first unless the point of the project is to learn how to run these kinds of searches. For general Perl help consider Perlmonks.org and some of the introductory books that are available. -jason -- Jason Stajich jason at bioperl.org On Nov 29, 2007, at 12:20 PM, Viktor Martyanov wrote: > Hello, > > My name is Viktor Martyanov and I am a Ph.D. student in biology at > Dartmouth.
> > I need to be able to use a set of genes or FASTA sequences from S. > cerevisiae and retrieve a set of corresponding homologs from other > fungal species via BLASTP searches. > > I would like to find out if there are Perl scripts that approach > this problem. By the way, is there a Perl community or forum where > I could post this question? > > Thanks very much. _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.moore at genetics.utah.edu Fri Nov 30 17:03:23 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 30 Nov 2007 15:03:23 -0700 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: References: <14017289.post@talk.nabble.com> Message-ID: Paul, One other alternative is to use the UCSC table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select your organism, upload your ID list, and select your output options. You can download the coordinates or the fasta directly. You have options for including or excluding various parts of the gene, and upstream/downstream sequences. This is similar to the solution that Malcolm suggested, except that the Ensembl option can be run repeatedly as perl code, as he pointed out. UCSC allows remote connections to their MySQL server, so with the UCSC method you could set up a repeated task and more complex queries that way. Barry On Nov 30, 2007, at 7:12 AM, Cook, Malcolm wrote: > How many, how often? > > Use ensembl biomart! > > First time interactively. > > Then if you to pipeline it, take the perl code it generates for you > and > run it - of course you'll have to install the Ensembl Perl API....
> > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Paul N. Hengen >> Sent: Wednesday, November 28, 2007 7:21 PM >> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez >> IDs >> >> >> Hi. >> >> I have a number of gene IDs from Entrez and I want to find >> the start and end locations in the human genome. This seemed >> simple enough, so I started working through some of the >> examples for using the EntrezGene module at www.bioperl.org >> Of course this did not work because the core installation >> does not include this module. So, I think I have two choices >> (1) install the module (how?), or (2) find an easier way to >> get the locations in the human genome. >> I want to use the locations to grab sequences out of the genome. >> Can anyone offer advice on this? Thanks. >> >> -Paul. >> >> -- >> Paul N. Hengen, Ph.D. >> Hematopoietic Stem Cell and Leukemia Research City of Hope >> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 >> USA mailto:paulhengen at coh.org >> >> -- >> View this message in context: >> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-E >> ntrez-IDs-tf4894403.html#a14017289 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Nov 30 23:37:50 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 30 Nov 2007 22:37:50 -0600 Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS) In-Reply-To: <000901c833bf$33d53500$0a02a8c0@AWALL> References: <000901c833bf$33d53500$0a02a8c0@AWALL> Message-ID: <75FF7E93-1633-4D43-9BC0-8BE2A6A7711D@uiuc.edu> Make sure to keep this on the list. ncbi_gi() is only in bioperl-live (CVS); my guess is you either somehow got 1.5.2 instead or the bioperl-live version is not found in your path. It's very likely the latter, as perl is picking up whatever else is present (which appears to be an older version of bioperl). That should give you a hint that the problem may be with your lib path. Try changing the line use lib '/home/awaller/bioperl-live/Bio'; to: use lib '/home/awaller/bioperl-live'; (the path given to 'use lib' should be the directory that contains the Bio directory, not the Bio directory itself). chris On Nov 30, 2007, at 8:09 PM, alison waller wrote: > Okay so Now I'm really confused. > I edited > #!usr/bin/perl >> Use lib '/home/awaller/bioperl-live/Bio. > I ran the script below with the *special hit->ncbi from Chris. It > worked, > it was great, I got the gi! No errors, no bugs that I saw in > checking the > output. Then I went back in, edited the script to retrieve further > info > (specifically the strand). Saved it, now when I try to run it I get > the > same error message that I was previously getting. > > barrett ~ $ perl blast_parse_awcf.pl OldMoBlastxGiTest.txt 1 > Can't locate object method "ncbi_gi" via package > "Bio::Search::Hit::BlastHit" at blast_parse_awcf.pl line 50, > line > 189.
> > Thanks soo much, > > > #!usr/bin/perl > > use strict; > use warnings; > use lib "/home/awaller/bioperl-live/Bio"; > use Bio::Perl; > use Bio::SearchIO; > > my $usage = "to run type: blast_parse_aw.pl <# of > hits per > query> \n"; if (@ARGV != 2) { die $usage; } > > my $infile = $ARGV[0]; > my $outfile = $infile . '.parsed'; > my $tophit = $ARGV[1]; # to specify in the command line how many hits > # to report for each query > > open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! > \n"; > > my $report = Bio::SearchIO->new( > -file => $infile, > -format => "blast" > ); > > print OUT join("\t",qw( > Query > HitDesc > HitAccess > HitGi > HitBits > Evalue > %id > AlignLen > NumIdent > NumPos > gaps > Qframe > Qstrand > Hframe > Hstrand))."\n"; > > # Go through BLAST reports one by one > while ( my $result = $report->next_result ) { > my $ct = 0; > my @tophits = grep {$ct++ < $tophit } $result->hits; > if (scalar(@tophits) == 0) { > print OUT "no hits\n"; > } > for my $hit (@tophits) { > my $tophsp=$hit->hsp('best'); > # Print some tab-delimited data about this hit > print OUT join("\t", > $result->query_name, > $hit->description, > $hit->accession, > $hit->ncbi_gi, > $hit->bits, > $tophsp->evalue, > $tophsp->percent_identity, > $tophsp->length('total'), > $tophsp->num_identical, > $tophsp->num_conserved, > $tophsp->gaps, > $tophsp->query->frame, > $tophsp->strand('query'), > $tophsp->hit->frame, > $tophsp->strand('hit'), > )."\n"; > } > } > > > > > -----Original Message----- > From: Sendu Bala [mailto:bix at sendu.me.uk] > Sent: Friday, November 30, 2007 6:24 PM > To: alison waller > Subject: Re: [Bioperl-l] Problems installing bioperl (bioperl-live > tarball > from CVS) > > alison waller wrote: >> Thank you Sendu, >> >> So I'm trying the second option. I have downloaded the bioperl-live > tarball >> from the CVS on my windows laptop, and then moved it to my home >> directory > in >> the linux cluster where I unzipped and tared it. 
So I now have a > directory >> /home/awaller/bioperl-live. >> >> I edited my .bashrc file as below: >> Export PERL5LIB='/home/awaller/bioperl-live' >> >> I also edited a sample script to include: >> #!usr/bin/perl >> Use lib '/home/awaller/bioperl-live' > > Does this directory contain a 'Bio' directory with all the BioPerl > modules inside it? > > >> But it still isn't working. >> At the prompt I typed$ perl script.pl >> It gave me the warning - can't locate object method ncbi_gi which >> is why > I'm >> trying to download the CVS version as Chris Fields added code to >> make the >> ncbi-gi object. > > You'll have to give me the complete, unedited error message and > ideally > the script itself before I can help you further. > > >> Don't I have to do something similar to what the Build.PL file does? > > Probably not. It doesn't matter where your perl executable is, btw, as > long as the system knows how to run perl, which it obviously does. > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Thu Nov 1 01:27:29 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 1 Nov 2007 12:27:29 +1100 Subject: [Bioperl-l] BLAST output parsing In-Reply-To: <13519112.post@talk.nabble.com> References: <13519112.post@talk.nabble.com> Message-ID: Swapna, > I am new to bioperl. I did BLAST search of ~4000 genes and I need to parse > it. I did use -m 9 option to get a tabular information of the blast data. > But it does not include the gene names or the names of the organisms of each > hit. Are there any parsers that can do this job ?? The -m 9 tabular output does not include gene descriptions and organisms. It only includes the "gene id" that was present immediately after the ">" sign in the FASTA file that was used to create the BLAST database you specified with the -d option when you ran BLAST. Hence, no parser will help you. 
You either have to re-do the BLAST with a different -m value that includes the information you desire, or write code to convert your gene IDs into what you want. -- --Torsten Seemann --Victorian Bioinformatics Consortium, Monash University From swapnatbhat at gmail.com Thu Nov 1 03:49:45 2007 From: swapnatbhat at gmail.com (swapna26) Date: Wed, 31 Oct 2007 20:49:45 -0700 (PDT) Subject: [Bioperl-l] BLAST output parsing In-Reply-To: References: <13519112.post@talk.nabble.com> Message-ID: <13523150.post@talk.nabble.com> which -m option do you think will be helpful. swapna Torsten Seemann wrote: > > Swapna, > >> I am new to bioperl. I did BLAST search of ~4000 genes and I need to >> parse >> it. I did use -m 9 option to get a tabular information of the blast >> data. >> But it does not include the gene names or the names of the organisms of >> each >> hit. Are there any parsers that can do this job ?? > > The -m 9 tabular output does not include gene descriptions and > organisms. It only includes the "gene id" that was present immediately > after the ">" sign in the FASTA file that was used to create the BLAST > database you specified with the -d option when you ran BLAST. > > Hence, no parser will help you. You either have to re-do the BLAST > with a different -m value that includes the information you desire, or > write code to convert your gene IDs into what you want. > > -- > --Torsten Seemann > --Victorian Bioinformatics Consortium, Monash University > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/BLAST-output-parsing-tf4728082.html#a13523150 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
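A minimal sketch of the "write code to convert your gene IDs" route mentioned above, using only core Perl. It assumes the standard 12 tab-separated columns of NCBI BLAST -m 8 / -m 9 output, and skips the '#' comment lines that -m 9 adds. The lookup table, the sample IDs and the sample lines are made up for illustration; in practice the table would be filled from whatever source describes your database sequences (for example the FASTA headers of the BLAST database).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names for NCBI BLAST tabular output (-m 8 / -m 9), in order.
my @COLS = qw(query subject pct_identity aln_length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# Parse one line of tabular output into a hash reference.
# Returns nothing for blank lines, '#' comment lines, or malformed lines.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @fields = split /\t/, $line;
    return unless @fields == @COLS;
    my %hit;
    @hit{@COLS} = @fields;
    return \%hit;
}

# Hypothetical ID-to-description table (illustrative values only).
my %description_for = (
    'gi|0000001|ref|NP_000001.1|' => 'example protein [Example organism]',
);

# Illustrative input: one comment line and one hit line.
my @lines = (
    "# Fields: Query id, Subject id, ...\n",
    "query1\tgi|0000001|ref|NP_000001.1|\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t365\n",
);

for my $line (@lines) {
    my $hit = parse_tabular_line($line) or next;
    my $desc = exists $description_for{ $hit->{subject} }
             ? $description_for{ $hit->{subject} }
             : 'unknown';
    print join("\t", $hit->{query}, $hit->{subject}, $hit->{evalue}, $desc), "\n";
}
```

To read a real report, replace the @lines array with a loop over an opened filehandle. The alternative, as noted above, is to re-run BLAST with an output format that already carries the description lines.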
From barry.moore at genetics.utah.edu Thu Nov 1 04:03:01 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed, 31 Oct 2007 22:03:01 -0600 Subject: [Bioperl-l] BLAST output parsing In-Reply-To: References: <13519112.post@talk.nabble.com> Message-ID: <7BDC2187-1ABE-4CA1-AB86-98D5FD5433A4@genetics.utah.edu> Swapna- If you are using NCBI fasta files you can use files from NCBIs gene database to map your gene IDs to names and organisms. Look in particular at the files gene2accession, gene2refseq, and gene_info. For example, if you had RefSeq protein IDs like NP_123456, you could use gene2refseq to map those RefSeq accessions to gene IDs and then gene_info to map the gene IDs to organisms and gene name. B On Oct 31, 2007, at 7:27 PM, Torsten Seemann wrote: > Swapna, > >> I am new to bioperl. I did BLAST search of ~4000 genes and I need >> to parse >> it. I did use -m 9 option to get a tabular information of the >> blast data. >> But it does not include the gene names or the names of the >> organisms of each >> hit. Are there any parsers that can do this job ?? > > The -m 9 tabular output does not include gene descriptions and > organisms. It only includes the "gene id" that was present immediately > after the ">" sign in the FASTA file that was used to create the BLAST > database you specified with the -d option when you ran BLAST. > > Hence, no parser will help you. You either have to re-do the BLAST > with a different -m value that includes the information you desire, or > write code to convert your gene IDs into what you want. 
> > -- > --Torsten Seemann > --Victorian Bioinformatics Consortium, Monash University > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 09:45:43 2007 From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai) Date: Thu, 01 Nov 2007 10:45:43 +0100 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows Message-ID: <4729A047.2060507@mikrobio.med.uni-giessen.de> Dear all, I have emboss installed on a windows machine. (Embosswin). I can run this from the dos command line and the path is present. However, when I try to call an emboss application from bioperl I get a "Application not found error" my $f = Bio::Factory::EMBOSS->new(); # get an EMBOSS application object from the factory my $fuzznuc = $f->program('fuzznuc'); $fuzznuc->run( { -sequence => $infile, -pattern => $motif, -outfile => $outfile }); gives the following error -------------------- WARNING --------------------- MSG: Application [fuzznuc] is not available! --------------------------------------------------- Can't call method "run" on an undefined value at searchPatterns.pl line 102. Can somebody help me fix this ? best regards Rohit -- Dr. Rohit Ghai Institute of Medical Microbiology Faculty of Medicine Justus-Liebig University Frankfurter Strasse 107 35392 - Giessen GERMANY Tel : 0049 (0)641-9946413 Fax : 0049 (0)641-9946409 Email: Rohit.Ghai at mikrobio.med.uni-giessen.de From jason at bioperl.org Thu Nov 1 14:22:14 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 10:22:14 -0400 Subject: [Bioperl-l] PAML/Codeml parsing Message-ID: PAML4 breaks our PAML parser right now because the order of things in the result file has changed. Now sequences precede the information about the version or the program run. This means that $result- >get_seqs() fails because we don't parse the sequences. 
We'll see what we can do, but as usual with supporting 3rd party programs it is brittle when file formats change. -jason -- Jason Stajich jason at bioperl.org From jason at bioperl.org Thu Nov 1 14:32:06 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 10:32:06 -0400 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> Message-ID: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> Presumably the PATH is not getting set properly - you should play around printing the $ENV{PATH} variable in a perl script to see if it actually contains the directory where the emboss programs are installed. Bioperl can only guess so much as to where to find an application. It is also possible that we aren't creating the proper path to the executable - you can print the executable path with print $fuzznuc->executable, I believe, unless it is throwing an error at the program() line. It looks like the code in the Factory object is a little fragile, assuming that the programs HAVE to be in your $PATH. I don't know if windows+perl is special in any way in how it runs things, so I can't really tell if there are specific things you have to do here. You may have to run this through cygwin in case PATH and such are just not available properly to windowsPerl. -jason On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > Dear all, > > I have emboss installed on a windows machine. (Embosswin). I can run > this from the dos command line and the path is present.
However, > when I > try to call > an emboss application from bioperl I get a "Application not found > error" > > > my $f = Bio::Factory::EMBOSS->new(); > # get an EMBOSS application object from the factory > my $fuzznuc = $f->program('fuzznuc'); > $fuzznuc->run( > { -sequence => $infile, > -pattern => $motif, > -outfile => $outfile > }); > gives the following error > > -------------------- WARNING --------------------- > MSG: Application [fuzznuc] is not available! > --------------------------------------------------- > Can't call method "run" on an undefined value at searchPatterns.pl > line > 102. > > Can somebody help me fix this ? > > best regards > Rohit > > -- > > Dr. Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Thu Nov 1 14:54:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Nov 2007 09:54:09 -0500 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> Message-ID: <325E8599-793F-49DC-8680-9823F9389D4C@uiuc.edu> This worked for me previously when I tested with WinXP on my old machine using EMBOSS v5: ftp://emboss.open-bio.org/pub/EMBOSS/windows I haven't tried it with EMBOSSWin (latest is v 2.7); it's probably better to use the latest EMBOSS version anyway so I suggest trying the version in the above link. I'll test it again today and let you know what I find. 
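The PATH check suggested in this thread can be sketched with core Perl alone, without going through BioPerl. This is only a diagnostic sketch and not how Bio::Factory::EMBOSS locates programs internally: the helper name find_in_path and the list of Windows suffixes tried are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Look for a program in a list of directories (default: the directories
# in $ENV{PATH} as seen by this perl). Returns the full path of the first
# match, or nothing if the program is not found.
sub find_in_path {
    my ($program, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;

    # On Windows the executable usually carries a suffix (assumed list).
    my @suffixes = ('');
    push @suffixes, qw(.exe .bat .cmd) if $^O eq 'MSWin32';

    for my $dir (@dirs) {
        for my $suffix (@suffixes) {
            my $candidate = File::Spec->catfile($dir, $program . $suffix);
            # -f only checks that a plain file exists; on Unix you would
            # normally also require -x.
            return $candidate if -f $candidate;
        }
    }
    return;
}

my $program = 'fuzznuc';
my $found   = find_in_path($program);
print "PATH as seen by perl: ",
      (defined $ENV{PATH} ? $ENV{PATH} : '(not set)'), "\n";
print defined $found
    ? "$program found at $found\n"
    : "$program not found in any PATH directory\n";
```

If the program is reported as found here but BioPerl still says the application is not available, the problem is more likely in how the factory checks for the program than in the environment.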
chris On Nov 1, 2007, at 9:32 AM, Jason Stajich wrote: > Presumably the PATH is not getting set properly - you should play > around printing the $ENV{PATH} variable in a perl script to see if > actually contains the directory where the emboss programs are > installed. Bioperl can only guess so much as to where to find an > application. It is also possible that we aren't creating the proper > path to the executable - you can print the executable path with > print $fuzznuc->executable > I believe unless it is throwing an error at the program() line. > > It looks like the code in the Factory object is a little fragile > assuming that the programs HAVE to be in your $PATH. I don't know if > windows+perl is special in any way that it run things so I can't > really tell if there is specific things you have to do here. You may > have to run this through cygwin in case PATH and such are just not > available properly to windowsPerl. > > -jason > On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > >> Dear all, >> >> I have emboss installed on a windows machine. (Embosswin). I can run >> this from the dos command line and the path is present. However, >> when I >> try to call >> an emboss application from bioperl I get a "Application not found >> error" >> >> >> my $f = Bio::Factory::EMBOSS->new(); >> # get an EMBOSS application object from the factory >> my $fuzznuc = $f->program('fuzznuc'); >> $fuzznuc->run( >> { -sequence => $infile, >> -pattern => $motif, >> -outfile => $outfile >> }); >> gives the following error >> >> -------------------- WARNING --------------------- >> MSG: Application [fuzznuc] is not available! >> --------------------------------------------------- >> Can't call method "run" on an undefined value at searchPatterns.pl >> line >> 102. >> >> Can somebody help me fix this ? >> >> best regards >> Rohit >> >> -- >> >> Dr. 
Rohit Ghai >> Institute of Medical Microbiology >> Faculty of Medicine >> Justus-Liebig University >> Frankfurter Strasse 107 >> 35392 - Giessen >> GERMANY >> >> Tel : 0049 (0)641-9946413 >> Fax : 0049 (0)641-9946409 >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Thu Nov 1 15:31:40 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 11:31:40 -0400 Subject: [Bioperl-l] PAML3 vs 4 Message-ID: <23575228-2FA3-4F07-BED4-4A2309A36D71@bioperl.org> Small tweaks were needed to parse PAML4 results. Pairwise Ka, Ks parsing (runmode -2) should be working more smoothly now on both PAML 3 and 4. You'll need to get the latest code from CVS in order to see the changes to Bio/Tools/Phylo/PAML.pm I've added tests for PAML4 in the parser and the run code. If you have scripts that use codeml please give it a try. I have not attempted to play with BASEML or AAML results at this point so if you also have codes that use those programs, please try it out and provide bugreports if we need to fix things. 
-jason -- Jason Stajich jason at bioperl.org From Kevin.M.Brown at asu.edu Thu Nov 1 17:25:30 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 1 Nov 2007 10:25:30 -0700 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl onwindows In-Reply-To: <4729A047.2060507@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> Message-ID: <1A4207F8295607498283FE9E93B775B403EA7E06@EX02.asurite.ad.asu.edu> Sounds like a path issue. Try to tell bioperl the full path to the executable rather than just the executable name. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai > Sent: Thursday, November 01, 2007 2:46 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] bioperl: cannot run emboss programs > using bioperl onwindows > > Dear all, > > I have emboss installed on a windows machine. (Embosswin). I can run > this from the dos command line and the path is present. > However, when I > try to call > an emboss application from bioperl I get a "Application not > found error" > > > my $f = Bio::Factory::EMBOSS->new(); > # get an EMBOSS application object from the factory > my $fuzznuc = $f->program('fuzznuc'); > $fuzznuc->run( > { -sequence => $infile, > -pattern => $motif, > -outfile => $outfile > }); > gives the following error > > -------------------- WARNING --------------------- > MSG: Application [fuzznuc] is not available! > --------------------------------------------------- > Can't call method "run" on an undefined value at > searchPatterns.pl line > 102. > > Can somebody help me fix this ? > > best regards > Rohit > > -- > > Dr. 
Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >

From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 18:06:48 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Thu, 01 Nov 2007 19:06:48 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
In-Reply-To: <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
Message-ID: <472A15B8.7040502@mikrobio.med.uni-giessen.de>

Thanks for all the suggestions, but unfortunately I still cannot run EMBOSS. I am running the latest version of EMBOSSwin (2.10.0-Win-0.8), and the path is set correctly: I printed $ENV{PATH}, and it contains C:\EMBOSSwin, which is the correct location.

I also tried setting the path directly, but I'm not sure how to do this, so I tried:

my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc');

This did not work either.

I also tried printing $fuzznuc->executable(), which gave the same error again:

-------------------- WARNING ---------------------
MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
---------------------------------------------------

Any more ideas?

Thanks!
Rohit

Here's the code:
use strict;
use Bio::Factory::EMBOSS;
use Data::Dumper;

#
# print "PATH=$ENV{PATH}\n";
# path contains C:\EMBOSSwin which is the correct location
# embossversion is 2.10.0-Win-0.8

my $f = Bio::Factory::EMBOSS->new();
# get an EMBOSS application object from the factory
print Dumper($f);
my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); # tried fuzznuc.exe as well
print Dumper($fuzznuc);

# dump of the factory object ($f)
#$VAR1 = bless( {
#   '_programgroup' => {},
#   '_programs' => {},
#   '_groups' => {}
# }, 'Bio::Factory::EMBOSS' );

#print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work

my $infile = "temp.fasta";
my $motif = "ATGTCGATC";
my $outfile = "test.out";

$fuzznuc->run(
    { -sequence => $infile,
      -pattern  => $motif,
      -outfile  => $outfile
    });

Here's the error again:

#-------------------- WARNING ---------------------
#MSG: Application [C:\EMBOSSwin\fuzznuc] is not available!
#---------------------------------------------------

Jason Stajich wrote: > Presumably the PATH is not getting set properly - you should play > around printing the $ENV{PATH} variable in a perl script to see if > actually contains the directory where the emboss programs are > installed. Bioperl can only guess so much as to where to find an > application. It is also possible that we aren't creating the proper > path to the executable - you can print the executable path with > print $fuzznuc->executable > I believe unless it is throwing an error at the program() line. > > It looks like the code in the Factory object is a little fragile > assuming that the programs HAVE to be in your $PATH. I don't know if > windows+perl is special in any way that it run things so I can't > really tell if there is specific things you have to do here. You may > have to run this through cygwin in case PATH and such are just not > available properly to windowsPerl. > > -jason > On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > >> Dear all, >> >> I have emboss installed on a windows machine. (Embosswin).
I can run >> this from the dos command line and the path is present. However, when I >> try to call >> an emboss application from bioperl I get a "Application not found error" >> >> >> my $f = Bio::Factory::EMBOSS->new(); >> # get an EMBOSS application object from the factory >> my $fuzznuc = $f->program('fuzznuc'); >> $fuzznuc->run( >> { -sequence => $infile, >> -pattern => $motif, >> -outfile => $outfile >> }); >> gives the following error >> >> -------------------- WARNING --------------------- >> MSG: Application [fuzznuc] is not available! >> --------------------------------------------------- >> Can't call method "run" on an undefined value at searchPatterns.pl line >> 102. >> >> Can somebody help me fix this ? >> >> best regards >> Rohit >> >> -- >> >> Dr. Rohit Ghai >> Institute of Medical Microbiology >> Faculty of Medicine >> Justus-Liebig University >> Frankfurter Strasse 107 >> 35392 - Giessen >> GERMANY >> >> Tel : 0049 (0)641-9946413 >> Fax : 0049 (0)641-9946409 >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > -- Dr. 
Rohit Ghai Institute of Medical Microbiology Faculty of Medicine Justus-Liebig University Frankfurter Strasse 107 35392 - Giessen GERMANY Tel : 0049 (0)641-9946413 Fax : 0049 (0)641-9946409 Email: Rohit.Ghai at mikrobio.med.uni-giessen.de From jason at bioperl.org Thu Nov 1 18:37:24 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 1 Nov 2007 14:37:24 -0400 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlon windows In-Reply-To: <472A15B8.7040502@mikrobio.med.uni-giessen.de> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> Message-ID: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> You could try this - can't test it though so not sure. my $fuzznuc = $f->program('fuzznuc'); $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); -jason On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: > > > Thanks for all the suggestions... but I unfortunately still cannot run > emboss. I am running the latest version of embosswin (2.10.0- > Win-0.8), > and the > path is set correctly. I printed $ENV{$PATH} and this contains > C:\EMBOSSwin which is the correct location. > I also tried setting the path directly but I'm not sure how to do > this, > so I tried this... > > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); > > this also did not work. > > Also tried printing... > $fuzznuc->executable() > > gave the following error again > -------------------- WARNING --------------------- > MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > --------------------------------------------------- > > Any more ideas ? > > thanks ! > Rohit > > > here's the code... 
> > use strict; > use Bio::Factory::EMBOSS; > use Data::Dumper; > > # > # print "PATH=$ENV{PATH}\n"; > # path contains C:\EMBOSSwin which is the correct location > # embossversion is 2.10.0-Win-0.8 > > my $f = Bio::Factory::EMBOSS->new(); > # get an EMBOSS application object from the factory > print Dumper ($f); > my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried > fuzznuc.exe > as well, > print Dump ($fuzznuc); > > #dump of fuzznuc > #$VAR1 = bless( { > # '_programgroup' => {}, > # '_programs' => {}, > # '_groups' => {} > # }, 'Bio::Factory::EMBOSS' ); > > #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work > > my $infile = "temp.fasta"; > my $motif = "ATGTCGATC"; > my $outfile = "test.out"; > > > $fuzznuc->run( > { -sequence => $infile, > -pattern => $motif, > -outfile => $outfile > }); > > Here's the error again.... > > #-------------------- WARNING --------------------- > #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > #--------------------------------------------------- > > > > > Jason Stajich wrote: >> Presumably the PATH is not getting set properly - you should play >> around printing the $ENV{PATH} variable in a perl script to see if >> actually contains the directory where the emboss programs are >> installed. Bioperl can only guess so much as to where to find an >> application. It is also possible that we aren't creating the proper >> path to the executable - you can print the executable path with >> print $fuzznuc->executable >> I believe unless it is throwing an error at the program() line. >> >> It looks like the code in the Factory object is a little fragile >> assuming that the programs HAVE to be in your $PATH. I don't know if >> windows+perl is special in any way that it run things so I can't >> really tell if there is specific things you have to do here. You may >> have to run this through cygwin in case PATH and such are just not >> available properly to windowsPerl. 
>> >> -jason >> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: >> >>> Dear all, >>> >>> I have emboss installed on a windows machine. (Embosswin). I can run >>> this from the dos command line and the path is present. However, >>> when I >>> try to call >>> an emboss application from bioperl I get a "Application not found >>> error" >>> >>> >>> my $f = Bio::Factory::EMBOSS->new(); >>> # get an EMBOSS application object from the factory >>> my $fuzznuc = $f->program('fuzznuc'); >>> $fuzznuc->run( >>> { -sequence => $infile, >>> -pattern => $motif, >>> -outfile => $outfile >>> }); >>> gives the following error >>> >>> -------------------- WARNING --------------------- >>> MSG: Application [fuzznuc] is not available! >>> --------------------------------------------------- >>> Can't call method "run" on an undefined value at >>> searchPatterns.pl line >>> 102. >>> >>> Can somebody help me fix this ? >>> >>> best regards >>> Rohit >>> >>> -- >>> >>> Dr. Rohit Ghai >>> Institute of Medical Microbiology >>> Faculty of Medicine >>> Justus-Liebig University >>> Frankfurter Strasse 107 >>> 35392 - Giessen >>> GERMANY >>> >>> Tel : 0049 (0)641-9946413 >>> Fax : 0049 (0)641-9946409 >>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason at bioperl.org >> > > -- > > Dr. 
Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From Rohit.Ghai at mikrobio.med.uni-giessen.de Thu Nov 1 18:41:41 2007 From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai) Date: Thu, 01 Nov 2007 19:41:41 +0100 Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperlonwindows In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> Message-ID: <472A1DE5.30207@mikrobio.med.uni-giessen.de> Hi Jason I tried this as well. This also gives the same error message. -Rohit Jason Stajich wrote: > You could try this - can't test it though so not sure. > my $fuzznuc = $f->program('fuzznuc'); > $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); > > -jason > On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: > >> >> >> Thanks for all the suggestions... but I unfortunately still cannot run >> emboss. I am running the latest version of embosswin (2.10.0-Win-0.8), >> and the >> path is set correctly. I printed $ENV{$PATH} and this contains >> C:\EMBOSSwin which is the correct location. >> I also tried setting the path directly but I'm not sure how to do this, >> so I tried this... >> >> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); >> >> this also did not work. >> >> Also tried printing... 
>> $fuzznuc->executable() >> >> gave the following error again >> -------------------- WARNING --------------------- >> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! >> --------------------------------------------------- >> >> Any more ideas ? >> >> thanks ! >> Rohit >> >> >> here's the code... >> >> use strict; >> use Bio::Factory::EMBOSS; >> use Data::Dumper; >> >> # >> # print "PATH=$ENV{PATH}\n"; >> # path contains C:\EMBOSSwin which is the correct location >> # embossversion is 2.10.0-Win-0.8 >> >> my $f = Bio::Factory::EMBOSS->new(); >> # get an EMBOSS application object from the factory >> print Dumper ($f); >> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe >> as well, >> print Dump ($fuzznuc); >> >> #dump of fuzznuc >> #$VAR1 = bless( { >> # '_programgroup' => {}, >> # '_programs' => {}, >> # '_groups' => {} >> # }, 'Bio::Factory::EMBOSS' ); >> >> #print "executing -- >", $fuzznuc->executable, "\n" ; # doesn't work >> >> my $infile = "temp.fasta"; >> my $motif = "ATGTCGATC"; >> my $outfile = "test.out"; >> >> >> $fuzznuc->run( >> { -sequence => $infile, >> -pattern => $motif, >> -outfile => $outfile >> }); >> >> Here's the error again.... >> >> #-------------------- WARNING --------------------- >> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! >> #--------------------------------------------------- >> >> >> >> >> Jason Stajich wrote: >>> Presumably the PATH is not getting set properly - you should play >>> around printing the $ENV{PATH} variable in a perl script to see if >>> actually contains the directory where the emboss programs are >>> installed. Bioperl can only guess so much as to where to find an >>> application. It is also possible that we aren't creating the proper >>> path to the executable - you can print the executable path with >>> print $fuzznuc->executable >>> I believe unless it is throwing an error at the program() line. 
>>> >>> It looks like the code in the Factory object is a little fragile >>> assuming that the programs HAVE to be in your $PATH. I don't know if >>> windows+perl is special in any way that it run things so I can't >>> really tell if there is specific things you have to do here. You may >>> have to run this through cygwin in case PATH and such are just not >>> available properly to windowsPerl. >>> >>> -jason >>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: >>> >>>> Dear all, >>>> >>>> I have emboss installed on a windows machine. (Embosswin). I can run >>>> this from the dos command line and the path is present. However, >>>> when I >>>> try to call >>>> an emboss application from bioperl I get a "Application not found >>>> error" >>>> >>>> >>>> my $f = Bio::Factory::EMBOSS->new(); >>>> # get an EMBOSS application object from the factory >>>> my $fuzznuc = $f->program('fuzznuc'); >>>> $fuzznuc->run( >>>> { -sequence => $infile, >>>> -pattern => $motif, >>>> -outfile => $outfile >>>> }); >>>> gives the following error >>>> >>>> -------------------- WARNING --------------------- >>>> MSG: Application [fuzznuc] is not available! >>>> --------------------------------------------------- >>>> Can't call method "run" on an undefined value at searchPatterns.pl >>>> line >>>> 102. >>>> >>>> Can somebody help me fix this ? >>>> >>>> best regards >>>> Rohit >>>> >>>> -- >>>> >>>> Dr. Rohit Ghai >>>> Institute of Medical Microbiology >>>> Faculty of Medicine >>>> Justus-Liebig University >>>> Frankfurter Strasse 107 >>>> 35392 - Giessen >>>> GERMANY >>>> >>>> Tel : 0049 (0)641-9946413 >>>> Fax : 0049 (0)641-9946409 >>>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> jason at bioperl.org >>> >> >> -- >> >> Dr. 
Rohit Ghai >> Institute of Medical Microbiology >> Faculty of Medicine >> Justus-Liebig University >> Frankfurter Strasse 107 >> 35392 - Giessen >> GERMANY >> >> Tel : 0049 (0)641-9946413 >> Fax : 0049 (0)641-9946409 >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org >

--
Dr. Rohit Ghai
Institute of Medical Microbiology
Faculty of Medicine
Justus-Liebig University
Frankfurter Strasse 107
35392 - Giessen
GERMANY
Tel : 0049 (0)641-9946413
Fax : 0049 (0)641-9946409
Email: Rohit.Ghai at mikrobio.med.uni-giessen.de

From MEC at stowers-institute.org Thu Nov 1 18:57:33 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 1 Nov 2007 13:57:33 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID:

In the code at http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 there is a call to `wossname` (cf. http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html).

Is wossname in your PATH?

Maybe it needs to be wossname.exe under Windows?

Malcolm Cook

> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai > Sent: Thursday, November 01, 2007 1:42 PM > To: Jason Stajich > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs > usingbioperlonwindows > > Hi Jason > > I tried this as well. This also gives the same error message.
> > -Rohit > > Jason Stajich wrote: > > You could try this - can't test it though so not sure. > > my $fuzznuc = $f->program('fuzznuc'); > > $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); > > > > -jason > > On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: > > > >> > >> > >> Thanks for all the suggestions... but I unfortunately still cannot > >> run emboss. I am running the latest version of embosswin > >> (2.10.0-Win-0.8), and the path is set correctly. I printed > >> $ENV{$PATH} and this contains C:\EMBOSSwin which is the correct > >> location. > >> I also tried setting the path directly but I'm not sure how to do > >> this, so I tried this... > >> > >> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); > >> > >> this also did not work. > >> > >> Also tried printing... > >> $fuzznuc->executable() > >> > >> gave the following error again > >> -------------------- WARNING --------------------- > >> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > >> --------------------------------------------------- > >> > >> Any more ideas ? > >> > >> thanks ! > >> Rohit > >> > >> > >> here's the code... 
> >> > >> use strict; > >> use Bio::Factory::EMBOSS; > >> use Data::Dumper; > >> > >> # > >> # print "PATH=$ENV{PATH}\n"; > >> # path contains C:\EMBOSSwin which is the correct location # > >> embossversion is 2.10.0-Win-0.8 > >> > >> my $f = Bio::Factory::EMBOSS->new(); # get an EMBOSS > application > >> object from the factory print Dumper ($f); my $fuzznuc = > >> $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe > as well, > >> print Dump ($fuzznuc); > >> > >> #dump of fuzznuc > >> #$VAR1 = bless( { > >> # '_programgroup' => {}, > >> # '_programs' => {}, > >> # '_groups' => {} > >> # }, 'Bio::Factory::EMBOSS' ); > >> > >> #print "executing -- >", $fuzznuc->executable, "\n" ; # > doesn't work > >> > >> my $infile = "temp.fasta"; > >> my $motif = "ATGTCGATC"; > >> my $outfile = "test.out"; > >> > >> > >> $fuzznuc->run( > >> { -sequence => $infile, > >> -pattern => $motif, > >> -outfile => $outfile > >> }); > >> > >> Here's the error again.... > >> > >> #-------------------- WARNING --------------------- > >> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! > >> #--------------------------------------------------- > >> > >> > >> > >> > >> Jason Stajich wrote: > >>> Presumably the PATH is not getting set properly - you should play > >>> around printing the $ENV{PATH} variable in a perl script > to see if > >>> actually contains the directory where the emboss programs are > >>> installed. Bioperl can only guess so much as to where to find an > >>> application. It is also possible that we aren't creating > the proper > >>> path to the executable - you can print the executable path with > >>> print $fuzznuc->executable I believe unless it is > throwing an error > >>> at the program() line. > >>> > >>> It looks like the code in the Factory object is a little fragile > >>> assuming that the programs HAVE to be in your $PATH. 
I > don't know > >>> if > >>> windows+perl is special in any way that it run things so I can't > >>> really tell if there is specific things you have to do > here. You may > >>> have to run this through cygwin in case PATH and such are > just not > >>> available properly to windowsPerl. > >>> > >>> -jason > >>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: > >>> > >>>> Dear all, > >>>> > >>>> I have emboss installed on a windows machine. (Embosswin). I can > >>>> run this from the dos command line and the path is present. > >>>> However, when I try to call an emboss application from bioperl I > >>>> get a "Application not found error" > >>>> > >>>> > >>>> my $f = Bio::Factory::EMBOSS->new(); > >>>> # get an EMBOSS application object from the factory > >>>> my $fuzznuc = $f->program('fuzznuc'); > >>>> $fuzznuc->run( > >>>> { -sequence => $infile, > >>>> -pattern => $motif, > >>>> -outfile => $outfile > > >>>> }); > >>>> gives the following error > >>>> > >>>> -------------------- WARNING --------------------- > >>>> MSG: Application [fuzznuc] is not available! > >>>> --------------------------------------------------- > >>>> Can't call method "run" on an undefined value at > searchPatterns.pl > >>>> line 102. > >>>> > >>>> Can somebody help me fix this ? > >>>> > >>>> best regards > >>>> Rohit > >>>> > >>>> -- > >>>> > >>>> Dr. Rohit Ghai > >>>> Institute of Medical Microbiology > >>>> Faculty of Medicine > >>>> Justus-Liebig University > >>>> Frankfurter Strasse 107 > >>>> 35392 - Giessen > >>>> GERMANY > >>>> > >>>> Tel : 0049 (0)641-9946413 > >>>> Fax : 0049 (0)641-9946409 > >>>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> -- > >>> Jason Stajich > >>> jason at bioperl.org > >>> > >> > >> -- > >> > >> Dr. 
Rohit Ghai > >> Institute of Medical Microbiology > >> Faculty of Medicine > >> Justus-Liebig University > >> Frankfurter Strasse 107 > >> 35392 - Giessen > >> GERMANY > >> > >> Tel : 0049 (0)641-9946413 > >> Fax : 0049 (0)641-9946409 > >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > jason at bioperl.org > > > > -- > > Dr. Rohit Ghai > Institute of Medical Microbiology > Faculty of Medicine > Justus-Liebig University > Frankfurter Strasse 107 > 35392 - Giessen > GERMANY > > Tel : 0049 (0)641-9946413 > Fax : 0049 (0)641-9946409 > Email: Rohit.Ghai at mikrobio.med.uni-giessen.de > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >

From arareko at campus.iztacala.unam.mx Thu Nov 1 19:51:41 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Nov 2007 13:51:41 -0600
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
In-Reply-To:
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <472A2E4D.8080903@campus.iztacala.unam.mx>

Don't the EMBOSS binaries live under 'bin'? Perhaps try setting PATH ($ENV{PATH}) to include 'C:\EMBOSSwin\bin', or use this:

my $fuzznuc = $f->program('fuzznuc');
$fuzznuc->executable('C:\EMBOSSwin\bin\fuzznuc');

Adding .exe might be worth trying as well.

Mauricio.

Cook, Malcolm wrote: > in the code > http://doc.bioperl.org/bioperl-run/Bio/Factory/EMBOSS.html#CODE6 > > there is a call to `wossname` (c.f.
> http://emboss.sourceforge.net/apps/release/4.0/emboss/apps/wossname.html > ) > > is wossname in your path? > > Maybe it needs to be wossname.exe under windows? > > > Malcolm Cook > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rohit Ghai >> Sent: Thursday, November 01, 2007 1:42 PM >> To: Jason Stajich >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] bioperl: cannot run emboss programs >> usingbioperlonwindows >> >> Hi Jason >> >> I tried this as well. This also gives the same error message. >> >> -Rohit >> >> Jason Stajich wrote: >>> You could try this - can't test it though so not sure. >>> my $fuzznuc = $f->program('fuzznuc'); >>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc'); >>> >>> -jason >>> On Nov 1, 2007, at 2:06 PM, Rohit Ghai wrote: >>> >>>> >>>> Thanks for all the suggestions... but I unfortunately still cannot >>>> run emboss. I am running the latest version of embosswin >>>> (2.10.0-Win-0.8), and the path is set correctly. I printed >>>> $ENV{$PATH} and this contains C:\EMBOSSwin which is the correct >>>> location. >>>> I also tried setting the path directly but I'm not sure how to do >>>> this, so I tried this... >>>> >>>> my $fuzznuc = $f->program('C:\\EMBOSSwin\\fuzznuc'); >>>> >>>> this also did not work. >>>> >>>> Also tried printing... >>>> $fuzznuc->executable() >>>> >>>> gave the following error again >>>> -------------------- WARNING --------------------- >>>> MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! >>>> --------------------------------------------------- >>>> >>>> Any more ideas ? >>>> >>>> thanks ! >>>> Rohit >>>> >>>> >>>> here's the code... 
>>>> >>>> use strict; >>>> use Bio::Factory::EMBOSS; >>>> use Data::Dumper; >>>> >>>> # >>>> # print "PATH=$ENV{PATH}\n"; >>>> # path contains C:\EMBOSSwin which is the correct location # >>>> embossversion is 2.10.0-Win-0.8 >>>> >>>> my $f = Bio::Factory::EMBOSS->new(); # get an EMBOSS >> application >>>> object from the factory print Dumper ($f); my $fuzznuc = >>>> $f->program('C:\\EMBOSSwin\\fuzznuc'); #tried fuzznuc.exe >> as well, >>>> print Dump ($fuzznuc); >>>> >>>> #dump of fuzznuc >>>> #$VAR1 = bless( { >>>> # '_programgroup' => {}, >>>> # '_programs' => {}, >>>> # '_groups' => {} >>>> # }, 'Bio::Factory::EMBOSS' ); >>>> >>>> #print "executing -- >", $fuzznuc->executable, "\n" ; # >> doesn't work >>>> my $infile = "temp.fasta"; >>>> my $motif = "ATGTCGATC"; >>>> my $outfile = "test.out"; >>>> >>>> >>>> $fuzznuc->run( >>>> { -sequence => $infile, >>>> -pattern => $motif, >>>> -outfile => $outfile >>>> }); >>>> >>>> Here's the error again.... >>>> >>>> #-------------------- WARNING --------------------- >>>> #MSG: Application [C:\EMBOSSwin\fuzznuc] is not available! >>>> #--------------------------------------------------- >>>> >>>> >>>> >>>> >>>> Jason Stajich wrote: >>>>> Presumably the PATH is not getting set properly - you should play >>>>> around printing the $ENV{PATH} variable in a perl script >> to see if >>>>> actually contains the directory where the emboss programs are >>>>> installed. Bioperl can only guess so much as to where to find an >>>>> application. It is also possible that we aren't creating >> the proper >>>>> path to the executable - you can print the executable path with >>>>> print $fuzznuc->executable I believe unless it is >> throwing an error >>>>> at the program() line. >>>>> >>>>> It looks like the code in the Factory object is a little fragile >>>>> assuming that the programs HAVE to be in your $PATH. 
I >> don't know >>>>> if >>>>> windows+perl is special in any way that it run things so I can't >>>>> really tell if there is specific things you have to do >> here. You may >>>>> have to run this through cygwin in case PATH and such are >> just not >>>>> available properly to windowsPerl. >>>>> >>>>> -jason >>>>> On Nov 1, 2007, at 5:45 AM, Rohit Ghai wrote: >>>>> >>>>>> Dear all, >>>>>> >>>>>> I have emboss installed on a windows machine. (Embosswin). I can >>>>>> run this from the dos command line and the path is present. >>>>>> However, when I try to call an emboss application from bioperl I >>>>>> get a "Application not found error" >>>>>> >>>>>> >>>>>> my $f = Bio::Factory::EMBOSS->new(); >>>>>> # get an EMBOSS application object from the factory >>>>>> my $fuzznuc = $f->program('fuzznuc'); >>>>>> $fuzznuc->run( >>>>>> { -sequence => $infile, >>>>>> -pattern => $motif, >>>>>> -outfile => $outfile >> >>>>>> }); >>>>>> gives the following error >>>>>> >>>>>> -------------------- WARNING --------------------- >>>>>> MSG: Application [fuzznuc] is not available! >>>>>> --------------------------------------------------- >>>>>> Can't call method "run" on an undefined value at >> searchPatterns.pl >>>>>> line 102. >>>>>> >>>>>> Can somebody help me fix this ? >>>>>> >>>>>> best regards >>>>>> Rohit >>>>>> >>>>>> -- >>>>>> >>>>>> Dr. Rohit Ghai >>>>>> Institute of Medical Microbiology >>>>>> Faculty of Medicine >>>>>> Justus-Liebig University >>>>>> Frankfurter Strasse 107 >>>>>> 35392 - Giessen >>>>>> GERMANY >>>>>> >>>>>> Tel : 0049 (0)641-9946413 >>>>>> Fax : 0049 (0)641-9946409 >>>>>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >> >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> -- >>>>> Jason Stajich >>>>> jason at bioperl.org >>>>> >>>> -- >>>> >>>> Dr. 
Rohit Ghai >>>> Institute of Medical Microbiology >>>> Faculty of Medicine >>>> Justus-Liebig University >>>> Frankfurter Strasse 107 >>>> 35392 - Giessen >>>> GERMANY >>>> >>>> Tel : 0049 (0)641-9946413 >>>> Fax : 0049 (0)641-9946409 >>>> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- >>> Jason Stajich >>> jason at bioperl.org >>> >> -- >> >> Dr. Rohit Ghai >> Institute of Medical Microbiology >> Faculty of Medicine >> Justus-Liebig University >> Frankfurter Strasse 107 >> 35392 - Giessen >> GERMANY >> >> Tel : 0049 (0)641-9946413 >> Fax : 0049 (0)641-9946409 >> Email: Rohit.Ghai at mikrobio.med.uni-giessen.de >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu Thu Nov 1 20:07:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Nov 2007 15:07:39 -0500
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
In-Reply-To: <472A1DE5.30207@mikrobio.med.uni-giessen.de>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de> <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org> <472A15B8.7040502@mikrobio.med.uni-giessen.de> <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org> <472A1DE5.30207@mikrobio.med.uni-giessen.de>
Message-ID: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>

I did a little investigating using my old PC and was able to get fuzznuc to run using BioPerl
and EMBOSS v5. I had to jump through a hoop or two but I managed to get
it working.

First, realize that EMBOSSWin is NOT the latest EMBOSS for Windows. You
need to remove EMBOSSWin and install the one I linked to previously
(this is an actual EMBOSS beta release). It's possible older EMBOSSWin
can be configured, but I don't plan on checking it out myself.

Next, you need to ensure the binaries are in your PATH env. variable
(test by running 'wossname' on the command line), then set EMBOSS_DATA
to point at the EMBOSS data directory using a UNIX-like path (e.g.
'C:/mEMBOSS/data'); regular Win32 paths didn't work for me, and WinXP
recognizes the UNIX'y form as a valid path. If you don't know how to
set env. variables go here:

http://vlaurie.com/computers2/Articles/environment.htm

Once that is set up you should be able to run the script using the
latest (greatest?) EMBOSS.

chris

On Nov 1, 2007, at 1:41 PM, Rohit Ghai wrote:

> Hi Jason
>
> I tried this as well. This also gives the same error message.
>
> -Rohit
>
> Jason Stajich wrote:
>> You could try this - can't test it though so not sure.
>> my $fuzznuc = $f->program('fuzznuc');
>> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>>
>> -jason

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From neetisomaiya at gmail.com Fri Nov 2 04:20:27 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 2 Nov 2007 09:50:27 +0530
Subject: [Bioperl-l] need help
Message-ID: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>

Hi,

This is a perl question, not bioperl.
Can anyone point me to a perl program/code/function which can calculate
the number of days between any two given dates?
Any help will be deeply appreciated.
Thanks.

-- 
-Neeti
Even my blood says, B positive

From whs at ebi.ac.uk Fri Nov 2 05:01:20 2007
From: whs at ebi.ac.uk (Will Spooner)
Date: Fri, 2 Nov 2007 05:01:20 +0000 (GMT)
Subject: [Bioperl-l] need help
In-Reply-To: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
References: <764978cf0711012120o11010624r5a43e51d33b25e75@mail.gmail.com>
Message-ID:

Hi Neeti,

A non-bioperl answer to your perl question; Date::Calc should do the trick.
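
For reference, a minimal sketch of the day-difference calculation. Date::Calc provides this as Delta_Days($y1,$m1,$d1, $y2,$m2,$d2); the version here instead uses the core Time::Local module so that it runs without installing anything. The helper name days_between is made up for illustration.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: whole days from date 1 to date 2.
# Dates are given as (year, month 1-12, day of month).
# Working in UTC (timegm) avoids daylight-saving skew.
sub days_between {
    my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;
    my $t1 = timegm(0, 0, 0, $d1, $m1 - 1, $y1);
    my $t2 = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
    return ($t2 - $t1) / 86_400;   # seconds per day
}

print days_between(2007, 10, 25, 2007, 11, 2), "\n";   # prints 8
```

The result is negative when the second date precedes the first.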

Will

On Fri, 2 Nov 2007, neeti somaiya wrote:

> Hi,
>
> This is a perl question, not bioperl.
> Can anyone point me to a perl program/code/function which can calculate the
> number of days between any two given dates.
> Any help will be deeply appreciated.
> Thanks.

From smarkel at accelrys.com Sat Nov 3 06:01:38 2007
From: smarkel at accelrys.com (Scott Markel)
Date: Fri, 2 Nov 2007 23:01:38 -0700
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
In-Reply-To: <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
 <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
 <472A15B8.7040502@mikrobio.med.uni-giessen.de>
 <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
Message-ID:

I set multiple environment variables in my code.

$ENV{EMBOSS_ROOT} = $embossPath;
$ENV{EMBOSS_ACDROOT} = File::Spec->catdir($embossPath, "acd");
$ENV{EMBOSS_DB_DIR} = File::Spec->catdir($embossPath, "test");
$ENV{EMBOSS_DATA} = File::Spec->catdir($embossPath, "data");
$ENV{PATH} = $embossPath;

I found it necessary to set both PATH and EMBOSS_ROOT.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

bioperl-l-bounces at lists.open-bio.org wrote on 01.11.2007 11:37:24:

> You could try this - can't test it though so not sure.
> my $fuzznuc = $f->program('fuzznuc');
> $fuzznuc->executable('C:\EMBOSSwin\fuzznuc');
>
> -jason

From Rohit.Ghai at mikrobio.med.uni-giessen.de Sat Nov 3 14:07:52 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Sat, 03 Nov 2007 15:07:52 +0100
Subject: [Bioperl-l] bioperl: cannot run emboss programs using bioperl on windows
In-Reply-To: <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
References: <4729A047.2060507@mikrobio.med.uni-giessen.de>
 <80BA54B5-72E6-4A5B-A124-D73256644DC9@bioperl.org>
 <472A15B8.7040502@mikrobio.med.uni-giessen.de>
 <6968D1EB-FED3-463D-AF12-74A7D7F2FF3C@bioperl.org>
 <472A1DE5.30207@mikrobio.med.uni-giessen.de>
 <28223F7B-045A-4CC7-8FE7-583D0F8F7D44@uiuc.edu>
Message-ID: <472C80B8.9050601@mikrobio.med.uni-giessen.de>

Dear all,

thanks for all the different inputs on this topic, I was able to run
emboss applications on windows (vista), but with the following workaround.

Chris suggested removing EMBOSSwin and getting another version. This I did.
Scott suggested setting all the variables within the program. This I also
tried, but actually these were already available to the program so this
was also not the problem.

The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't have
any path issues.

What is also curious is that $f->version returns the correct version of
emboss running (no path problems here), and it looks like it runs the
command "embossversion -auto" to get this information. If it can get at
this command, it's a bit peculiar that it cannot get the other programs.
Or am I missing something here ?

Please take a look at the code, I have commented within this...

-Rohit

use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;

my $infile = "test.fasta";
my $motif = "AGGAGG";
my $outfile = "test.out";

my $f = Bio::Factory::EMBOSS->new();
# get an EMBOSS application object from the factory
print Dumper $f;
print "location=",$f->location,"\n"; # returns local
# this returns the correct version 5.0 (uses embossversion -auto
# internally, and seems to know where it is)
print "version=", $f->version,"\n";
print "info=", $f->program_info('fuzznuc'),"\n"; # returns nothing
print "list=",$f->_program_list,"\n"; # returns nothing

# however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\
# or with exe suffix doesn't work
# $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesn't work
# the problem is that it does not return a
# Bio::Tools::Run::EMBOSSApplication object.
#
# however, creating an EMBOSSApplication object directly makes it
# possible to run the program
#
my $application = Bio::Tools::Run::EMBOSSApplication->new();
$application->name('fuzznuc');
print Dumper $application;
$application->run(
    { -sequence => $infile,
      -pattern => $motif,
      -outfile => $outfile
    });
print "Done\n";
exit;

From hlapp at gmx.net Sun Nov 4 17:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 4 Nov 2007 12:42:13 -0500
Subject: [Bioperl-l] question -- Bio::SeqFeature::Gene::Transcript
In-Reply-To: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
References: <0918983F-BF45-4466-AF5C-8F1ACAE5EAE2@uni-potsdam.de>
Message-ID: <62FB6DE1-3F1D-428C-B108-4CF9EEB67DDD@gmx.net>

Hi Stefanie,

sorry for taking so long to respond - your email got buried in a pile
while I was away on travel.

The Bio::SeqFeature::Gene::* modules were written mostly with the
motivation to have a model that can represent the results of gene
predictors.

GenBank AFAIK doesn't annotate introns explicitly, though they should be
implicit from cDNA (or mRNA? or gene, as you say) features on genomic
sequence. The Bioperl SeqIO parsers won't transform those into a
Bio::SeqFeature::Gene-based model, but instead will yield just plain
Bio::SeqFeatureI objects in a flat array. It's up to subsequent
processing to build these into more hierarchical models.

I'm not sure whether someone's done this already for GenBank-type feature
tables. There is an Unflattener (Bio::SeqFeature::Tools::Unflattener) that
at least attempts to build a feature hierarchy from the flat array that's
compliant with the Sequence Ontology (or so I recall).

I'm copying the list in case others have additional suggestions.
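
To make the "implicit introns" point concrete, here is a minimal sketch under stated assumptions: the exon coordinates are hypothetical, and introns_from_exons is an illustrative helper, not a BioPerl method. With real features the start/end pairs would presumably come from the sub-locations of a split CDS or mRNA location. Introns are then simply the gaps between consecutive exons once these are sorted by start.

```perl
use strict;
use warnings;

# Illustrative helper: given exons as [start, end] pairs (1-based,
# inclusive), return the gaps between consecutive exons as introns.
sub introns_from_exons {
    my @exons = sort { $a->[0] <=> $b->[0] } @_;
    my @introns;
    for my $i (1 .. $#exons) {
        my $start = $exons[$i - 1][1] + 1;   # first base after previous exon
        my $end   = $exons[$i][0] - 1;       # last base before this exon
        push @introns, [$start, $end] if $start <= $end;   # skip abutting exons
    }
    return @introns;
}

# Hypothetical exon coordinates, deliberately out of order
my @introns = introns_from_exons([301, 400], [100, 200], [551, 700]);
print join(", ", map { "$_->[0]..$_->[1]" } @introns), "\n";
# prints: 201..300, 401..550
```

Extracting the intron sequence itself would then be a matter of taking the corresponding subsequence (and reverse-complementing it for minus-strand genes).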

	-hilmar

On Oct 25, 2007, at 3:40 AM, Stefanie Hartmann wrote:

> Hello Hilmar,
>
> I have a question about your bioperl module
> Bio::SeqFeature::Gene::Transcript:
>
> I can't figure out how to generate the $gene object for use in this
> line:
> @introns = $gene->introns();
>
> The data I'm working with is a local file in genbank format, and
> I'm interested in extracting intron sequences (and maybe flanking
> exons) for certain genes. I have been trying to get the introns via
> the sequence features ('CDS' or 'gene'), but this has not been
> working. Which approach will I have to take?
> I'd be very grateful if you could point me into the right direction!
>
> Hope things are going well in Durham! And thank you in advance!
>
> Stefanie

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From downloadondemand at gmail.com Sun Nov 4 18:39:42 2007
From: downloadondemand at gmail.com (download on demand)
Date: Sun, 4 Nov 2007 20:39:42 +0200
Subject: [Bioperl-l] Help with Bio::SeqIO
Message-ID: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>

Hi to all.

I have a problem with a very simple script:

use Bio::SeqIO;

# get command-line arguments, or die with a usage statement
my $usage = "x2y.pl infile infileformat outfile outfileformat\n";
my $infile = shift or die $usage;
my $infileformat = shift or die $usage;
# my $outfile = shift or die $usage;
my $outfileformat = shift or die $usage;

# create one SeqIO object to read in, and another to write out
my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                             '-format' => $infileformat);
my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
                              '-format' => $outfileformat);

# write each entry in the input file to the output file
while (my $inseq = $seq_in->next_seq) {

    # $seq_out->write_seq($inseq); # Whole sequence not needed

    # print product, coordinates and spliced sequence of every CDS
    for my $feat_object ($inseq->get_SeqFeatures) {
        if ($feat_object->primary_tag eq "CDS") {
            print $feat_object->get_tag_values('product'),"\n";
            print $feat_object->location->start,"..",$feat_object->location->end,"\n";
            print $feat_object->spliced_seq->seq,"\n\n";
        }
    }
}

The result seems OK to me, but in the case of the first CDS of
NC_005213.gbk from here the output is wrong:

It is:
hypothetical protein
1..490885
TAAATGCGATTGCTATTAGAA..................................Truncated sequence...................................

Should be:
hypothetical protein
879..490883
ATGCGATTGCTATTAGAA...................................Truncated sequence....................................TAA

This CDS has an unusual location string:
CDS complement(join(490883..490885,1..879)),
but shouldn't spliced_seq handle these things?

Please help me!
Best regards, N.
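
The two outputs above differ only in the order in which the sub-locations are concatenated before reverse-complementing. What follows is a toy illustration in plain Perl, not BioPerl's implementation; the 17-base circular sequence and the helper names are invented for the example. Joining the pieces in the order the location string lists them gives ATG...TAA, while sorting them by start coordinate first moves the stop codon to the front, which looks like what happens in the "It is" case.

```perl
use strict;
use warnings;

# Invented 17-base circular sequence; a minus-strand CDS spans the origin,
# i.e. complement(join(15..17,1..6)).
my $genome = "GGCCATACGTACGTTTA";
my @parts  = ([15, 17], [1, 6]);   # sub-locations in the order listed

# Reverse-complement a DNA string.
sub revcom {
    my ($s) = @_;
    $s = reverse $s;
    $s =~ tr/ACGT/TGCA/;
    return $s;
}

# Concatenate the sub-sequences in the given order, then reverse-complement.
sub splice_parts {
    my ($seq, @locs) = @_;
    my $joined = join "",
        map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @locs;
    return revcom($joined);
}

my @sorted = sort { $a->[0] <=> $b->[0] } @parts;

print "listed order: ", splice_parts($genome, @parts),  "\n";   # ATGGCCTAA
print "sorted order: ", splice_parts($genome, @sorted), "\n";   # TAAATGGCC
```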

From cjfields at uiuc.edu Mon Nov 5 00:08:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Nov 2007 18:08:34 -0600
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com>
Message-ID: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>

Pass in (-nosort => 1) to spliced_seq:

print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";

This ensures no sorting of sublocations occurs, if you want for instance
typical GenBank/EMBL 'join' behavior.

To the other devs: shouldn't -nosort be the default behavior when the
split location is a 'join'? In other words, should spliced_seq() be
modified to take into account the split location type when returning
sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly indicates
the order of the sequences is important when joined together; the current
behavior is more like that for 'order'.

chris

From jean-luc.jany at univ-brest.fr Mon Nov 5 08:26:52 2007
From: jean-luc.jany at univ-brest.fr (Jean-luc Jany)
Date: Mon, 05 Nov 2007 09:26:52 +0100
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall
Message-ID: <472ED3CC.2050305@univ-brest.fr>

Dear Bioperl and Mac users,

I am a Mac user and would like to run a script I made using
Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to tell
Bioperl the path to blastall and the other executables. I carefully read
the following link
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast
and tried to indicate the path to Blast, but I guess the way to proceed is
slightly different on a Mac and that I should not create .ncbirc and
.bashrc files (e.g. should I modify the .profile file instead of .bashrc?)
Actually, my blast directory is in the myname directory and contains a
/bin and a /data directory. I have got my blastall and other executables
in myname/blast/bin/blastall.

Thank you in anticipation for your help.

Jean-Luc

From Rohit.Ghai at mikrobio.med.uni-giessen.de Mon Nov 5 11:36:16 2007
From: Rohit.Ghai at mikrobio.med.uni-giessen.de (Rohit Ghai)
Date: Mon, 05 Nov 2007 12:36:16 +0100
Subject: [Bioperl-l] bioperl and emboss on windows
Message-ID: <472F0030.7040200@mikrobio.med.uni-giessen.de>

Dear all,

thanks for all the different inputs on this topic, I was able to run
emboss applications on windows (vista), but with the following workaround.

Chris suggested removing EMBOSSwin and getting another version. This I did.
Scott suggested setting all the variables within the program. This I also
tried, but actually these were already available to the program so this
was also not the problem.

The following line...

my $fuzznuc = $f->program('fuzznuc')

doesn't return a Bio::Tools::Run::EMBOSSApplication object, but using
Bio::Tools::Run::EMBOSSApplication directly seems to work. It doesn't have
any path issues.

What is also curious is that $f->version returns the correct version of
emboss running (no path problems here), and it looks like it runs the
command "embossversion -auto" to get this information. If it can get at
this command, it's a bit peculiar that it cannot get the other programs.
Or am I missing something here ?

Please take a look at the code, I have commented within this...

-Rohit

use Bio::Factory::EMBOSS;
use Data::Dumper;
use Bio::Tools::Run::EMBOSSApplication;

my $infile = "test.fasta";
my $motif = "AGGAGG";
my $outfile = "test.out";

my $f = Bio::Factory::EMBOSS->new();
# get an EMBOSS application object from the factory
print Dumper $f;
print "location=",$f->location,"\n"; # returns local
# this returns the correct version 5.0 (uses embossversion -auto
# internally, and seems to know where it is)
print "version=", $f->version,"\n";
print "info=", $f->program_info('fuzznuc'),"\n"; # returns nothing
print "list=",$f->_program_list,"\n"; # returns nothing

#
# however, my $fuzznuc = $f->program('fuzznuc'); or with path / or \\
# or with exe suffix doesn't work
# $fuzznuc->executable('C:/mEMBOSS/fuzznuc'); # doesn't work
# the problem is that it does not return a
# Bio::Tools::Run::EMBOSSApplication object.
#
# however, creating an EMBOSSApplication object directly makes it
# possible to run the program
#
my $application = Bio::Tools::Run::EMBOSSApplication->new();
$application->name('fuzznuc');
print Dumper $application;
$application->run(
    { -sequence => $infile,
      -pattern => $motif,
      -outfile => $outfile
    });
print "Done\n";
exit;

From neetisomaiya at gmail.com Mon Nov 5 12:20:04 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 5 Nov 2007 17:50:04 +0530
Subject: [Bioperl-l] perl question
Message-ID: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>

Again a perl question, and maybe a very trivial one.
How do I terminate a number like 3.1232010098 to only 3 decimal places in perl?
--
-Neeti
Even my blood says, B positive

From biology0046 at hotmail.com Mon Nov 5 12:16:13 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Mon, 05 Nov 2007 12:16:13 +0000
Subject: [Bioperl-l] how to extract intron information from gff files.
Message-ID:

Dear all:

i got a poplar genome gff file like this:

LG_I src exon 2598 3280 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I src CDS 2598 3280 . - 0 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 4
LG_I src start_codon 3278 3280 . - 0 name "fgenesh1_pg.C_LG_I000001"
LG_I src stop_codon 2598 2600 . - 0 name "fgenesh1_pg.C_LG_I000001"
LG_I src exon 3544 3918 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I src CDS 3544 3918 . - 2 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 3
LG_I src exon 4258 4740 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I src CDS 4258 4740 . - 2 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 2
LG_I src exon 5344 6388 . - . name "fgenesh1_pg.C_LG_I000001"; transcriptId 62649
LG_I src CDS 5344 6388 . - 2 name "fgenesh1_pg.C_LG_I000001"; proteinId 62649; exonNumber 1
LG_I src exon 8259 8528 . - . name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I src CDS 8259 8528 . - 0 name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 3
LG_I src stop_codon 8259 8261 . - 0 name "fgenesh1_pg.C_LG_I000002"
LG_I src exon 8897 8987 . - . name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I src CDS 8897 8987 . - 0 name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 2
LG_I src exon 9831 9892 . - . name "fgenesh1_pg.C_LG_I000002"; transcriptId 62650
LG_I src CDS 9831 9892 . - 1 name "fgenesh1_pg.C_LG_I000002"; proteinId 62650; exonNumber 1
LG_I src start_codon 9890 9892 . - 0 name "fgenesh1_pg.C_LG_I000002"

I try to use Bio::DB::GFF, but this module only applies to methods given in the gff file.
What I want to get is "intron, 5utr, 3utr", but this information is not
contained in this GFF file. How can I get this information through BioPerl?

The file does not contain intron information. If I consider gaps between
exons as introns, and the non-CDS parts of the first and last exon as UTRs,
how can I extract them from this GFF file?

Thanks~~
Wenkai

From spiros at lokku.com Mon Nov 5 12:36:36 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 5 Nov 2007 12:36:36 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID:

Hey,

use the `sprintf` function. More information can be found at
http://perldoc.perl.org/functions/sprintf.html. For more proper
rounding, you could use the Math::Round module,
http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm.

hope this helps,
spiros

On 11/5/07, neeti somaiya wrote:
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
> --
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From ak at ebi.ac.uk Mon Nov 5 12:43:06 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon, 5 Nov 2007 12:43:06 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <20071105124305.GC4491@ebi.ac.uk>

On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

When displaying:

    printf( "The number is %.3f\n", $number );

When making a string:

    my $string = sprintf( "%.3f", $number );


BTW, this is cutting, not rounding.

Cheers,
Andreas

--
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------

From t.nugent at cs.ucl.ac.uk Mon Nov 5 12:37:15 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 05 Nov 2007 12:37:15 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F0E7B.60303@cs.ucl.ac.uk>

Use Math::Round and nearest_ceil:

http://search.cpan.org/~grommel/Math-Round-0.05/Round.pm

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?
>
>

--
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk
http://www.cs.ucl.ac.uk/staff/T.Nugent

From bix at sendu.me.uk Mon Nov 5 12:47:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 12:47:17 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <472F10D5.5060006@sendu.me.uk>

neeti somaiya wrote:
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3 decimal places in
> perl?

Please don't use this list to ask general Perl questions.
See these instead:
http://perldoc.perl.org/perlfaq4.html
http://lists.cpan.org/
http://www.perlmonks.org/

$rounded = sprintf("%.3f", $number);

From Marc.Logghe at DEVGEN.com Mon Nov 5 12:39:36 2007
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Mon, 5 Nov 2007 13:39:36 +0100
Subject: [Bioperl-l] perl question
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com>
Message-ID: <0C528E3670D8CE4B8E013F6749231AA601C3BB80@ANTARESIA.be.devgen.com>

Hi,
Have a look at
http://perldoc.perl.org/functions/sprintf.html#precision%2c-or-maximum-width

In your particular case:
my $f = 3.1232010098;
printf "%0.3f", $f;

HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> neeti somaiya
> Sent: Monday, November 05, 2007 1:20 PM
> To: bioperl-l
> Subject: [Bioperl-l] perl question
>
> Again a perl question, and maybe a very trivial one.
> How do I terminate a number like 3.1232010098 to only 3
> decimal places in perl?
>
> --
> -Neeti
> Even my blood says, B positive
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bix at sendu.me.uk Mon Nov 5 13:24:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 05 Nov 2007 13:24:25 +0000
Subject: [Bioperl-l] perl question
In-Reply-To: <20071105124305.GC4491@ebi.ac.uk>
References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> <20071105124305.GC4491@ebi.ac.uk>
Message-ID: <472F1989.90105@sendu.me.uk>

Andreas Kahari wrote:
> On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote:
>> Again a perl question, and maybe a very trivial one.
>> How do I terminate a number like 3.1232010098 to only 3 decimal places in
>> perl?
> > When displaying: > > printf( "The number is %.3f\n", $number ); > > When making a string: > > my $string = sprintf( "%.3f", $number ); > > > BTW, this is cutting, not rounding. (s)printf rounds (ie. doesn't simply truncate), though for critical applications you should use your own rounding algorithm. From ak at ebi.ac.uk Mon Nov 5 13:56:24 2007 From: ak at ebi.ac.uk (Andreas Kahari) Date: Mon, 5 Nov 2007 13:56:24 +0000 Subject: [Bioperl-l] perl question In-Reply-To: <472F1989.90105@sendu.me.uk> References: <764978cf0711050420x800b663q1fd94b08f8a4b975@mail.gmail.com> <20071105124305.GC4491@ebi.ac.uk> <472F1989.90105@sendu.me.uk> Message-ID: <20071105135624.GD4491@ebi.ac.uk> On Mon, Nov 05, 2007 at 01:24:25PM +0000, Sendu Bala wrote: > Andreas Kahari wrote: > > On Mon, Nov 05, 2007 at 05:50:04PM +0530, neeti somaiya wrote: > >> Again a perl question, and maybe a very trivial one. > >> How do I terminate a number like 3.1232010098 to only 3 decimal places in > >> perl? > > > > When displaying: > > > > printf( "The number is %.3f\n", $number ); > > > > When making a string: > > > > my $string = sprintf( "%.3f", $number ); > > > > > > BTW, this is cutting, not rounding. > > (s)printf rounds (ie. doesn't simply truncate), though for critical > applications you should use your own rounding algorithm. They do indeed. Mea culpa. 
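
The point settled in this exchange, that (s)printf rounds rather than truncates, can be checked with a minimal sketch. The second value (3.1237) and the variable names are illustrative and not taken from the thread; the int() line is just one simple way to cut digits off a positive number, shown for contrast.

```perl
use strict;
use warnings;

# The number from the original question: rounding and truncating agree here,
# because the fourth decimal digit is a 2.
my $number = 3.1232010098;
print sprintf( "%.3f", $number ), "\n";    # prints 3.123

# An illustrative value where the two differ (fourth decimal digit is 7).
my $other = 3.1237;

# sprintf rounds to the requested precision.
my $rounded = sprintf( "%.3f", $other );    # "3.124"

# Plain truncation of a positive number: scale up, drop the fraction, scale down.
my $truncated = int( $other * 1000 ) / 1000;    # 3.123

print "rounded:   $rounded\n";
print "truncated: $truncated\n";
```

For values that sit exactly on a rounding boundary the binary representation of the float decides the result, which is presumably why Sendu suggests a dedicated rounding routine for critical applications.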

Andreas

--
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
--------------------------------------------

From jay at jays.net Mon Nov 5 15:14:17 2007
From: jay at jays.net (Jay Hannah)
Date: Mon, 5 Nov 2007 10:14:17 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID: <8CA2A45C-1F82-47A2-841B-1BA92E1F4466@jays.net>

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
> To the other devs: shouldn't -nosort be the default behavior when the
> split location is a 'join'?

I certainly think so.

> In other words, should spliced_seq() be
> modified to take into account the split location type when returning
> sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly
> indicates the order of the sequences is important when joined
> together; the current behavior is more like that for 'order'.

I don't see any value to the sorting algorithm. All tests invoke
-nosort => 1 (except a phase test where nosort doesn't matter anyway).
In my limited experience the sorting only serves to break real-world
splicing.

If there is no valid use then we can remove ~20 lines from
SeqFeatureI.pm circa line 505.

If there is a valid use and someone would be so kind as to educate me
I'd be happy to add tests which demonstrate them. :)

P.S. CSHL is neato. I plan on understanding some of this stuff some day.
:) j http://www.bioperl.org/wiki/User:Jhannah From hlapp at duke.edu Mon Nov 5 16:03:16 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 5 Nov 2007 11:03:16 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: I agree that there should be a meaningful default that results in "doing the right thing" in most cases if the user doesn't intervene. I'm not sure I understand all the details, but it sounds sorting or not sorting should depend on the split location type unless the user overrides it by argument. That's what you're suggesting, right? -hilmar On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: > Pass in (-nosort => 1) to spliced_seq: > > print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; > > This ensures no sorting of sublocations occurs, if you want for > instance typical GenBank/EMBL 'join' behavior. > > To the other devs: shouldn't -nosort be the default behavior when > the split location is a 'join'? In other words, should spliced_seq > () be modified to take into account the split location type when > returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' > explicitly indicates the order of the sequences is important when > joined together; the current behavior is more like that for 'order'. > > chris > > On Nov 4, 2007, at 12:39 PM, download on demand wrote: > >> Hi to all. 
>> >> I have a problem with a simplest script: >> >> >> >> use Bio::SeqIO; >> # get command-line arguments, or die with a usage statement >> my $usage = "x2y.pl infile infileformat outfile >> outfileformat\n"; >> my $infile = shift or die $usage; >> my $infileformat = shift or die $usage; >> # my $outfile = shift or die $usage; >> my $outfileformat = shift or die $usage; >> >> # create one SeqIO object to read in,and another to write >> out >> my $seq_in = Bio::SeqIO->new('-file' => "<$infile", >> '-format' => $infileformat); >> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, >> '-format' => $outfileformat); >> >> # write each entry in the input file to the output file >> while (my $inseq = $seq_in->next_seq) { >> >> # $seq_out->write_seq($inseq); # Whole sequence not needed >> >> for my $feat_object ($inseq->get_SeqFeatures) >> { >> if ($feat_object->primary_tag eq "CDS") >> { >> print $feat_object->get_tag_values('product'),"\n"; >> print >> $feat_object->location->start,"..",$feat_object->location->end,"\n"; >> print $feat_object->spliced_seq->seq,"\n\n"; >> } >> } >> >> >> >> The result seems OK to me, but in case of first CDS of >> NC_005213.gbk from >> here > Nanoarchaeum_equitans/> the >> output is wrong: >> >> It is: >> hypothetical protein >> 1..490885 >> TAAATGCGATTGCTATTAGAA..................................Truncated >> sequence................................... >> >> Should be: >> hypothetical protein >> 879..490883 >> ATGCGATTGCTATTAGAA...................................Truncated >> sequence....................................TAA >> >> >> >> This CDS have an unnatural location string: >> CDS complement(join(490883..490885,1..879)), but >> spliced_seq >> should handle these things? >> >> Please help me! >> Best regards, N. 
>> _______________________________________________ >> > > > From bernd.web at gmail.com Mon Nov 5 16:53:01 2007 From: bernd.web at gmail.com (Bernd Web) Date: Mon, 5 Nov 2007 17:53:01 +0100 Subject: [Bioperl-l] PSI-BLAST Message-ID: <716af09c0711050853l23087ac6j9f7d597580b66c46@mail.gmail.com> Hi, Is it possible with SearchIO to select a specific iteration (Results from round i) part of the PSI-blast report, when parsing this with SearchIO::blast? It seems the parser parses the complete report. If not implemented I could of course extract the specific part of the psi-blast report and then give it too SearchIO (e.g. with IO::String), but maybe I am missing a built-in option? Regards, Bernd From jay at jays.net Mon Nov 5 16:54:13 2007 From: jay at jays.net (Jay Hannah) Date: Mon, 5 Nov 2007 11:54:13 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? If someone knows why spliced_seq() should ever sort then I'm suggesting we add a test demonstrating a useful example of that. If no one has a useful example of when you would want spliced_seq() to sort then I'm suggesting we remove the sorting altogether and nosort goes away. I can provide/add many examples where sorting is bad. I do not know of a case where sorting is good. 
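
Why sorting the sub-locations of a 'join' breaks a feature that wraps around the origin can be shown with a toy sketch in plain Perl. Everything here is an assumption made for illustration: the 12-base sequence, the location join(10..12,1..6) and the helper splice_segments() are hypothetical and are not BioPerl code; strand handling is left out.

```perl
use strict;
use warnings;

# Toy circular "genome", 1-based coordinates as in a GenBank location.
# Positions 1..6 = AAATAG, 7..9 = CCC, 10..12 = ATG.
my $genome = 'AAATAGCCCATG';

# Sub-locations of a hypothetical feature join(10..12,1..6) that wraps
# around the origin, kept in the order in which the record lists them.
my @join = ( [ 10, 12 ], [ 1, 6 ] );

# Hypothetical helper (not a BioPerl method): concatenate the
# sub-sequences in exactly the order the segments are passed in.
sub splice_segments {
    my ( $seq, @segments ) = @_;
    return join '',
        map { substr( $seq, $_->[0] - 1, $_->[1] - $_->[0] + 1 ) } @segments;
}

# The same segments, reordered by start coordinate.
my @sorted = sort { $a->[0] <=> $b->[0] } @join;

my $in_order = splice_segments( $genome, @join );      # ATGAAATAG
my $by_start = splice_segments( $genome, @sorted );    # AAATAGATG

print "listed order: $in_order\n";
print "sorted order: $by_start\n";
```

In listed order the start codon leads and the stop codon ends the sequence; after sorting by start the stop codon lands in the middle, which mirrors the misplaced TAA in the NC_005213 report quoted in this thread.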
j http://www.bioperl.org/wiki/User:Jhannah From jason at bioperl.org Mon Nov 5 17:07:10 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 5 Nov 2007 12:07:10 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: At one point the location order was not respected/saved I believe. I guess we will just assume the user will build up a SplitLocation in order (i.e. add_SubLocation). I'll try and remember if there were any other particular reasons. -jason On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? > > -hilmar > > On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: > >> Pass in (-nosort => 1) to spliced_seq: >> >> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; >> >> This ensures no sorting of sublocations occurs, if you want for >> instance typical GenBank/EMBL 'join' behavior. >> >> To the other devs: shouldn't -nosort be the default behavior when >> the split location is a 'join'? In other words, should spliced_seq >> () be modified to take into account the split location type when >> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' >> explicitly indicates the order of the sequences is important when >> joined together; the current behavior is more like that for 'order'. >> >> chris >> >> On Nov 4, 2007, at 12:39 PM, download on demand wrote: >> >>> Hi to all. 
>>> >>> I have a problem with a simplest script: >>> >>> >>> >>> use Bio::SeqIO; >>> # get command-line arguments, or die with a usage statement >>> my $usage = "x2y.pl infile infileformat outfile >>> outfileformat\n"; >>> my $infile = shift or die $usage; >>> my $infileformat = shift or die $usage; >>> # my $outfile = shift or die $usage; >>> my $outfileformat = shift or die $usage; >>> >>> # create one SeqIO object to read in,and another to write >>> out >>> my $seq_in = Bio::SeqIO->new('-file' => "<$infile", >>> '-format' => $infileformat); >>> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, >>> '-format' => $outfileformat); >>> >>> # write each entry in the input file to the output file >>> while (my $inseq = $seq_in->next_seq) { >>> >>> # $seq_out->write_seq($inseq); # Whole sequence not >>> needed >>> >>> for my $feat_object ($inseq->get_SeqFeatures) >>> { >>> if ($feat_object->primary_tag eq "CDS") >>> { >>> print $feat_object->get_tag_values('product'),"\n"; >>> print >>> $feat_object->location->start,"..",$feat_object->location->end,"\n"; >>> print $feat_object->spliced_seq->seq,"\n\n"; >>> } >>> } >>> >>> >>> >>> The result seems OK to me, but in case of first CDS of >>> NC_005213.gbk from >>> here >> Nanoarchaeum_equitans/> the >>> output is wrong: >>> >>> It is: >>> hypothetical protein >>> 1..490885 >>> TAAATGCGATTGCTATTAGAA..................................Truncated >>> sequence................................... >>> >>> Should be: >>> hypothetical protein >>> 879..490883 >>> ATGCGATTGCTATTAGAA...................................Truncated >>> sequence....................................TAA >>> >>> >>> >>> This CDS have an unnatural location string: >>> CDS complement(join(490883..490885,1..879)), but >>> spliced_seq >>> should handle these things? >>> >>> Please help me! >>> Best regards, N. 
>>> _______________________________________________ >>> >> >> >> > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Mon Nov 5 17:16:10 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:16:10 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> Yes, we would sort based on the splittype() and default to a particular behavior ('join') if one isn't designated, maybe with a warning indicating the splittype() isn't defined. Using an 'order' or other defined types could also delineate a default sort/nosort behavior (probably the previous as it would replicate prior behavior). chris On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote: > I agree that there should be a meaningful default that results in > "doing the right thing" in most cases if the user doesn't intervene. > I'm not sure I understand all the details, but it sounds sorting or > not sorting should depend on the split location type unless the user > overrides it by argument. That's what you're suggesting, right? > > -hilmar From cjfields at uiuc.edu Mon Nov 5 17:20:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:20:35 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <70023491-3549-428D-9E5C-32275A33FF20@uiuc.edu> On Nov 5, 2007, at 10:54 AM, Jay Hannah wrote: > On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. 
>> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? > > If someone knows why spliced_seq() should ever sort then I'm > suggesting we add a test demonstrating a useful example of that. > > If no one has a useful example of when you would want spliced_seq() > to sort then I'm suggesting we remove the sorting altogether and > nosort goes away. > > I can provide/add many examples where sorting is bad. I do not know > of a case where sorting is good. > > j > http://www.bioperl.org/wiki/User:Jhannah The behavior would be based on the current use of 'join', 'order', and 'bond' (the latter in GenPept records). I documented some cases here a while back: http://www.bioperl.org/wiki/BioPerl_Locations#Split chris From hlapp at duke.edu Mon Nov 5 17:32:24 2007 From: hlapp at duke.edu (Hilmar Lapp) Date: Mon, 5 Nov 2007 12:32:24 -0500 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> <69AE79C0-3775-4AAC-B846-AA0611C44EAB@uiuc.edu> Message-ID: <13919657-0446-4821-9EE4-FD07C995C734@duke.edu> Sounds good to me. -hilmar On Nov 5, 2007, at 12:16 PM, Chris Fields wrote: > Yes, we would sort based on the splittype() and default to a > particular behavior ('join') if one isn't designated, maybe with a > warning indicating the splittype() isn't defined. Using an 'order' > or other defined types could also delineate a default sort/nosort > behavior (probably the previous as it would replicate prior behavior). > > chris > > On Nov 5, 2007, at 10:03 AM, Hilmar Lapp wrote: > >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. 
>> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? >> >> -hilmar > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at uiuc.edu Mon Nov 5 17:41:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 11:41:27 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: It may have something to do with remote locations or setting strand() in sublocations. This may have popped up in relation to a LocationI code audit I proposed a while back on the list which I never got around to. Oh well... I at least managed getting a wiki page started in case we decided to make changes, with the intention of making it a HOWTO at some point: http://www.bioperl.org/wiki/BioPerl_Locations If we go through with the changes to spliced_seq(), should it be implemented for inclusion in v1.6 or wait until v1.7? chris On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote: > > At one point the location order was not respected/saved I believe. > I guess we will just assume the user will build up a SplitLocation > in order (i.e. add_SubLocation). I'll try and remember if there > were any other particular reasons. > > > -jason > On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote: > >> I agree that there should be a meaningful default that results in >> "doing the right thing" in most cases if the user doesn't intervene. >> I'm not sure I understand all the details, but it sounds sorting or >> not sorting should depend on the split location type unless the user >> overrides it by argument. That's what you're suggesting, right? 
>> >> -hilmar >> >> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote: >> >>> Pass in (-nosort => 1) to spliced_seq: >>> >>> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n"; >>> >>> This ensures no sorting of sublocations occurs, if you want for >>> instance typical GenBank/EMBL 'join' behavior. >>> >>> To the other devs: shouldn't -nosort be the default behavior when >>> the split location is a 'join'? In other words, should spliced_seq >>> () be modified to take into account the split location type when >>> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' >>> explicitly indicates the order of the sequences is important when >>> joined together; the current behavior is more like that for 'order'. >>> >>> chris >>> >>> On Nov 4, 2007, at 12:39 PM, download on demand wrote: >>> >>>> Hi to all. >>>> >>>> I have a problem with a simplest script: >>>> >>>> >>>> >>>> use Bio::SeqIO; >>>> # get command-line arguments, or die with a usage >>>> statement >>>> my $usage = "x2y.pl infile infileformat outfile >>>> outfileformat\n"; >>>> my $infile = shift or die $usage; >>>> my $infileformat = shift or die $usage; >>>> # my $outfile = shift or die $usage; >>>> my $outfileformat = shift or die $usage; >>>> >>>> # create one SeqIO object to read in,and another to write >>>> out >>>> my $seq_in = Bio::SeqIO->new('-file' => "<$infile", >>>> '-format' => $infileformat); >>>> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT, >>>> '-format' => >>>> $outfileformat); >>>> >>>> # write each entry in the input file to the output file >>>> while (my $inseq = $seq_in->next_seq) { >>>> >>>> # $seq_out->write_seq($inseq); # Whole sequence not >>>> needed >>>> >>>> for my $feat_object ($inseq->get_SeqFeatures) >>>> { >>>> if ($feat_object->primary_tag eq "CDS") >>>> { >>>> print $feat_object->get_tag_values('product'),"\n"; >>>> print >>>> $feat_object->location->start,"..",$feat_object->location- >>>> >end,"\n"; >>>> print $feat_object->spliced_seq->seq,"\n\n"; >>>> } 
>>>> }
>>>>
>>>>
>>>>
>>>> The result seems OK to me, but in case of first CDS of
>>>> NC_005213.gbk from
>>>> here
>>> Nanoarchaeum_equitans/> the
>>>> output is wrong:
>>>>
>>>> It is:
>>>> hypothetical protein
>>>> 1..490885
>>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>>> sequence...................................
>>>>
>>>> Should be:
>>>> hypothetical protein
>>>> 879..490883
>>>> ATGCGATTGCTATTAGAA...................................Truncated
>>>> sequence....................................TAA
>>>>
>>>>
>>>>
>>>> This CDS have an unnatural location string:
>>>> CDS complement(join(490883..490885,1..879)), but
>>>> spliced_seq
>>>> should handle these things?
>>>>
>>>> Please help me!
>>>> Best regards, N.
>>>> _______________________________________________
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bosborne11 at verizon.net Mon Nov 5 16:05:41 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 05 Nov 2007 12:05:41 -0400
Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall
In-Reply-To: <472ED3CC.2050305@univ-brest.fr>
Message-ID:

Jean-luc,

From what you've written it sounds like you're using bash and not some other
shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file
in your home directory, as well as a .ncbirc file. This should work.

I'm no Unix expert but I've always configured tcsh on the Mac in the same
ways I'd configure it on Linux machines. Similarly, if you're using bash
then it will read its .bashrc file, regardless of what flavor of Unix you
use (and the same thing holds true for zsh or csh or ...).

Brian O.
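
Whether the shell start-up file did its job can also be checked from inside the Perl script itself. The following is a minimal sketch in plain Perl, not something taken from the HOWTO: find_executable() is a hypothetical helper, and the directory ~/blast/bin is only a guess based on the layout Jean-Luc describes.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical helper: return the full path of the first executable file
# called $name found in the given directories, or nothing if there is none.
sub find_executable {
    my ( $name, @dirs ) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile( $dir, $name );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Assumed install location (myname/blast/bin in the original message).
my $blast_bin = File::Spec->catdir( $ENV{HOME} // '.', 'blast', 'bin' );

# File::Spec->path returns the directories currently listed in PATH.
my $found = find_executable( 'blastall', File::Spec->path );

if ( !$found ) {
    # Not visible yet: prepend the assumed directory for this process and
    # for any program it starts, then look again.
    $ENV{PATH} = join $Config{path_sep}, $blast_bin, ( $ENV{PATH} // () );
    $found = find_executable( 'blastall', File::Spec->path );
}

print defined $found
    ? "blastall found at $found\n"
    : "blastall still not found; check the directory $blast_bin\n";
```

If the script reports that blastall is not found when started from a terminal, the start-up file that sets PATH is probably not being read by that shell, which is the situation the .bashrc versus .bash_profile advice in this thread addresses.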
On 11/5/07 4:26 AM, "Jean-luc Jany" wrote: > Dear Bioperl and Mac users, > > I am a Mac user and would like to run a script I made using > Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate > to Bioperl the pathway to Blastall and other executables. > > I read carefully the following link > http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the > path to Blast, but I guess the way to proceed is slightly different in Mac and > that I should not create .ncbirc and .bashrc files (e.g. should I modify the > .profile file instead of .bashrc?) > > Actually, my blast file is in myname directory and comprises a /bin and a > /data file. I have got my blastall and other executables in > myname/blast/bin/blastall. > > Thank you in anticipation for your help. > > Jean-Luc > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Mon Nov 5 18:35:56 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 05 Nov 2007 12:35:56 -0600 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall In-Reply-To: References: Message-ID: <472F628C.2000506@campus.iztacala.unam.mx> If the ~/.bashrc file doesn't work for you, try renaming it to ~/.bash_profile and re-login, that might work best. ~/.bashrc works as an individual per-interactive-shell startup file, whereas ~/.bash_profile is a personal initialization file, executed for login shells. Hope this helps. Regards, Mauricio. Brian Osborne wrote: > Jean-luc, > >>From what you written it sounds like you're using bash and not some other > shell (e.g. tcsh, csh), right? If that's the case then create a .bashrc file > in your home directory, as well as a .ncbirc file. This should work. 
> > I'm no Unix expert but I've always configured tcsh on the Mac in the same > ways I'd configure it on Linux machines. Similarly, if you're using bash > then it will read its .bashrc file, regardless of what flavor of Unix you > use (and the same thing holds true for zsh or csh or ...). > > Brian O. > > > On 11/5/07 4:26 AM, "Jean-luc Jany" wrote: > >> Dear Bioperl and Mac users, >> >> I am a Mac user and would like to run a script I made using >> Bio::Tools::Run::StandAloneBlast. Unfortunately, I did not manage to indicate >> to Bioperl the pathway to Blastall and other executables. >> >> I read carefully the following link >> http://www.bioperl.org/wiki/HOWTO:StandAloneBlast and tried to indicate the >> path to Blast, but I guess the way to proceed is slightly different in Mac and >> that I should not create .ncbirc and .bashrc files (e.g. should I modify the >> .profile file instead of .bashrc?) >> >> Actually, my blast file is in myname directory and comprises a /bin and a >> /data file. I have got my blastall and other executables in >> myname/blast/bin/blastall. >> >> Thank you in anticipation for your help. 
>>
>> Jean-Luc
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From hlapp at duke.edu Mon Nov 5 21:04:11 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 5 Nov 2007 16:04:11 -0500
Subject: [Bioperl-l] Help with Bio::SeqIO
In-Reply-To:
References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu>
Message-ID:

On Nov 5, 2007, at 12:41 PM, Chris Fields wrote:

> If we go through with the changes to spliced_seq(), should it be
> implemented for inclusion in v1.6 or wait until v1.7?

I would say they should be implemented ASAP because they 1) should
not change behavior for those for which the current default behavior
was already broken (and who therefore pass in --no_sort), and 2) fix
the behavior for those who erroneously assumed that the code was
going to do the right thing by default.

I.e., it sounds mostly like a bugfix to me. Am I overlooking something?
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : =========================================================== From cjfields at uiuc.edu Mon Nov 5 22:12:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Nov 2007 16:12:23 -0600 Subject: [Bioperl-l] Help with Bio::SeqIO In-Reply-To: References: <923c9ce30711041039q3f718911r63eaa5a093226df2@mail.gmail.com> <8543B6EA-7D37-4D59-B22F-01D34BA9C13D@uiuc.edu> Message-ID: <980977BB-72C3-401A-848F-AEF2E602E4BE@uiuc.edu> On Nov 5, 2007, at 3:04 PM, Hilmar Lapp wrote: > > On Nov 5, 2007, at 12:41 PM, Chris Fields wrote: > >> If we go through with the changes to spliced_seq(), should it be >> implemented for inclusion in v1.6 or wait until v1.7? > > I would say they should be implemented ASAP because they 1) should > not change behavior for those for which the current default > behavior was already broken (and who therefore pass in --no_sort), > and 2) fix the behavior for those who erroneously assumed that the > code was going to do the right thing by default. > > I.e., it sounds mostly like a bugfix to me. Am I overlooking > something? > > -hilmar > -- Okay; I'll try to get this in soon. chris From jean-luc.jany at univ-brest.fr Tue Nov 6 09:00:07 2007 From: jean-luc.jany at univ-brest.fr (Jean-luc Jany) Date: Tue, 06 Nov 2007 10:00:07 +0100 Subject: [Bioperl-l] Bioperl + standalone blast on Mac= cannot find path to blastall Message-ID: <47302D17.2030500@univ-brest.fr> Thanks Brian. Yes I use bash. I am going to follow your advice as soon as possible (for some reasons I am unable to run bioperl) and come back to you to tell you if it runs. Jean-Luc From jason at bioperl.org Tue Nov 6 21:18:35 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Nov 2007 16:18:35 -0500 Subject: [Bioperl-l] lightweight sequence features Message-ID: I started a branch for implementing and playing with lightweight feature object. 
The branch is called 'lightweight_feature_branch'. Right now it is about 70% faster just in object creation based on parsing features using Bio::Tools::GFF and swapping the types of features that are created. It uses arrays instead of hashes under the hood. So the objects don't have locations under the hood. My hope is if this works okay we could use it for creating objects where we KNOW the underlying features have simple locations so such as parsing in GFF data. -jason -- Jason Stajich jason at bioperl.org From cjfields at uiuc.edu Tue Nov 6 21:57:17 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 6 Nov 2007 15:57:17 -0600 Subject: [Bioperl-l] lightweight sequence features In-Reply-To: References: Message-ID: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> Bravo! I once benchmarked Location instance creation once and found it contributed quite a bit of overhead so the speedup with that and the use of arrays makes quite a bit of sense to me. You mention only simple locations; I'm guessing this doesn't handle 'fuzzy' ends? If it did I could see layering the feature data from the get-go, so it could be used just about anywhere in the place of SF::Generic. Maybe something to test out in 1.7? chris On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote: > I started a branch for implementing and playing with lightweight > feature object. The branch is called 'lightweight_feature_branch'. > > Right now it is about 70% faster just in object creation based on > parsing features using Bio::Tools::GFF and swapping the types of > features that are created. It uses arrays instead of hashes under > the hood. > > So the objects don't have locations under the hood. My hope is if > this works okay we could use it for creating objects where we KNOW > the underlying features have simple locations so such as parsing in > GFF data. 
> > -jason > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Wed Nov 7 04:14:55 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 6 Nov 2007 23:14:55 -0500 Subject: [Bioperl-l] lightweight sequence features In-Reply-To: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> References: <5E209F80-2A49-4D6B-A621-04B27AF91D5D@uiuc.edu> Message-ID: Right - only for simple locations. I've got a bunch more tests and fixes to put in. I am hoping this can be fast replacement in the case where we're dealing with this "unflattened" data (i.e. GFF in FeatureIO & Gbrowse). This is sort of a playground until I feel like it can really get it tested a bit more. I'll give an all clear when the dust settles in terms of the design if anyone wants to play/help. -jason On Nov 6, 2007, at 4:57 PM, Chris Fields wrote: > Bravo! I once benchmarked Location instance creation once and > found it contributed quite a bit of overhead so the speedup with > that and the use of arrays makes quite a bit of sense to me. > > You mention only simple locations; I'm guessing this doesn't handle > 'fuzzy' ends? If it did I could see layering the feature data from > the get-go, so it could be used just about anywhere in the place of > SF::Generic. Maybe something to test out in 1.7? > > chris > > On Nov 6, 2007, at 3:18 PM, Jason Stajich wrote: > >> I started a branch for implementing and playing with lightweight >> feature object. The branch is called 'lightweight_feature_branch'. >> >> Right now it is about 70% faster just in object creation based on >> parsing features using Bio::Tools::GFF and swapping the types of >> features that are created. It uses arrays instead of hashes under >> the hood. >> >> So the objects don't have locations under the hood. 
My hope is if
>> this works okay we could use it for creating objects where we KNOW
>> the underlying features have simple locations so such as parsing in
>> GFF data.
>>
>> -jason
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From heikki at sanbi.ac.za Wed Nov 7 10:05:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 7 Nov 2007 12:05:59 +0200
Subject: [Bioperl-l] Bio::Tools::Run::Mdust
Message-ID: <200711071205.59576.heikki@sanbi.ac.za>

Hi Donald,

I started using your Mdust module in bioperl-run and ran into problems immediately.

* Only Bio::Seq objects are accepted but not Bio::PrimarySeq objects, although the docs say otherwise
* Sequences are modified in place. That is really bad, because it means that the user has to know to create a copy before running Mdust on it.
* The docs say that you have to set the MDUSTDIR envvar to tell the program where to find the binary. That is actually optional if the binary is on your path.
* The tests do not cover any of the options to the program

As a quick fix, I suggest that we:

* leave the current way of working for Bio::SeqI objects: sequence string is not masked but seqfeatures to that effect are added
* modify run() to return the new masked sequence object when the target is a Bio::PrimarySeqI
* fix the documentation

After that it will be possible to simply write:

    use Bio::Tools::Run::Mdust;
    my $mdust = Bio::Tools::Run::Mdust->new();
    my $seq_dusted = $mdust->run($seq);   # where $seq->isa('Bio::PrimarySeqI')

Are you happy for me to do this or do you want to do it yourself?

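The difference between masking in place and returning a masked copy can be sketched in plain Perl, without BioPerl or the mdust binary. This is only an illustration under stated assumptions: `masked_copy` is a hypothetical helper, not part of Bio::Tools::Run::Mdust; the low-complexity range is supplied by hand rather than computed; and 'N' as the masking character is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): returns a masked copy of
# $seq in which every 1-based, inclusive [start, end] range is replaced
# by 'N'. The caller's string is left untouched.
sub masked_copy {
    my ($seq, @ranges) = @_;
    my $copy = $seq;                      # work on a copy, never on the input
    for my $range (@ranges) {
        my ($start, $end) = @$range;
        my $len = $end - $start + 1;
        substr($copy, $start - 1, $len) = 'N' x $len;
    }
    return $copy;
}

# Toy sequence with a run of A's at positions 5..14.
my $seq    = 'ACGT' . ('A' x 10) . 'CGTA';
my $dusted = masked_copy($seq, [5, 14]);

print "original: $seq\n";
print "masked:   $dusted\n";
```

With this shape the caller keeps both strings and does not need to remember to clone the input first, which is the point of the second bullet above.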
Yours, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho _/_/_/_/_/ heikki at_sanbi _ac _za skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From Kevin.M.Brown at asu.edu Wed Nov 7 18:04:50 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 7 Nov 2007 11:04:50 -0700 Subject: [Bioperl-l] Bio::Ext::Align? Message-ID: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu> I installed bioperl-ext from CVS, but can't figure out what else is missing to utilize Bio::Tools::pSW. The error I get from the example script in the wiki is: The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. Please read the install the bioperl-ext package BEGIN failed--compilation aborted at /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128. Compilation failed in require at ./align_test.pl line 3. BEGIN failed--compilation aborted at ./align_test.pl line 3. In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called Align, but no Align.pm file. I followed the directions in the wiki to install 1.5.2_102 (think I had _100 installed previously). Any thoughts on what I'm missing? From jason at bioperl.org Wed Nov 7 19:52:16 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 7 Nov 2007 14:52:16 -0500 Subject: [Bioperl-l] (no subject) Message-ID: The array-based Bio::SeqFeature::Slim is only about 7% faster than Bio::Graphics::Feature so I suspect most of the speedup comes from removing location objects. 
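
For readers unfamiliar with the trade-off, here is a minimal sketch of hash-backed versus array-backed feature objects. HashFeature and ArrayFeature are hypothetical classes invented for this illustration; they are not the Bio::SeqFeature::Generic or Bio::SeqFeature::Slim code, and the sketch makes no claim about timings. Coordinates are stored directly on the object rather than in a separate location object, and tags are kept in a hash, as described later in this thread.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only.

package HashFeature;

# One hash per feature, fields looked up by name.
sub new {
    my ($class, %args) = @_;
    return bless {
        start  => $args{start},
        end    => $args{end},
        strand => $args{strand},
        tags   => {},
    }, $class;
}
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

package ArrayFeature;

# Fixed slot numbers replace hash keys.
use constant { F_START => 0, F_END => 1, F_STRAND => 2, F_TAGS => 3 };

sub new {
    my ($class, %args) = @_;
    my @self;
    @self[ F_START, F_END, F_STRAND ] = @args{qw(start end strand)};
    $self[F_TAGS] = {};                   # tags still live in a hash
    return bless \@self, $class;
}
sub start { $_[0][F_START] }
sub end   { $_[0][F_END] }

sub add_tag_value {
    my ($self, $tag, $value) = @_;
    push @{ $self->[F_TAGS]{$tag} }, $value;
    return;
}

sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->[F_TAGS]{$tag} || [] };
}

package main;

my $hf = HashFeature->new(start => 1050, end => 9000, strand => 1);
my $af = ArrayFeature->new(start => 1050, end => 9000, strand => 1);
$af->add_tag_value(Name => 'EDEN');

printf "%-12s %d..%d\n", ref $_, $_->start, $_->end for $hf, $af;
print "Name: ", join(',', $af->get_tag_values('Name')), "\n";
```

Both classes expose the same accessors, so a parser could switch between them without changing calling code; only the storage differs.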

Generic    6.75    --  -37%  -41%
GraphicsF  4.26   58%    --   -7%
Slim       3.98   70%    7%    --

this is using code on the lightweight_feature_branch so

cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r lightweight_feature_branch -d core_lwf bioperl-live

http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
and the GFF3 file I used to parse
http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2

-jason

From lstein at cshl.edu Wed Nov 7 20:04:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Nov 2007 15:04:24 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To:
References:
Message-ID: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com>

I wonder if it is worth moving to the array-based version more generally, then.

How does the array based feature object deal with tags?

Lincoln

On Nov 7, 2007 2:52 PM, Jason Stajich wrote:

> The array-based Bio::SeqFeature::Slim is only about 7% faster than
> Bio::Graphics::Feature so I suspect most of the speedup comes from removing
> location objects.
>
> Generic    6.75    --  -37%  -41%
> GraphicsF  4.26   58%    --   -7%
> Slim       3.98   70%    7%    --
>
> this is using code on the lightweight_feature_branch so
> cvs -d:ext:USERNAME@pub.open-bio.org:/home/repository/bioperl co -r
> lightweight_feature_branch -d core_lwf bioperl-live
>
> http://jason.open-bio.org/~jason/bioperl/seqfeature_speed.pl
> and the GFF3 file I used to parse
> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2
>
> -jason
>

--
Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From jason at bioperl.org Wed Nov 7 20:09:35 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 7 Nov 2007 15:09:35 -0500 Subject: [Bioperl-l] (no subject) In-Reply-To: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> Message-ID: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> It uses hashes there so technically it is not entirely array based. -jason On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: > I wonder if it is worth moving to the array-based version more > generally, > then. > > How does the array based feature object deal with tags? > > Lincoln > > On Nov 7, 2007 2:52 PM, Jason Stajich wrote: > >> The array-based Bio::SeqFeature::Slim is only about 7% faster than >> Bio::Graphics::Feature so I suspect most of the speedup comes from >> removing >> location objects. >> >> Generic 6.75 -- -37% -41% >> GraphicsF 4.26 58% -- -7% >> Slim 3.98 70% 7% -- >> >> this is using code on the lightweight_feature_branch so >> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >> lightweight_feature_branch -d core_lwf bioperl-live >> >> http://jason.open-bio.org/~jason/bioperl/ >> seqfeature_speed.pl> seqfeature_speed.pl> >> and the GFF3 file I used to parse >> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >> >> -jason >> > > > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed Nov 7 21:12:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 15:12:35 -0600 Subject: [Bioperl-l] (no subject) In-Reply-To: <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> Message-ID: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> I can see preferring a lightweight simple SF over SF::Generic in the next BioPerl dev cycle. I guess we would just layer split locations as simple sub-features/segments, typing when necessary? That shouldn't be much more overhead than creating a layered Location::Split. chris On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: > It uses hashes there so technically it is not entirely array based. > > -jason > On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: > >> I wonder if it is worth moving to the array-based version more >> generally, >> then. >> >> How does the array based feature object deal with tags? >> >> Lincoln >> >> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >> >>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>> removing >>> location objects. 
>>> >>> Generic 6.75 -- -37% -41% >>> GraphicsF 4.26 58% -- -7% >>> Slim 3.98 70% 7% -- >>> >>> this is using code on the lightweight_feature_branch so >>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >>> lightweight_feature_branch -d core_lwf bioperl-live >>> >>> http://jason.open-bio.org/~jason/bioperl/ >>> seqfeature_speed.pl>> seqfeature_speed.pl> >>> and the GFF3 file I used to parse >>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>> >>> -jason >>> >> >> >> >> -- >> Lincoln D. Stein >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> FOR URGENT MESSAGES & SCHEDULING, >> PLEASE CONTACT MY ASSISTANT, >> SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Wed Nov 7 23:19:15 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 7 Nov 2007 18:19:15 -0500 Subject: [Bioperl-l] lightweight features In-Reply-To: <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> Message-ID: It seems to me that there are applications where you're dealing with a huge number of features (such as GFF) and where therefore a lightweight object makes tremendous sense. But when you parse a genbank file, I'm not sure that's the bottleneck, unless maybe it's a large contig with lots of feature annotations. I guess we'll ultimately want a way to control the type of feature being instantiated by a parser, e..g using a factory. 
-hilmar On Nov 7, 2007, at 4:12 PM, Chris Fields wrote: > I can see preferring a lightweight simple SF over SF::Generic in the > next BioPerl dev cycle. I guess we would just layer split locations > as simple sub-features/segments, typing when necessary? That > shouldn't be much more overhead than creating a layered > Location::Split. > > chris > > On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: > >> It uses hashes there so technically it is not entirely array based. >> >> -jason >> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: >> >>> I wonder if it is worth moving to the array-based version more >>> generally, >>> then. >>> >>> How does the array based feature object deal with tags? >>> >>> Lincoln >>> >>> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >>> >>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>>> removing >>>> location objects. >>>> >>>> Generic 6.75 -- -37% -41% >>>> GraphicsF 4.26 58% -- -7% >>>> Slim 3.98 70% 7% -- >>>> >>>> this is using code on the lightweight_feature_branch so >>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl co -r >>>> lightweight_feature_branch -d core_lwf bioperl-live >>>> >>>> http://jason.open-bio.org/~jason/bioperl/ >>>> seqfeature_speed.pl>>> seqfeature_speed.pl> >>>> and the GFF3 file I used to parse >>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>>> >>>> -jason >>>> >>> >>> >>> >>> -- >>> Lincoln D. 
Stein >>> Cold Spring Harbor Laboratory >>> 1 Bungtown Road >>> Cold Spring Harbor, NY 11724 >>> (516) 367-8380 (voice) >>> (516) 367-8389 (fax) >>> FOR URGENT MESSAGES & SCHEDULING, >>> PLEASE CONTACT MY ASSISTANT, >>> SANDRA MICHELSEN, AT michelse at cshl.edu >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Nov 8 01:04:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 19:04:05 -0600 Subject: [Bioperl-l] lightweight features In-Reply-To: References: <6dce9a0b0711071204g7588f268ye80dbd6c97ca0d6f@mail.gmail.com> <494D22CA-D8B4-4DD0-B2C5-CEEA09BC1C34@bioperl.org> <219BE0EA-1272-4E78-810C-A8E81674B38C@uiuc.edu> Message-ID: I'm also thinking a factory is a good possibility; maybe something to take the place of FTHelper. chris On Nov 7, 2007, at 5:19 PM, Hilmar Lapp wrote: > It seems to me that there are applications where you're dealing with > a huge number of features (such as GFF) and where therefore a > lightweight object makes tremendous sense. But when you parse a > genbank file, I'm not sure that's the bottleneck, unless maybe it's a > large contig with lots of feature annotations. > > I guess we'll ultimately want a way to control the type of feature > being instantiated by a parser, e..g using a factory. 
> > -hilmar > > On Nov 7, 2007, at 4:12 PM, Chris Fields wrote: > >> I can see preferring a lightweight simple SF over SF::Generic in the >> next BioPerl dev cycle. I guess we would just layer split locations >> as simple sub-features/segments, typing when necessary? That >> shouldn't be much more overhead than creating a layered >> Location::Split. >> >> chris >> >> On Nov 7, 2007, at 2:09 PM, Jason Stajich wrote: >> >>> It uses hashes there so technically it is not entirely array based. >>> >>> -jason >>> On Nov 7, 2007, at 3:04 PM, Lincoln Stein wrote: >>> >>>> I wonder if it is worth moving to the array-based version more >>>> generally, >>>> then. >>>> >>>> How does the array based feature object deal with tags? >>>> >>>> Lincoln >>>> >>>> On Nov 7, 2007 2:52 PM, Jason Stajich wrote: >>>> >>>>> The array-based Bio::SeqFeature::Slim is only about 7% faster than >>>>> Bio::Graphics::Feature so I suspect most of the speedup comes from >>>>> removing >>>>> location objects. >>>>> >>>>> Generic 6.75 -- -37% -41% >>>>> GraphicsF 4.26 58% -- -7% >>>>> Slim 3.98 70% 7% -- >>>>> >>>>> this is using code on the lightweight_feature_branch so >>>>> cvs -d:ext:USERNAME at pub.open-bio.org:/home/repository/bioperl >>>>> co -r >>>>> lightweight_feature_branch -d core_lwf bioperl-live >>>>> >>>>> http://jason.open-bio.org/~jason/bioperl/ >>>>> seqfeature_speed.pl>>>> seqfeature_speed.pl> >>>>> and the GFF3 file I used to parse >>>>> http://jason.open-bio.org/~jason/bioperl/sgd.gff3.bz2>>>> jason.open-bio.org/%7Ejason/bioperl/sgd.gff3.bz2> >>>>> >>>>> -jason >>>>> >>>> >>>> >>>> >>>> -- >>>> Lincoln D. 
Stein >>>> Cold Spring Harbor Laboratory >>>> 1 Bungtown Road >>>> Cold Spring Harbor, NY 11724 >>>> (516) 367-8380 (voice) >>>> (516) 367-8389 (fax) >>>> FOR URGENT MESSAGES & SCHEDULING, >>>> PLEASE CONTACT MY ASSISTANT, >>>> SANDRA MICHELSEN, AT michelse at cshl.edu >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Nov 8 04:45:26 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Nov 2007 22:45:26 -0600 Subject: [Bioperl-l] test please ignore Message-ID: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> From cjfields at uiuc.edu Thu Nov 8 15:50:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Nov 2007 09:50:02 -0600 Subject: [Bioperl-l] test please ignore In-Reply-To: <47332534.5090205@bms.com> References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> <47332534.5090205@bms.com> Message-ID: And respond back! Just checking the mail list; the open-bio wiki pages were down last night. 
chris On Nov 8, 2007, at 9:03 AM, Stefan Kirov wrote: > Chris Fields wrote: >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > This is the best way to make everyone open this e-mail ;-) > Stefan From stefan.kirov at bms.com Thu Nov 8 15:03:16 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Thu, 08 Nov 2007 10:03:16 -0500 Subject: [Bioperl-l] test please ignore In-Reply-To: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> References: <6F8F6A4C-6A2D-4322-843B-90288D700156@uiuc.edu> Message-ID: <47332534.5090205@bms.com> Chris Fields wrote: > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > This is the best way to make everyone open this e-mail ;-) Stefan From Kevin.M.Brown at asu.edu Thu Nov 8 22:30:24 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 8 Nov 2007 15:30:24 -0700 Subject: [Bioperl-l] Bio::Ext::Align? In-Reply-To: <20071108003638.GA5892@eniac.jgi-psf.org> References: <1A4207F8295607498283FE9E93B775B403F7F6FE@EX02.asurite.ad.asu.edu> <20071108003638.GA5892@eniac.jgi-psf.org> Message-ID: <1A4207F8295607498283FE9E93B775B403F7F9D3@EX02.asurite.ad.asu.edu> OK, found the issue. For whatever reason the Align.pm file is inside the Align folder and so the package name and path don't match up once it is installed. This would cause it to have a name of "Bio::Ext::Align::Align" instead of "Bio::Ext::Align". Not sure why this wasn't caught when I did "perl Makefile.pl && make && make test && make install" > -----Original Message----- > From: Joel Martin [mailto:j_martin at lbl.gov] > Sent: Wednesday, November 07, 2007 5:37 PM > To: Kevin Brown > Subject: Re: [Bioperl-l] Bio::Ext::Align? 
> > Hello, > Might be a side effect of fixing the other bioperl-ext package, > what steps exactly did this entail: > > > I installed bioperl-ext from CVS, > > ? > > you can probably bypass it at the moment by doing this after > unpacking the > bioperl-ext package > > cd Bio/Ext/Align > perl Makefile.PL > make > make test > make install > > and > > cd Bio/Ext/HMM > perl Makefile.PL > make > make test > make install > > Joel > > but can't figure out what else is > > missing to utilize Bio::Tools::pSW. The error I get from > the example > > script in the wiki is: > > > > The C-compiled engine for Smith Waterman alignments > (Bio::Ext::Align) > > has not been installed. > > Please read the install the bioperl-ext package > > > > BEGIN failed--compilation aborted at > > /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/pSW.pm line 128. > > Compilation failed in require at ./align_test.pl line 3. > > BEGIN failed--compilation aborted at ./align_test.pl line 3. > > > > In /usr/lib/perl5/site_perl/5.8.5/Bio/Ext there is a folder called > > Align, but no Align.pm file. > > > > I followed the directions in the wiki to install 1.5.2_102 > (think I had > > _100 installed previously). Any thoughts on what I'm missing? > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From akarger at CGR.Harvard.edu Fri Nov 9 14:53:02 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Fri, 9 Nov 2007 09:53:02 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? 
Message-ID:

When I tblastn ENSP00000349467 against the human genome, I get a few hits on chr10, among which are:

 Score = 192 bits (487), Expect(2) = 5e-64
 Identities = 99/109 (90%), Positives = 99/109 (90%)
 Frame = +2

Query: 40       LGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNG 99
                L QNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE F VFDKDGNG
Sbjct: 71593562 LRQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRETFCVFDKDGNG 71593741

Query: 100      YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTA 148
                YIS  EL HVMTNLG KLTDEEVD MIREAD DGDGQVNY EFVQMMTA
Sbjct: 71593742 YISGVELHHVMTNLGVKLTDEEVD*MIREADPDGDGQVNY-EFVQMMTA 71593885

 Score = 75.1 bits (183), Expect(2) = 5e-64
 Identities = 36/43 (83%), Positives = 39/43 (90%)
 Frame = +1

Query: 1        MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQN 43
                MADQLTEEQI EFKE FSLFDKDGDGTITTK+LGTVMRS  ++
Sbjct: 71593447 MADQLTEEQIVEFKEVFSLFDKDGDGTITTKKLGTVMRSQAES 71593575

As you can see from the Sbjct lines, these two hits are basically contiguous. I was surprised to see that the bit scores, identities, and alignment lengths here are totally different but the expectation values are identical. After a bit of grepping in the BLAST source, I found reference to "sum segments" and "a collection [of] multiple distinct alignments with asymmetric gaps between the alignments" and decided it was time to cry for help.

When does BLAST decide that two or more alignments belong "together", and how does that affect the E-value? Is the E-value really showing how good those two alignments combined are, despite the frame shift? (It so happens that that's what I want.)

And does anyone know off-hand if Bioperl will tell me when situations like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine would help, but I just get a bunch of empty strings for that, whether or not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is undef.)

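As a stopgap while hsp->n comes back empty, the N in "Expect(N)" can be read straight from the HSP header line of the plain-text report. The following is a minimal sketch, assuming the text layout shown above; `parse_hsp_stats` is a hypothetical helper, not part of BioPerl, and it does not tell you which HSPs were grouped together, only the N printed on each one.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the "Score = ..., Expect(N) = ..." line of
# an HSP in a plain-text BLAST report. Returns (bits, n, evalue), or an
# empty list if the line is not a score line. n defaults to 1 when the
# report prints plain "Expect = ..." with no "(N)" suffix.
sub parse_hsp_stats {
    my ($line) = @_;
    my ($bits, $n, $evalue) =
        $line =~ /Score\s*=\s*([\d.eE+-]+)\s+bits.*?Expect(?:\((\d+)\))?\s*=\s*([\d.eE+-]+)/
        or return;
    $n = 1 unless defined $n;
    $evalue = "1$evalue" if $evalue =~ /^e/i;   # e.g. "e-100" becomes "1e-100"
    return ($bits, $n, $evalue);
}

# The two header lines quoted above.
for my $line (
    ' Score = 192 bits (487), Expect(2) = 5e-64',
    ' Score = 75.1 bits (183), Expect(2) = 5e-64',
) {
    my ($bits, $n, $evalue) = parse_hsp_stats($line) or next;
    print "bits=$bits n=$n evalue=$evalue\n";
}
```

HSPs of the same hit that report N greater than 1 and an identical E-value are candidates for having been scored together, but that pairing is an inference from the report, not something this sketch verifies.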
Thanks, - Amir Karger Research Computing Life Sciences Division Harvard University From cjfields at uiuc.edu Fri Nov 9 17:58:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Nov 2007 11:58:16 -0600 Subject: [Bioperl-l] GFF3loader and indexing Message-ID: <77845E27-1327-43DD-BA45-222C071217D7@uiuc.edu> Quick question: shouldn't the new Index attribute be passed on to seqfeatures by DB::SeqFeature::Store::GFF3Loader for round-tripping purposes (for instance, properly reloading dumped gff3 data)? I'm testing out a feature editor using volvox.gff3 data in GBrowse and the mRNA features appear to drop this attribute once loaded: Original data: ctgA example gene 1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase ctgA example mRNA 1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN. 1;Note=Eden splice form 1;Index=1 ctgA example five_prime_UTR 1050 1200 . + . Parent=EDEN.1 partial gff3_string(1) output: ctgA example gene 1050 9000 . + . Name=EDEN;ID=50;Alias=EDEN;Note=protein kinase ctgA example mRNA 1050 9000 . + . Name=EDEN. 1;Parent=50;ID=51;Alias=EDEN.1;Note=Eden splice form 1 ctgA example five_prime_UTR 1050 1200 . + . Parent=51;ID=52 ... chris From David.Messina at sbc.su.se Sat Nov 10 11:04:25 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 10 Nov 2007 12:04:25 +0100 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: References: Message-ID: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Hi Amir, I don't have my BLAST book handy, and my memory is a little fuzzy, but I think the Expect(2) you're seeing is the E-value based on both HSPs combined. And I think this is why you see the same Expect value for both -- because it is shared between them (which sounds like what you wanted). Again, this is just from memory, but I think this is an option that has to be turned on rather than something which Blast decides to do on its own. I don't know whether BioPerl reports this or not. 
Would you mind e-mailing me a entire BLAST report as a sample? When I have some time I'd like to play around with this a bit. Thanks, Dave From sac at bioperl.org Sat Nov 10 22:59:28 2007 From: sac at bioperl.org (Steve Chervitz) Date: Sat, 10 Nov 2007 14:59:28 -0800 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Message-ID: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com> The Bioperl blast parser should extract that value and you can obtain it from an HSP object, via the HSPI::n() method, documented here: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23 Dave's basically correct in his explanation. It's a result of the application of sum statistics by the blast algorithm. You can read all about it in Korf et al's BLAST book. Here's the relevant section: http://books.google.com/books?id=xvcnhDG9fNUC&pg=PA102&lpg=PA102&dq=blast+sum+statistics&source=web&ots=WIudsJGaCk&sig=v66X3wRLEHvpTLUD36AE5DGpPBY#PPA102,M1 Steve On Nov 10, 2007 3:04 AM, Dave Messina wrote: > Hi Amir, > > I don't have my BLAST book handy, and my memory is a little fuzzy, but I > think the Expect(2) you're seeing is the E-value based on both HSPs > combined. And I think this is why you see the same Expect value for both -- > because it is shared between them (which sounds like what you wanted). > > Again, this is just from memory, but I think this is an option that has to > be turned on rather than something which Blast decides to do on its own. > > > I don't know whether BioPerl reports this or not. Would you mind e-mailing > me a entire BLAST report as a sample? When I have some time I'd like to play > around with this a bit. 
> > Thanks, > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bernd.web at gmail.com Tue Nov 13 11:57:04 2007 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 13 Nov 2007 12:57:04 +0100 Subject: [Bioperl-l] Panel link Message-ID: <716af09c0711130357n4ba72901lf2236ddfd853c945@mail.gmail.com> Hi, Is it possible with Panel to provide javascript event handlers? With -link we can provide hrefs as: -link => 'http://www.google.com/search?q=$description' or use a coderef that returns a href. However, I'd like to set-up links as: Is this possible by default with Panel? Regards, Bernd From akarger at CGR.Harvard.edu Tue Nov 13 17:12:32 2007 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue, 13 Nov 2007 12:12:32 -0500 Subject: [Bioperl-l] What does Expect(2) mean in a blast result? In-Reply-To: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> Message-ID: Thanks for the reply. I'm curious as to how BLAST decides to do this, but not curious enough to buy the BLAST book. If you want to see this, you could just tblastn the ENSP00000349467 sequence vs. the genome: MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG NGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDE EVDEMIREADIDGDGQVNYEEFVQMMTAK against the human genome at NCBI or locally. I've attached the tblastn report for that protein, which includes the results I quoted. (It was done as part of a blast of 150 proteins vs. the genome.) -Amir ________________________________ From: dave at davemessina.com [mailto:dave at davemessina.com] On Behalf Of Dave Messina Sent: Saturday, November 10, 2007 6:04 AM To: Amir Karger Cc: bioperl-l Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result? 
Hi Amir,

I don't have my BLAST book handy, and my memory is a little fuzzy, but I
think the Expect(2) you're seeing is the E-value based on both HSPs
combined. And I think this is why you see the same Expect value for both --
because it is shared between them (which sounds like what you wanted).

Again, this is just from memory, but I think this is an option that has to
be turned on rather than something which Blast decides to do on its own.

I don't know whether BioPerl reports this or not. Would you mind e-mailing
me a entire BLAST report as a sample? When I have some time I'd like to play
around with this a bit.

Thanks,
Dave

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ENSP00000349467_tblastn.txt.gz
Type: application/x-gzip
Size: 9755 bytes
Desc: ENSP00000349467_tblastn.txt.gz
URL:

From akarger at CGR.Harvard.edu Tue Nov 13 17:30:52 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 13 Nov 2007 12:30:52 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
Message-ID:

> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf
> Of Steve Chervitz
>
> The Bioperl blast parser should extract that value and you can obtain
> it from an HSP object, via the HSPI::n() method, documented here:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/HSP/HSPI.html#POD23

As I mentioned in my email:

And does anyone know off-hand if Bioperl will tell me when situations
like this happen? I thought the Bio::Search::HSP::BlastHSP::n subroutine
would help, but I just get a bunch of empty strings for that, whether or
not there's a (2) in the Expect string. (hsp->n is empty, hsp->{"_n"} is
undef.)

And the docs for n() actually say, "This value is not defined with NCBI
Blast2 with gapping" although they don't say why. Which may explain why,
when I ran the following code on the blast result I included in my last
email, I got empty values for all of the n's. (Why is n() undefined for
gapped blast if I'm getting n's in my results from that blast?)

use warnings;
use strict;
use Bio::SearchIO;

my $blast_out = $ARGV[0];
my $in = new Bio::SearchIO(-format      => 'blast',
                           -file        => $blast_out,
                           -report_type => 'tblastn');

print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N Evalue)), "\n";
while (my $query = $in->next_result) {
    while (my $subject = $query->next_hit) {
        while (my $hsp = $subject->next_hsp) {
            print join("\t",
                $query->query_name,
                $hsp->start("query"),
                $hsp->end("query"),
                $hsp->strand("hit"),
                $subject->name,
                $hsp->start("hit"),
                $hsp->end("hit"),
                $subject->frame,
                $hsp->n,
                $hsp->evalue,
            ), "\n";
        }
    }
}

> Dave's basically correct in his explanation. It's a result of the
> application of sum statistics by the blast algorithm. You can read all
> about it in Korf et al's BLAST book. Here's the relevant section:

[snip]

Thanks,

-Amir

From cjfields at uiuc.edu Tue Nov 13 17:42:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Nov 2007 11:42:07 -0600
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To:
References: <628aabb70711100304l2af828e3lf9bb257177769845@mail.gmail.com> <8f200b4c0711101459q4ef7c978n8ce44e2903b8dfd3@mail.gmail.com>
Message-ID: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu>

Amir,

Can you file this as a bug? Dave mentioned he would look into it but I
think it warrants tracking to make sure it gets fixed:

http://www.bioperl.org/wiki/Bugs

Attach the example BLAST report from your last post to the report. BTW, I
wonder how this appears in XML output?
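For anyone who needs the N from the Expect(N) lines while n() comes back
empty, one possible stopgap is to read it straight from the text report.
What follows is only a minimal sketch, not BioPerl code:
expect_n_from_report is a hypothetical helper, it assumes HSP header lines
of the form " Score = 52.4 bits (124), Expect(2) = 3e-15", and it treats a
plain "Expect =" as N = 1, which is an assumption rather than anything
confirmed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan raw NCBI BLAST text and
# return one hash per HSP header line, e.g.
#   " Score = 52.4 bits (124), Expect(2) = 3e-15"
# Assumption: a plain "Expect = ..." line is reported with n = 1.
sub expect_n_from_report {
    my ($text) = @_;
    my @hsps;
    while ($text =~ /^\s*Score\s*=\s*([\d.]+)\s+bits.*?Expect(?:\((\d+)\))?\s*=\s*(\S+)/mg) {
        # Copy the captures before running another regex.
        my ($bits, $n, $evalue) = ($1, $2, $3);
        $evalue =~ s/,$//;    # some reports follow the E-value with ", Method: ..."
        push @hsps, {
            bits   => $bits,
            n      => defined $n ? $n : 1,
            evalue => $evalue,
        };
    }
    return @hsps;
}

# Small made-up sample in the assumed layout.
my $sample = <<'END';
 Score = 52.4 bits (124), Expect(2) = 3e-15
 Identities = 24/60 (40%), Positives = 38/60 (63%)

 Score = 40.0 bits (92), Expect = 0.002
END

# One tab-separated line per HSP: bits, N, E-value.
printf "%s\t%s\t%s\n", @{$_}{qw(bits n evalue)}
    for expect_n_from_report($sample);
```

The records come back in report order, so they could be lined up with the
HSPs that Bio::SearchIO returns for the same report, although that pairing
is untested here.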
chris On Nov 13, 2007, at 11:30 AM, Amir Karger wrote: >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf >> Of Steve Chervitz >> >> The Bioperl blast parser should extract that value and you can obtain >> it from an HSP object, via the HSPI::n() method, documented here: >> >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B > io/Search/HSP/HSPI.html#POD23 > > As I mentioned in my email: > > And does anyone know off-hand if Bioperl will tell me when situations > like this happen? I thought the Bio::Search::HSP::BlastHSP::n > subroutine > would help, but I just get a bunch of empty strings for that, > whether or > not there's a (2) in the Expect string. (hsp->n is empty, hsp-> > {"_n"} is > undef.) > > And the docs for n() actually say, "This value is not defined with > NCBI > Blast2 with gapping" although they don't say why. Which may explain > why, > when I ran the following code on the blast result I included in my > last > email, I got empty values for all of the n's. (Why is n() undefined > for > gapped blast if I'm getting n's in my results from that blast?) > > use warnings; > use strict; > use Bio::SearchIO; > > my $blast_out = $ARGV[0]; > my $in = new Bio::SearchIO(-format => 'blast', > -file => $blast_out, > -report_type => 'tblastn'); > > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart Send Frame N > Evalue)), "\n"; > while(my $query = $in->next_result) { > while(my $subject = $query->next_hit) { > while (my $hsp = $subject->next_hsp) { > print join("\t", > $query->query_name, > $hsp->start("query"), > $hsp->end("query"), > $hsp->strand("hit"), > $subject->name, > $hsp->start("hit"), > $hsp->end("hit"), > $subject->frame, > $hsp->n, > $hsp->evalue, > ),"\n"; > } > } > } > >> Dave's basically correct in his explanation. It's a result of the >> application of sum statistics by the blast algorithm. You can read >> all >> about it in Korf et al's BLAST book. 
Here's the relevant section:
>
> [snip]
>
> Thanks,
>
> -Amir
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From lskatz at gatech.edu Wed Nov 14 01:27:45 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Tue, 13 Nov 2007 20:27:45 -0500
Subject: [Bioperl-l] chromatogram
Message-ID: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>

Hi,
I would like to know how to draw a chromatogram file. Does anyone
have any sample code where you read in an scf file and create a jpeg
or other image file?
For that matter, I want to be able to customize these images with base
calls if possible. I really appreciate the help, so thanks!

--
Lee Katz

From mvrmakam at yahoo.com Wed Nov 14 09:52:13 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Wed, 14 Nov 2007 01:52:13 -0800 (PST)
Subject: [Bioperl-l] Installing Bioperl on Windows XP
Message-ID: <235423.72586.qm@web33703.mail.mud.yahoo.com>

Hi,

I am encountering a problem while installing Bioperl on Windows XP. I have
installed ActivePerl version 5.8.8.822. I am using Perl Package Manager
GUI. Also, I am following the instructions outlined for installing Bioperl
on Windows. I am getting an error. The error is as follows:

Downloading ActiveState Package Repository packlist ... failed 500
Can't connect to ppm4.activestate.com:80 (Bad hostname 'ppm4.activestate.com')

I do not know how to overcome this problem. The other issue is when I type
bioperl in the search box I do not see any packages of bioperl. I do not
know what the problem is. If anyone of you could guide me through the
installation process I would appreciate it.

Thanks,

Roshan

From cjfields at uiuc.edu Wed Nov 14 14:02:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Nov 2007 08:02:05 -0600
Subject: [Bioperl-l] Installing Bioperl on Windows XP
In-Reply-To: <235423.72586.qm@web33703.mail.mud.yahoo.com>
References: <235423.72586.qm@web33703.mail.mud.yahoo.com>
Message-ID: <22873767-9CBD-4D38-BC9C-5267F1FFB04D@uiuc.edu>

The instructions are pretty specific:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Note the section on adding new repositories. As for the PPM
connection error, it's more than likely an error with the default
address but it isn't bioperl-related; maybe answers lie here:

http://aspn.activestate.com/ASPN/docs/ActivePerl/5.8/faq/ActivePerl-faq2.html#ppm_repositories

chris

On Nov 14, 2007, at 3:52 AM, Roshan Makam wrote:

> Hi,
>
> I am encountering a problem while installing Bioperl on Windows
> XP. I have installed ActivePerl version 5.8.8.822. I am using
> Perl Package Manager GUI. Also, I am following the instructions
> outlined for installing Bioperl on Windows. I am getting an
> error. The error is as follows:
>
> Downloading ActiveState Package Repository packlist ... failed 500
> Can't connect to ppm4.activestate.com:80 (Bad hostname
> 'ppm4.activestate.com')
>
> I do not know how to overcome this problem. The other issue is
> when I type bioperl in the search box I do not see any packages of
> bioperl. I do not know what the problem is. If anyone of you
> could guide me through the installation process I would appreciate it.
> > Thanks, > > Roshan From reshetovdenis at gmail.com Wed Nov 14 17:28:40 2007 From: reshetovdenis at gmail.com (Denis Reshetov) Date: Wed, 14 Nov 2007 20:28:40 +0300 Subject: [Bioperl-l] how to load all genomes Message-ID: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com> Dear BioPerl-db Creators, I`m trying to load all genomes from NCBI ftp site to my BioSql database using common script load_seqdatabase.pl But it seems very slow. Let me know what is the better way to do it? Thank you very much, Denis. From barry.moore at genetics.utah.edu Wed Nov 14 19:18:29 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed, 14 Nov 2007 12:18:29 -0700 Subject: [Bioperl-l] how to load all genomes In-Reply-To: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com> References: <7ed774ca0711140928r462976dcjae40fd0886031d08@mail.gmail.com> Message-ID: <66DEB322-7654-4E5E-9E96-BAE88262E3AC@genetics.utah.edu> Denis, You might be interested in this thread from a couple years ago. I was having a similar problem, that I eventually resolved. Unfortunately the reason for the problem and the solution weren't entirely clear, but you may be able to glean some ideas from it. Also, you may have already done this, but I suggest searching the archives from this list because it seems like this comes up every now and then, so there may be other postings similar to the one I'm sending you that could help you. http://www.bioperl.org/pipermail/bioperl-l/2005-January/018093.html Finally, if you are still having problems, you'll want to include a few more details about your situation. What DB are you using, have you preloaded taxonomy data etc. How fast/slow are your sequences loading? Barry On Nov 14, 2007, at 10:28 AM, Denis Reshetov wrote: > Dear BioPerl-db Creators, > > I`m trying to load all genomes from NCBI ftp site > to my BioSql database using common script load_seqdatabase.pl > > But it seems very slow. Let me know what is the better way to do it? 
>
> Thank you very much,
>
> Denis.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Wed Nov 14 19:57:49 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 15 Nov 2007 08:57:49 +1300
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID:

Here's my trace viewer.
Please excuse my dodgy Perl and debugging code as it's still under
development :-)

Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

------------------------------------------------------------------------

#!perl -w
use ABI;

use GD::Graph::lines;
use GD::Graph::colour;
use GD::Graph::Data;

use Data::Dumper;

use Getopt::Long;

use constant HEIGHT => 300;

GetOptions ('h|height=i' => \$HEIGHT,
            'f|file=s'   => \$FILE,
            'o|out=s'    => \$OUTFILE,
            'l|left=s'   => \$LEFT_SEQ,
            'r|right=s'  => \$RIGHT_SEQ,
            's|size=i'   => \$SIZE,
           ) || die <<USAGE;
Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o test2.png -l actacgtacgta -r atgatcgtacgtac
   or  perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 --out test2.png --left actacgtacgta --right atgatcgtacgtac

Options:
    --height    Set height of image (${\HEIGHT} pixels default)
    --file      Filename for the ABI trace file
    --out       Filename for the generated .png image
    --left
    --right
    --size

Parse an ABI trace file and render a PNG image.
See http://search.cpan.org/dist/ABI/ABI.pm
or http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
USAGE

my $height  = $HEIGHT || HEIGHT;
my $file    = $FILE;
my $outfile = $OUTFILE;

my $abi = ABI->new(-file=> $file);

my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"

my @base_calls = $abi->get_base_calls(); # Get the base calls
my $sequence = $abi->get_sequence();
@bp = split(//, $sequence);

# iterate over array: put each called base at its trace position,
# and a blank at every other position
$size = $abi->get_trace_length();
for ($i=0,$count = 0; $i<$size; $i++) {
    if(grep(/\b$i\b/, @base_calls)){
        $bases[$i] = $bp[$count];
        $count++;
    }else{
        $bases[$i] = ' ';
    }
}

# create the data. see GD::Graph::Data for details of the format
my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );

my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
$graph->set(
    title         => $abi->get_sample_name(),
  # y_max_value   => $abi->get_max_trace() + 50,
    x_max_value   => $abi->get_trace_length(),
    l_margin      => 5,
    r_margin      => 5,
    x_ticks       => 0,
    text_space    => 0,
    line_width    => 1,
    transparent   => 0,
    b_margin      => 30,
    t_margin      => 35,
    x_plot_values => 0,
    interlaced    => 1,
);

# allocate some colors for drawing the bases
#use colors same as Chromas
$graph->set( dclrs => [ qw( green blue black red pink) ] );

#plot the data
my $gd = $graph->plot(\@data);

$black   = $gd->colorAllocate(0,0,0);       # G
$blue    = $gd->colorAllocate(0,0,255);     # C
$red     = $gd->colorAllocate(255,0,0);     # T
$green   = $gd->colorAllocate(0,255,0);     # A
$magenta = $gd->colorAllocate(255,0,255);   # N
$white   = $gd->colorAllocate(255,255,255); # undefined aren't drawn
$gray    = $gd->colorAllocate(210,210,210);
%colors = ("A", $green, "C", $blue, "G", $black, "T", $red, "N", $magenta, " ", $white);

#$start_base = index(lc($sequence),lc($LEFT_SEQ));
$start_base = find_match($sequence,$LEFT_SEQ);

#if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
$end_base = find_match($sequence,$RIGHT_SEQ, 1);
# find_match returns -1 (which is true) when nothing matched,
# so test for a real position before extending to the end of the match
if($end_base > 0){
    $end_base += length($RIGHT_SEQ);
}

# get the coords of the features on the image
@coords = $graph->get_hotspot(1);
$size = @coords;
$printed_num = 1;
$basecount = 0;
$numstoprint = $basecount - $start_base;

# draw the colored bases and scale at top and bottom of image
for ($i=0,$count = 0; $i<$size; $i++) {
    $c = $coords[$i];
    (undef, $xs, undef, undef, undef, undef) = @$c;
    $base = $bases[$i];
    if($base =~ /[ACGTN]/){
        if($start_base - 1 == $basecount){$start_base_coord = $xs;}
        if($end_base - 1 == $basecount){$end_base_coord = $xs;}
        if(defined($SIZE) && $start_base+$SIZE -2 == $basecount){$end_base_coord_by_size = $xs;}
        $basecount++;
        $numstoprint++;
        $printed_num = 0;
    }
    # print the bases top and bottom
    $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
    $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});

    # print scale
    if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
        if($LEFT_SEQ){
            $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
            $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
            $printed_num = 1;
        }else{
            $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
            $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
            $printed_num = 1;
        }
    }
    $top_right_corner = $xs;
}

# only draw the clipped region if the calculated size is + or - 6bp
#if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){
# draw the clipped regions as gray
#if LEFT_SEQ supplied and a match found
if($LEFT_SEQ && $start_base > 0){
    $gd->filledRectangle(38,35,$start_base_coord - 1,$height - 33,$red);
    $clipped = 1;
}
#if RIGHT_SEQ supplied and a match found
if($RIGHT_SEQ && $end_base > 0){
    print join("\t", ($end_base)),"\n";
    $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height - 33,$gray);
    $clipped = 1;
}
#if no RIGHT_SEQ supplied or no match found, use left match + seq length
# (only possible when --size gave us a coordinate to clip from)
if((!$RIGHT_SEQ || $end_base < 0) && defined($end_base_coord_by_size)){
    $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
    $clipped = 1;
}

# set height based on max trace within clipped region
$graph->set( y_max_value => 3000);#$abi->get_max_trace() + 50);

# need to re-plot the data over the grayed out area
$graph->plot(\@data) if $clipped;
$gd->filledRectangle(0,0,$top_right_corner,33,$white);

#}

#print the graph
open(my $out, '>', $outfile) or die "can't open output file: $outfile: $!\n";
binmode $out;
print $out $gd->png;
close $out;

# return the 1-based position of $query in $sequence (the last occurrence
# if $last is set), falling back to looser patterns; -1 if nothing matches
sub find_match{
    my ($sequence,$query,$last) = @_;
    return -1 if !defined($query) || length($query) < 6;
    my($odds, $evens, $ones, $twos, $threes, $match_pos);
    # try exact match
    $match_pos = do_regex($query, $sequence,$last);
    return $match_pos if $match_pos > 0;

    # try matching every second base starting from the second base
    # e.g. it will be .C.T.C.G.etc
    map {m/(\w)(\w)/g; $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
    $match_pos = do_regex($odds, $sequence,$last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($evens, $sequence,$last);
    return $match_pos if $match_pos > 0;

    # try matching every third base starting from the first base
    # e.g. it will be C..T..G..T etc
    map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"} ($query =~m/(\w\w\w)/g);
    $match_pos = do_regex($ones, $sequence,$last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($twos, $sequence,$last);
    return $match_pos if $match_pos > 0;
    $match_pos = do_regex($threes, $sequence,$last);
    return $match_pos if $match_pos > 0;

    # not found
    return -1;
}

sub do_regex{
    my ($query,$sequence,$last)= @_;
    #print "trying $query \n";
    my $result = -1;
    $result = pos($sequence)-length($query)+1 if $last && ($sequence =~ m/.*($query)/ig);
    $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*?($query)/ig);
    return $result;
}

------------------------------------------------------------------------

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> Sent: Wednesday, 14 November 2007 2:28 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] chromatogram
>
> Hi,
> I would like to know how to draw a chromatogram file. Does anyone
> have any sample code where you read in an scf file and create a jpeg
> or other image file?
> For that matter, I want to be able to customize these images with base
> calls if possible. I really appreciate the help, so thanks!
>
> --
> Lee Katz
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From mbasu at mail.nih.gov Wed Nov 14 20:47:20 2007 From: mbasu at mail.nih.gov (Malay) Date: Wed, 14 Nov 2007 15:47:20 -0500 Subject: [Bioperl-l] chromatogram In-Reply-To: References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> Message-ID: <473B5ED8.1090201@mail.nih.gov> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is not in Bioperl distribution. But to make the record straight, you can use one step chromatogram drawing in SVG from ABI file using my BioSVG module, available at: http://www.bioinformatics.org/~malay/biosvg/ Malay Smithies, Russell wrote: > Here's my trace viewer. > Please excuse my dodgy Perl and debugging code as it's still under > development :-) > > > Russell Smithies > > Bioinformatics Software Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > ------------------------------------------------------------------------ > ------------------ > > #!perl -w > use ABI; > > use GD::Graph::lines; > use GD::Graph::colour; > use GD::Graph::Data; > > use Data::Dumper; > > > use Getopt::Long; > > use constant HEIGHT => 300; > > GetOptions ('h|height=i' => \$HEIGHT, > 'f|file=s' => \$FILE, > 'o|out=s' => \$OUTFILE, > 'l|left=s' => \$LEFT_SEQ, > 'r|right=s' => \$RIGHT_SEQ, > 's|size=i' => \$SIZE, > ) || die < Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o > test2.png -l actacgtacgta -r atgatcgtacgtac > or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 > --out test2.png 
> [snip]
>> For that matter, I want to be able to customize these images with base >> calls if possible. I really appreciate the help, so thanks! >> >> -- >> Lee Katz >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Malay K Basu www.malaybasu.net From Russell.Smithies at agresearch.co.nz Wed Nov 14 20:58:19 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 15 Nov 2007 09:58:19 +1300 Subject: [Bioperl-l] chromatogram In-Reply-To: <473B5ED8.1090201@mail.nih.gov> References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> Message-ID: We try and avoid SVG at all costs as installing plugins and viewers in a locked down corporate environment can be more trouble than it's worth whereas generating .png images works for any browser with no extras required. We actually call this trace drawing code from Python which then generates webpages with the embedded image. It also means we don't need to licence, install and maintain a trace viewer like Chromas. 
:-)

Russell

> -----Original Message-----
> From: Malay [mailto:mbasu at mail.nih.gov]
> Sent: Thursday, 15 November 2007 9:47 a.m.
> To: Smithies, Russell
> Cc: Lee Katz; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] chromatogram
>
> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
> not in Bioperl distribution. But to make the record straight, you can
> use one step chromatogram drawing in SVG from ABI file using my BioSVG
> module, available at:
>
> http://www.bioinformatics.org/~malay/biosvg/
>
> Malay
>
>
>
>
> Smithies, Russell wrote:
> > Here's my trace viewer.
> > Please excuse my dodgy Perl and debugging code as it's still under
> > development :-)
> >
> >
> > Russell Smithies
> >
> > Bioinformatics Software Developer
> > T +64 3 489 9085
> > E russell.smithies at agresearch.co.nz
> >
> > Invermay Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T +64 3 489 3809
> > F +64 3 489 9174
> > www.agresearch.co.nz
> >
> >
> > ------------------------------------------------------------------------
> >
> > #!perl -w
> > use ABI;
> >
> > use GD::Graph::lines;
> > use GD::Graph::colour;
> > use GD::Graph::Data;
> >
> > use Data::Dumper;
> >
> >
> > use Getopt::Long;
> >
> > use constant HEIGHT => 300;
> >
> > GetOptions ('h|height=i' => \$HEIGHT,
> >             'f|file=s'   => \$FILE,
> >             'o|out=s'    => \$OUTFILE,
> >             'l|left=s'   => \$LEFT_SEQ,
> >             'r|right=s'  => \$RIGHT_SEQ,
> >             's|size=i'   => \$SIZE,
> >            ) || die <<USAGE;
> > Usage: perl $0 -h 400 -f 1188_13_14728111_16654_48544_080.ab1 -o test2.png -l actacgtacgta -r atgatcgtacgtac
> > or perl $0 --height 400 --file 1188_13_14728111_16654_48544_080.ab1 --out test2.png --left actacgtacgta --right atgatcgtacgtac
> >
> > Options:
> > --height Set height of image (${\HEIGHT} pixels default)
> > --file Filename for the ABI trace file
> > --out Filename for the generated .png image
> > --left
> > --right
> > --size
> >
> > Parse an ABI trace file and render a PNG image.
> > See http://search.cpan.org/dist/ABI/ABI.pm
> > or
> > http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
> > USAGE
> >
> > my $height = $HEIGHT || HEIGHT;
> > my $file = $FILE;
> > my $outfile = $OUTFILE;
> >
> > my $abi = ABI->new(-file=> $file);
> >
> > my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
> > my @trace_c = $abi->get_trace("C"); # Get the raw traces for "C"
> > my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
> > my @trace_t = $abi->get_trace("T"); # Get the raw traces for "T"
> >
> > my @base_calls = $abi->get_base_calls(); # Get the base calls
> > my $sequence =$abi->get_sequence();
> > @bp = split(//, $sequence);
> >
> >
> >
> > # iterate over array
> > $size = $abi->get_trace_length();
> > for ($i=0,$count = 0; $i<$size; $i++) {
> >     if(grep(/\b$i\b/, @base_calls)){
> >         $bases[$i] = $bp[$count];
> >         $count++;
> >     }else{
> >         $bases[$i] = ' ';
> >     }
> > }
> >
> > # create the data. see GD::Graph::Data for details of the format
> > my @data = (\@bases, \@trace_a, \@trace_c, \@trace_g, \@trace_t, );
> >
> > my $graph = new GD::Graph::lines($abi->get_trace_length(),$height);
> > $graph->set(
> >     title => $abi->get_sample_name(),
> >     # y_max_value => $abi->get_max_trace() + 50,
> >     x_max_value => $abi->get_trace_length(),
> >     t_margin => 5,
> >     b_margin => 5,
> >     l_margin => 5,
> >     r_margin => 5,
> >     x_ticks => 0,
> >     text_space => 0,
> >     line_width => 1,
> >     transparent => 0,
> >     b_margin => 30,
> >     t_margin => 35,
> >     x_plot_values => 0,
> >     interlaced => 1,
> > );
> >
> > # allocate some colors for drawing the bases
> > #use colors same as Chromas
> > $graph->set( dclrs => [ qw( green blue black red pink) ] );
> >
> > #plot the data
> > my $gd = $graph->plot(\@data);
> >
> > # base letters per color follow the %colors hash below
> > $black = $gd->colorAllocate(0,0,0); # G
> > $blue = $gd->colorAllocate(0,0,255); # C
> > $red = $gd->colorAllocate(255,0,0); # T
> > $green = $gd->colorAllocate(0,255,0); # A
> > $magenta =$gd->colorAllocate(255,0,255); # N
> > $white = $gd->colorAllocate(255,255,255); # undefined aren't drawn
> > $gray = $gd->colorAllocate(210,210,210);
> > %colors = ("A", $green, "C", $blue, "G",$black, "T", $red, "N", $magenta, " ",$white);
> >
> > #$start_base = index(lc($sequence),lc($LEFT_SEQ));
> > $start_base = find_match($sequence,$LEFT_SEQ);
> >
> > #if($end_base = rindex(lc($sequence),lc($RIGHT_SEQ)) > 0){
> > $end_base = find_match($sequence,$RIGHT_SEQ, 1);
> > # find_match returns -1 when nothing matched, so test for > 0 rather than truth
> > if($end_base > 0){
> >     $end_base += length($RIGHT_SEQ);
> > }
> >
> >
> > # get the coords of the features on the image
> > @coords = $graph->get_hotspot(1);
> > $size = @coords;
> > $printed_num = 1;
> > $basecount = 0;
> > $numstoprint = $basecount - $start_base;
> >
> > # draw the colored bases and scale at top and bottom of image
> > for ($i=0,$count = 0; $i<$size; $i++) {
> >     $c = $coords[$i];
> >     (undef, $xs, undef, undef, undef, undef) = @$c;
> >     $base = $bases[$i];
> >     if($base =~ /[ACGTN]/){
> >         if($start_base - 1 == $basecount){$start_base_coord = $xs;}
> >         if($end_base - 1 == $basecount){$end_base_coord = $xs;}
> >         if(defined($SIZE) && $start_base+$SIZE -2 == $basecount){$end_base_coord_by_size = $xs;}
> >         $basecount++;
> >         $numstoprint++;
> >         $printed_num = 0;
> >     }
> >     # print the bases top and bottom
> >     $gd->string(GD::Font->Small(),$xs,20,$base,$colors{$base});
> >     $gd->string(GD::Font->Small(),$xs,$height - 30,$base,$colors{$base});
> >
> >     # print scale
> >     if($basecount > 0 && $numstoprint % 10 == 0 && $printed_num == 0){
> >         if($LEFT_SEQ){
> >             $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
> >             $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
> >             $printed_num = 1;
> >         }else{
> >             $gd->string(GD::Font->Small(),$xs,5,$numstoprint,$black);
> >             $gd->string(GD::Font->Small(),$xs,$height - 15,$numstoprint,$black);
> >             $printed_num = 1;
> >         }
> >     }
> >     $top_right_corner = $xs;
> > }
> >
> >
> >
> > # only draw the clipped region if the calculated size is + or - 6bp
> > #if(($end_base - $start_base) - $SIZE <= 6 && ($end_base - $start_base) - $SIZE >= -6 ){
> > # draw the clipped regions as gray
> > #if LEFT_SEQ supplied and a match found
> > if($LEFT_SEQ && $start_base > 0){
> >     $gd->filledRectangle(38,35,$start_base_coord - 1,$height - 33,$red);
> >     $clipped = 1;
> > }
> > #if RIGHT_SEQ supplied and a match found
> > if($RIGHT_SEQ && $end_base > 0){
> >     print join("\t", ($end_base)),"\n";
> >     $gd->filledRectangle($end_base_coord,35,$top_right_corner,$height - 33,$gray);
> >     $clipped = 1;
> > }
> > #if no RIGHT_SEQ supplied or no match found, use left match + seq length
> > if(!$RIGHT_SEQ || $end_base < 0){
> >     $gd->filledRectangle($end_base_coord_by_size,35,$top_right_corner,$height - 33,$blue);
> >     $clipped = 1;
> > }
> >
> >
> >
> > # set height based on max trace within clipped region
> > $graph->set( y_max_value => 3000);#$abi->get_max_trace() + 50);
> >
> > # need to re-plot the data over the grayed out area
> > $graph->plot(\@data) if $clipped;
> > $gd->filledRectangle(0,0,$top_right_corner,33,$white);
> >
> > #}
> >
> > #print the graph
> > open(OUT, ">$outfile") or die "can't open output file: $outfile\n";
> > binmode OUT;
> > print OUT $gd->png;
> > close OUT;
> >
> >
> > sub find_match{
> >     my ($sequence,$query,$last) = @_;
> >     return -1 if length($query) < 6;
> >     my($odds, $evens, $ones, $twos, $threes, $match_pos);
> >     # try exact match
> >     $match_pos = do_regex($query, $sequence,$last); return $match_pos if $match_pos > 0;
> >
> >     # try matching every second base starting from the second base e.g. it will be .C.T.C.G.etc
> >     map {m/(\w)(\w)/g; $odds.="$1."; $evens.=".$2"} ($query=~m/(\w\w)/g);
> >     $match_pos = do_regex($odds, $sequence,$last); return $match_pos if $match_pos > 0;
> >     $match_pos = do_regex($evens, $sequence,$last); return $match_pos if $match_pos > 0;
> >
> >     # try matching every third base starting from the first base e.g. it will be C..T..G..T etc
> >     map {m/(\w)(\w)(\w)/g; $ones.="$1.."; $twos.=".$2."; $threes.="..$3"} ($query =~m/(\w\w\w)/g);
> >     $match_pos = do_regex($ones, $sequence,$last); return $match_pos if $match_pos > 0;
> >     $match_pos = do_regex($twos, $sequence,$last); return $match_pos if $match_pos > 0;
> >     $match_pos = do_regex($threes, $sequence,$last); return $match_pos if $match_pos > 0;
> >
> >     # not found
> >     return -1;
> > }
> >
> > # no "()" prototype here: the sub takes three arguments
> > sub do_regex {
> >     my ($query,$sequence,$last)= @_;
> >     #print "trying $query \n";
> >     my $result = -1;
> >     $result = pos($sequence)-length($query)+1 if $last && ($sequence =~ m/.*($query)/ig);
> >     $result = pos($sequence)-length($query)+1 if($sequence =~ m/.*?($query)/ig);
> >     return $result;
> > }
> >
> > ------------------------------------------------------------------------
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Lee Katz
> >> Sent: Wednesday, 14 November 2007 2:28 p.m.
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: [Bioperl-l] chromatogram
> >>
> >> Hi,
> >> I would like to know how to draw a chromatogram file. Does anyone
> >> have any sample code where you read in an scf file and create a jpeg
> >> or other image file?
> >> For that matter, I want to be able to customize these images with base
> >> calls if possible. I really appreciate the help, so thanks!
> >>
> >> --
> >> Lee Katz
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > =======================================================================
> > Attention: The information contained in this message and/or attachments
> > from AgResearch Limited is intended only for the persons or entities
> > to which it is addressed and may contain confidential and/or privileged
> > material. Any review, retransmission, dissemination or other use of, or
> > taking of any action in reliance upon, this information by persons or
> > entities other than the intended recipients is prohibited by AgResearch
> > Limited. If you have received this message in error, please notify the
> > sender immediately.
> > =======================================================================
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> Malay K Basu
> www.malaybasu.net
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From mbasu at mail.nih.gov Wed Nov 14 21:04:25 2007
From: mbasu at mail.nih.gov (Malay)
Date: Wed, 14 Nov 2007 16:04:25 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To:
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov>
Message-ID: <473B62D9.8010004@mail.nih.gov>

You don't need any plugin. Firefox natively can show most of the SVG
files.

-Malay

Smithies, Russell wrote:
> We try and avoid SVG at all costs as installing plugins and viewers in a
> locked down corporate environment can be more trouble than it's worth
> whereas generating .png images works for any browser with no extras
> required.
> We actually call this trace drawing code from Python which then
> generates webpages with the embedded image.
> It also means we don't need to licence, install and maintain a trace
> viewer like Chromas.
> :-)
>
> Russell
>
>> -----Original Message-----
>> From: Malay [mailto:mbasu at mail.nih.gov]
>> Sent: Thursday, 15 November 2007 9:47 a.m.
>> To: Smithies, Russell
>> Cc: Lee Katz; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] chromatogram
>>
>> I guess you need chromatogram from SCF. I can't help in that. ABI.pm is
>> not in Bioperl distribution. But to make the record straight, you can
>> use one step chromatogram drawing in SVG from ABI file using my BioSVG
>> module, available at:
>>
>> http://www.bioinformatics.org/~malay/biosvg/
>>
>> Malay
>>
>>
>>
>>
>> Smithies, Russell wrote:
>>> Here's my trace viewer.
> =======================================================================

--
Malay K Basu
www.malaybasu.net

From tomboy at cs.huji.ac.il Thu Nov 15 02:43:43 2007
From: tomboy at cs.huji.ac.il (Tomer Hertz)
Date: Wed, 14 Nov 2007 18:43:43 -0800
Subject: [Bioperl-l] problems in stalling bio perl
Message-ID:

Hi,

When I try to install BioPerl I get the following error message:

hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
$ perl Build.PL
Can't find file lib/Module/Build.pm to determine version at /usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.

Can you please help? I have tried reinstalling the build command and that
does not seem to help either.

Many thanks,
--Tomer

--
--------------------------------------------------------------------------------
Tomer Hertz
Postdoctoral Researcher
Machine Learning and Applied Statistics
Microsoft Research
One Microsoft Way, Redmond, WA, 98052, USA
Homepage: www.cs.huji.ac.il/~tomboy
Email: hertz at microsoft dot com
Tel: (425)-421-8313
Fax: (425) 936-7329
--------------------------------------------------------------------------------

From lskatz at gatech.edu Thu Nov 15 13:24:02 2007
From: lskatz at gatech.edu (Lee Katz)
Date: Thu, 15 Nov 2007 08:24:02 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <473B62D9.8010004@mail.nih.gov>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov>
Message-ID: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>

Thank you all.
Are you all sure that there is no way to go from an scf to an image,
though? I do have abi files, but I am relying on Phred output for
base calls for other things and I want to stay consistent. This means
that if I use the fasta files that I get from Phred in another part of
my program, I need to use the scf files it produces.

If this is not possible, do you know if drawing an scf is in the works? Thanks.
--
Lee Katz
http://www.lskatz.com

From cain.cshl at gmail.com Thu Nov 15 14:21:26 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 15 Nov 2007 09:21:26 -0500
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov> <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <1195136486.2785.12.camel@localhost.localdomain>

Hi Lee,

Distributed with GBrowse is Bio::Graphics::Glyph::trace, which uses
Bio::SCF to draw trace files onto a Bio::Graphics::Panel. Bio::SCF is
not part of BioPerl, so you have to get it from CPAN. It depends on the
Staden io-lib package, so you'll need that too.

You can get GBrowse from http://www.gmod.org/gbrowse , and you can look
at the tutorial for more information on configuring the trace glyph.

Scott


On Thu, 2007-11-15 at 08:24 -0500, Lee Katz wrote:
> Thank you all.
> Are you all sure in that there is no way to go from an scf to an image
> though? I do have abi files, but I am relying on Phred output for
> base calls for other things and I want to stay consistent. This means
> that if I use the fasta files that I get from Phred in another part of
> my program, I need to use the scf files it produces.
>
> If this is not possible, do you know if drawing an scf is in the works? Thanks.
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From bosborne11 at verizon.net Thu Nov 15 14:18:05 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 15 Nov 2007 09:18:05 -0500
Subject: [Bioperl-l] problems in stalling bio perl
In-Reply-To:
Message-ID:

Tomer,

Interesting.
When I used Cygwin I always worked entirely within the C: drive, and it
looks like you're executing the script from the E: drive. Is Cygwin
installed in C:/cygwin? You can see what I'm getting at: it's possible
that you need to set $PERL5LIB to something like
/cygdrive/c/cygwin/usr/lib/perl5. What does 'echo $PERL5LIB' say?

Brian O.


On 11/14/07 9:43 PM, "Tomer Hertz" wrote:

> hi
> when I try to install bioperl I get the following error message:
>
> hertz at mlasbio6 /cygdrive/e/progs/bioperl-1.5.2_102
> $ perl Build.PL
> Can't find file lib/Module/Build.pm to determine version at
> /usr/lib/perl5/site_perl/5.8/Module/Build/Base.pm line 950.
> can you please help. I have tried reinstalling the build command and that
> does not seem to help as well.
>
> many thanks
> --Tomer

From bernd.web at gmail.com Thu Nov 15 15:26:42 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 16:26:42 +0100
Subject: [Bioperl-l] Graphics::Panel
Message-ID: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>

Hi,

Has anyone been able to access '$description' for the production of
imagemaps with Graphics::Panel?
The call below does not print the "title" tag at all; '$description'
seems not to be available, although it is available for the tracks
($panel->add_track).
$map = $panel->create_web_map($mapname, $linkrule, '$description');

Replacing '$description' with a coderef for the title tag does work, if
I use the code below:
my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };


I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }


Regards,
Bernd

From luciap at sas.upenn.edu Thu Nov 15 15:44:21 2007
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Thu, 15 Nov 2007 10:44:21 -0500
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
Message-ID: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>

Hi,

I was asked this question recently, and it occurred to me that I must be
doing things inefficiently. To produce a GFF file I was using SeqIO to
parse the required fields and then, following the conventions, just
printing out whatever was required as tab-delimited text, which is easy.

But if I wanted to generate a GenBank file by extracting features from a
GFF file and a plain FASTA file, it was more complicated.
Is there support for GFF in BioPerl now?
Can anyone contribute a smart way to go from/to GFF, GenBank and EMBL?

Thanks very much,

Lucia Peixoto
Department of Biology, SAS
University of Pennsylvania

From lstein at cshl.edu Thu Nov 15 17:38:04 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Nov 2007 12:38:04 -0500
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com>
Message-ID: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>

Depending on which Feature object you use, you may have to use a tag named
"note" instead of "description".

Lincoln

On Nov 15, 2007 10:26 AM, Bernd Web wrote:

> Hi,
>
> Has someone been able to access '$description' for the production of
> imagemaps with Graphics::Panel?
> The map below does not print the "title" tag at all, '$description'
> seems not available, although for the tracks ($panel->add_track) it is
> available.
> $map = $panel->create_web_map($mapname, $linkrule, '$description');
>
> Replacing '$description' with a coderef for the titletag does work, if
> I use the code below
> my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
>
>
> I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
>
>
> Regards,
> Bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From bernd.web at gmail.com Thu Nov 15 18:03:19 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 15 Nov 2007 19:03:19 +0100
Subject: [Bioperl-l] Graphics::Panel
In-Reply-To: <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
References: <716af09c0711150726r1dba8aa8v9c6bfd54825b99df@mail.gmail.com> <6dce9a0b0711150938t31a9e5c4w279441257dbd9040@mail.gmail.com>
Message-ID: <716af09c0711151003w6b5965b6g967ae2391a460dcb@mail.gmail.com>

On Nov 15, 2007 6:38 PM, Lincoln Stein wrote:
> Depending on which Feature object you use, you may have to use a tag named
> "note" instead of "description".
>
> Lincoln
>
>
>
> On Nov 15, 2007 10:26 AM, Bernd Web < bernd.web at gmail.com> wrote:
> >
> > Hi,
> >
> > Has someone been able to access '$description' for the production of
> > imagemaps with Graphics::Panel?
> > The map below does not print the "title" tag at all, '$description'
> > seems not available, although for the tracks ($panel->add_track) it is
> > available.
> > $map = $panel->create_web_map($mapname, $linkrule, '$description');
> >
> > Replacing '$description' with a coderef for the titletag does work, if
> > I use the code below
> > my $titlerule = sub { return ($_[0]->each_tag_value('description'))[0] };
> >
> >
> > I am using bioperl-1.5.2_102; Panel.pm: sub api_version { 1.654 }
> >
> >
> > Regards,
> > Bernd
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Thu Nov 15 18:43:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Nov 2007 12:43:02 -0600
Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats?
In-Reply-To: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
References: <1195141461.473c6955bcd4b@webmail.sas.upenn.edu>
Message-ID: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu>

There are currently many ways to get what you want, but not all are
consistent (particularly re: GFF3). We are aiming for more
consistent, compliant GFF/GTF output in the next developer series
(1.7) of Bioperl.

You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
scripts directory); these are probably the most common way when
working directly from a seq record. Bio::Tools::GFF is the most
commonly used class, though I'm unsure of its status for GFF3
output. From within a Bio::SeqI you can call write_GFF() (currently
not very flexible) or from the SeqFeature itself gff_string().
Bio::Graphics::Feature has the additional method gff3_string().
Bio::FeatureIO is also an option, though I would consider it very
experimental (it will likely undergo significant revision in the next
bioperl dev series).

Any others anyone can think of, maybe non-BioPerl related as well?
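
For what it's worth, the Bio::SeqIO plus Bio::Tools::GFF route mentioned
above might be sketched roughly as follows. This is a minimal
illustration under assumptions, not a tested recipe: the subroutine name
genbank_to_gff and the file names are made up for the example, GFF
version 2 is used as the default because GFF3 output from
Bio::Tools::GFF is the less certain case, and only top-level features
are written. Changing -format to 'embl' should cover EMBL input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GFF;

# genbank_to_gff() is a hypothetical helper name for this sketch,
# not a BioPerl routine. $version is handed to Bio::Tools::GFF as
# -gff_version; it defaults to 2 here (an assumption, see above).
sub genbank_to_gff {
    my ($infile, $outfile, $version) = @_;
    $version ||= 2;

    my $in  = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
    my $out = Bio::Tools::GFF->new(-file        => ">$outfile",
                                   -gff_version => $version);

    my $count = 0;
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            # Fill in the sequence name if the parser left it unset,
            # so the first GFF column refers to this record.
            $feat->seq_id($seq->display_id) unless defined $feat->seq_id;
            $out->write_feature($feat);
            $count++;
        }
    }
    $out->close;
    return $count;    # number of features written
}

# Usage (file names are placeholders):
#   perl genbank2gff_sketch.pl input.gbk output.gff
if (@ARGV == 2) {
    my $n = genbank_to_gff(@ARGV);
    print "wrote $n features\n";
}
```

The sketch only flattens the feature table. When sub-features and GFF3
parent/child relationships matter, bp_genbank2gff3 is probably the
better starting point.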
chris On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote: > Hi > I was asked this question recently > and it occurred to me I must be doing things inefficiently > To produce gff file I was using SeqIO to parse the required fields, > then > according to the conventions just printing out whatever was > required tab > delimited, which is easy > > but if I wanted to generate a genbank file, extracting features > from a gff file > and a plain fasta file it was more complicated > is there support for gff in bioperl now? > anyone can contribute with smart way to go from/to gff, genebank > and embl? > > thanks very much > > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Thu Nov 15 19:19:41 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 15 Nov 2007 14:19:41 -0500 Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats? In-Reply-To: <220E2378-3937-410A-B10D-BF6B63EB9DD9@uiuc.edu> Message-ID: Chris, There's also a genbank2gff3.PLS script in the GMOD package ( http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS? revision=1.5&view=markup). However, it has not been modified for a couple of years, it may not be the "preferred" script. See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information on using Bioperl's bp_genbank2gff3 script. Brian O. On 11/15/07 1:43 PM, "Chris Fields" wrote: > There are currently many ways to get what you want, but not all are > consistent (particularly re: GFF3). We are aiming for more > consistent, compliant GFF/GTF output in the next developer series > (1.7) of Bioperl. 
> > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the > scripts directory); these are probably the most common way when > working directly from a seq record. Bio::Tools::GFF is the most > commonly used class though I'm unsure of it's status for GFF3 > output. From within a Bio::SeqI you can call write_gff() (currently > not very flexible) or from the SeqFeature itself gff_string(). > Bio::Graphics::Feature has the additional method gff3_string(). > Bio::FeatureIO is also an option, though I would consider it very > experimental (it will likely undergo significant revision in the next > bioperl dev series). > > Any others anyone can think of, maybe non-BioPerl related as well? > > chris > > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote: > >> Hi >> I was asked this question recently >> and it occurred to me I must be doing things inefficiently >> To produce gff file I was using SeqIO to parse the required fields, >> then >> according to the conventions just printing out whatever was >> required tab >> delimited, which is easy >> >> but if I wanted to generate a genbank file, extracting features >> from a gff file >> and a plain fasta file it was more complicated >> is there support for gff in bioperl now? >> anyone can contribute with smart way to go from/to gff, genebank >> and embl? 
>> >> thanks very much >> >> Lucia Peixoto >> Department of Biology,SAS >> University of Pennsylvania >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Thu Nov 15 22:31:28 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 16 Nov 2007 11:31:28 +1300 Subject: [Bioperl-l] chromatogram In-Reply-To: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> Message-ID: Just to add to this, does anyone have any code for reading .sff 'traces' from 454 sequences? Thanx, Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open- > bio.org] On Behalf Of Lee Katz > Sent: Wednesday, 14 November 2007 2:28 p.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] chromatogram > > Hi, > I would like to know how to draw a chromatogram file. Does anyone > have any sample code where you read in an scf file and create a jpeg > or other image file? > For that matter, I want to be able to customize these images with base > calls if possible. I really appreciate the help, so thanks! > > -- > Lee Katz > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From torsten.seemann at infotech.monash.edu.au Fri Nov 16 01:13:22 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 16 Nov 2007 12:13:22 +1100
Subject: [Bioperl-l] chromatogram
In-Reply-To:
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
Message-ID:

> Just to add to this, does anyone have any code for reading .sff 'traces'
> from 454 sequences?

The .SFF files can be manipulated using the SFF tools which 454
distribute with their result data, e.g. "sffinfo 454AllContigs.sff"
will list all the reads with the original flowgram values etc.

However, the SFF tools are i386 Linux binaries, so not really a
portable solution.

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From mvrmakam at yahoo.com Fri Nov 16 03:04:55 2007
From: mvrmakam at yahoo.com (Roshan Makam)
Date: Thu, 15 Nov 2007 19:04:55 -0800 (PST)
Subject: [Bioperl-l] Problem with installing bioperl on Windows
Message-ID: <456881.59573.qm@web33712.mail.mud.yahoo.com>

Hi,

I have installed Perl Package Manager ver 5.8.8.822 on Windows XP. I
have included all the repositories outlined in Installing BioPerl for
Windows and have selected all Packages in the View. However, I am not
able to see any packages in the view box. Can anyone help me in this
matter?

Roshan

From David.Messina at sbc.su.se Fri Nov 16 08:33:04 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 16 Nov 2007 09:33:04 +0100
Subject: [Bioperl-l] chromatogram
In-Reply-To: <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com> <473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov> <7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
Message-ID: <628aabb70711160033na56be2an5bff905fdf13a0c0@mail.gmail.com>

> If this is not possible, do you know if drawing an scf is in the
> works? Thanks.

One non-BioPerl solution is 4peaks:

http://mekentosj.com/4peaks/

Mac only, but really great software. I'm also a fan of their Papers
journal article PDF library program.

Dave

From neetisomaiya at gmail.com Mon Nov 19 06:11:49 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 19 Nov 2007 11:41:49 +0530
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
Message-ID: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>

Hi,

I am using Bio::SeqIO for parsing KEGG gene ent files.

A part of my code is

    foreach my $key ( $ac->get_all_annotation_keys() )
    {
        if($key eq "dblink")
        {
            my %values = $ac->get_Annotations($key);
            foreach my $value ( keys(%values) )
            {
                print "\n*****VALUE $value*****\n";
            }
        }
    }

Here not all dblinks present in the actual file get parsed. For
example, in the data below,

    ENTRY       116064            CDS       H.sapiens
    NAME        LRRC58
    DEFINITION  leucine rich repeat containing 58
    POSITION    3q13.33
    MOTIF       Pfam: SdiA-regulated LRR_1
                PROSITE: LEU_RICH
    DBLINKS     NCBI-GI: 153792305
                NCBI-GeneID: 116064
                HGNC: 26968
                Ensembl: ENSG00000163428
                UniProt: Q96CX6

the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and PROSITE,
but doesn't give me HGNC and UniProt. For other entries it gives me
other combinations of dbs.

Can anyone help me with this? Why is this happening? I have no clue.

Thanks and Regards,
Neeti.
--
-Neeti
Even my blood says, B positive

From johnston at biochem.ucl.ac.uk Mon Nov 19 11:44:59 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Mon, 19 Nov 2007 11:44:59 +0000 (GMT)
Subject: [Bioperl-l] blast database names
Message-ID:

Hello,

Is there a list of the possible database names for -data => $dbname
in RemoteBlast somewhere?

Cheers,
Cass

From cjfields at uiuc.edu Mon Nov 19 13:44:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 07:44:46 -0600
Subject: [Bioperl-l] blast database names
In-Reply-To:
References:
Message-ID:

Here's a recent list (don't know if it's up-to-date):

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Nov 19, 2007, at 5:44 AM, Caroline Johnston wrote:

> Hello,
>
> Is there a list of the possible database names for -data =>
> $dbname in RemoteBlast somwhere?
>
> Cheers,
> Cass

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Nov 19 14:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Nov 2007 08:33:46 -0600
Subject: [Bioperl-l] problem with Bio::SeqIo KEGG - need help urgently
In-Reply-To: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
References: <764978cf0711182211h591195c0n3d4d939368599953@mail.gmail.com>
Message-ID:

It makes sense once you see that you're (erroneously) using a hash:

    my %values = $ac->get_Annotations($key);

This assigns key-value pairs of DBLink => DBLink. You don't see an
error because the number of links happens to be even (I get 8), but
you would if the number of links returned were odd (Perl warns about
an odd number of elements in a hash assignment when warnings are
enabled). So when you call:

    foreach my $value (keys(%values)) {....}

you only get half of the DBLinks.
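The pairwise consumption described here can be reproduced without BioPerl. The following is a minimal sketch in which plain strings stand in (as an assumption, for illustration only) for the annotation objects that get_Annotations() returns:

```perl
use strict;
use warnings;

# Plain strings standing in for the DBLink annotation objects.
my @links = ('NCBI-GI:153792305', 'NCBI-GeneID:116064',
             'HGNC:26968',        'Ensembl:ENSG00000163428');

# Assigning a list to a hash consumes the elements in pairs:
# element 1 becomes a key, element 2 its value, and so on.
my %as_hash       = @links;
my @seen_via_hash = sort keys %as_hash;

# Assigning the same list to an array keeps every element.
my @as_array = @links;

printf "hash keys: %d, array elements: %d\n",
    scalar(@seen_via_hash), scalar(@as_array);
# prints "hash keys: 2, array elements: 4"
```

Only 'NCBI-GI:153792305' and 'HGNC:26968' survive as keys; the other two links end up as hash values and are never seen by a loop over keys().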
You should use an array:

    my @values = $ac->get_Annotations($key);
    foreach my $value (@values) {
        print $value->as_text, "\n";
    }

Note the loop change: Bio::Annotation objects are no longer
operator-overloaded, so your print statement wouldn't work in a
bioperl 1.6 world.

chris

On Nov 19, 2007, at 12:11 AM, neeti somaiya wrote:

> Hi,
>
> I am using Bio::SeqIO for parsing KEGG gene ent files.
>
> A part of my code is
>
>     foreach my $key ( $ac->get_all_annotation_keys() )
>     {
>         if($key eq "dblink")
>         {
>             my %values = $ac->get_Annotations($key);
>             foreach my $value ( keys(%values) )
>             {
>                 print "\n*****VALUE $value*****\n";
>             }
>         }
>     }
>
> Here not all dblinks present in the actual file get parsed. For eg,
> in the data below,
>
>     ENTRY       116064            CDS       H.sapiens
>     NAME        LRRC58
>     DEFINITION  leucine rich repeat containing 58
>     POSITION    3q13.33
>     MOTIF       Pfam: SdiA-regulated LRR_1
>                 PROSITE: LEU_RICH
>     DBLINKS     NCBI-GI: 153792305
>                 NCBI-GeneID: 116064
>                 HGNC: 26968
>                 Ensembl: ENSG00000163428
>                 UniProt: Q96CX6
>
> Here, the dblink parsing gives me NCBI-GeneID, Ensembl, Pfam and
> PROSITE, but doesnt give me HGNC and UniProt. For other entries it
> gives me other combinations of dbs.
>
> Can anyone help me with this. Why is this happenning? I have no clue.
>
> Thanks and Regards,
> Neeti.
> --
> -Neeti
> Even my blood says, B positive

From akarger at CGR.Harvard.edu Mon Nov 19 15:38:26 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 19 Nov 2007 10:38:26 -0500
Subject: [Bioperl-l] What does Expect(2) mean in a blast result?
In-Reply-To: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu> References: <3D48EDAE-A4CC-494A-9D14-484EC4AA843D@uiuc.edu> Message-ID: > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, November 13, 2007 12:42 PM > To: Amir Karger > Cc: Steve Chervitz; Dave Messina; bioperl-l > Subject: Re: [Bioperl-l] What does Expect(2) mean in a blast result? > > Amir, > > Can you file this as a bug? Done. http://bugzilla.open-bio.org/show_bug.cgi?id=2399 > Dave mentioned he would look > into it but > I think it warrants tracking to make sure it gets fixed: > > http://www.bioperl.org/wiki/Bugs > > Attach the example BLAST report from your last post to the report. > BTW, I wonder how this appears in XML output? > > chris > > On Nov 13, 2007, at 11:30 AM, Amir Karger wrote: > > >> From: trutane at gmail.com [mailto:trutane at gmail.com] On Behalf > >> Of Steve Chervitz > >> > >> The Bioperl blast parser should extract that value and you > can obtain > >> it from an HSP object, via the HSPI::n() method, documented here: > >> > >> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B > > io/Search/HSP/HSPI.html#POD23 > > > > As I mentioned in my email: > > > > And does anyone know off-hand if Bioperl will tell me when > situations > > like this happen? I thought the Bio::Search::HSP::BlastHSP::n > > subroutine > > would help, but I just get a bunch of empty strings for that, > > whether or > > not there's a (2) in the Expect string. (hsp->n is empty, hsp-> > > {"_n"} is > > undef.) > > > > And the docs for n() actually say, "This value is not defined with > > NCBI > > Blast2 with gapping" although they don't say why. Which may > explain > > why, > > when I ran the following code on the blast result I included in my > > last > > email, I got empty values for all of the n's. (Why is n() > undefined > > for > > gapped blast if I'm getting n's in my results from that blast?) 
> > > > use warnings; > > use strict; > > use Bio::SearchIO; > > > > my $blast_out = $ARGV[0]; > > my $in = new Bio::SearchIO(-format => 'blast', > > -file => $blast_out, > > -report_type => 'tblastn'); > > > > print join("\t", qw(Qname Qstart Qend Strand Sname Sstart > Send Frame N > > Evalue)), "\n"; > > while(my $query = $in->next_result) { > > while(my $subject = $query->next_hit) { > > while (my $hsp = $subject->next_hsp) { > > print join("\t", > > $query->query_name, > > $hsp->start("query"), > > $hsp->end("query"), > > $hsp->strand("hit"), > > $subject->name, > > $hsp->start("hit"), > > $hsp->end("hit"), > > $subject->frame, > > $hsp->n, > > $hsp->evalue, > > ),"\n"; > > } > > } > > } > > > >> Dave's basically correct in his explanation. It's a result of the > >> application of sum statistics by the blast algorithm. You > can read > >> all > >> about it in Korf et al's BLAST book. Here's the relevant section: > > > > [snip] > > > > Thanks, > > > > -Amir > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From aaron.j.mackey at gsk.com Mon Nov 19 16:50:53 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Mon, 19 Nov 2007 11:50:53 -0500 Subject: [Bioperl-l] What's the best way to produce gff files from genebank/embl formats? In-Reply-To: Message-ID: While Lucia's subject line asked for genbank2gff, her message actually asked the reverse (gff + fasta -> genbank). e.g. pretend you had to prepare a genome annotation for submission to GenBank ... and no, I don't know of any generalized gff2genbank script out there ... Lucia, the SeqIO::genbank module will write GenBank format, but you have to get all the bits and bobs together in the right way, i.e. 
construct the various AnnotationCollections and SeqFeatures (with SplitLocations for exons, CDS, etc.) that a GenBank record expects. One way to do this is to start with a template GenBank file that you'd like to mimic, strip it down to only two gene models, use SeqIO::genbank to read it into memory, and then step through the object with the Perl debugger to see how it is composed. -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 11/15/2007 02:19:41 PM: > Chris, > > There's also a genbank2gff3.PLS script in the GMOD package ( > http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS? > revision=1.5&view=markup). However, it has not been modified for a couple of > years, it may not be the "preferred" script. > > See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and > http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more information > on using Bioperl's bp_genbank2gff3 script. > > Brian O. > > > On 11/15/07 1:43 PM, "Chris Fields" wrote: > > > There are currently many ways to get what you want, but not all are > > consistent (particularly re: GFF3). We are aiming for more > > consistent, compliant GFF/GTF output in the next developer series > > (1.7) of Bioperl. > > > > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the > > scripts directory); these are probably the most common way when > > working directly from a seq record. Bio::Tools::GFF is the most > > commonly used class though I'm unsure of it's status for GFF3 > > output. From within a Bio::SeqI you can call write_gff() (currently > > not very flexible) or from the SeqFeature itself gff_string(). > > Bio::Graphics::Feature has the additional method gff3_string(). > > Bio::FeatureIO is also an option, though I would consider it very > > experimental (it will likely undergo significant revision in the next > > bioperl dev series). > > > > Any others anyone can think of, maybe non-BioPerl related as well? 
> > > > chris > > > > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote: > > > >> Hi > >> I was asked this question recently > >> and it occurred to me I must be doing things inefficiently > >> To produce gff file I was using SeqIO to parse the required fields, > >> then > >> according to the conventions just printing out whatever was > >> required tab > >> delimited, which is easy > >> > >> but if I wanted to generate a genbank file, extracting features > >> from a gff file > >> and a plain fasta file it was more complicated > >> is there support for gff in bioperl now? > >> anyone can contribute with smart way to go from/to gff, genebank > >> and embl? > >> > >> thanks very much > >> > >> Lucia Peixoto > >> Department of Biology,SAS > >> University of Pennsylvania > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From johnston at biochem.ucl.ac.uk Mon Nov 19 14:46:03 2007 From: johnston at biochem.ucl.ac.uk (Caroline Johnston) Date: Mon, 19 Nov 2007 14:46:03 +0000 (GMT) Subject: [Bioperl-l] blast database names In-Reply-To: References: Message-ID: On Mon, 19 Nov 2007, Chris Fields wrote: > Here's a recent list (don't know if it's up-to-date): > > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html Thanks. Perhaps I missed something in the docs, but I don't think I've quite understood how this is supposed to work. I'm trying to blast primer sequences against the ref genome sequence. Should I be using ref_contig? How can I limit the blast to a single species? cheers, Cass. 
From Kevin.M.Brown at asu.edu Mon Nov 19 18:31:38 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 19 Nov 2007 11:31:38 -0700
Subject: [Bioperl-l] pSW vs dpAlign
Message-ID: <1A4207F8295607498283FE9E93B775B404042E1D@EX02.asurite.ad.asu.edu>

I was able to get the Ext package installed; I just had to copy the
Align.pm file up one directory from where the installer was putting it.

Now I have a technician trying to use pSW (Bio::Tools::pSW). It
appears to have been last updated back in '99 and seems to lack
certain methods for getting things out of the alignment, such as the
score. The test.pl script that Bio::Ext comes with actually uses
Bio::Tools::dpAlign. Is dpAlign the replacement for pSW?

From bernd.web at gmail.com Wed Nov 21 16:42:40 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 17:42:40 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To:
References: <4701AEE6.6070506@web.de> <47020DC9.8040401@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com>
Message-ID: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>

Hi Russell,

I came across your question. At first I thought all was well on my
system, but indeed I also have these colouring problems.
I noted that the score in the bgcolor callback gets a different value!
Printing the score during hit parsing ($hit->raw_score) gives the same
score as the -description callback (my $score = $feature->score;).
However, printing the score in the bgcolor sub gives 2573!
All scores in the bgcolor routine are different from, and higher than,
the real scores. Were you able to solve this colouring issue?
Regards, Bernd > Hi all, > I'm using a modified version of Lincoln's tutorial > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) > and I'm colouring the HSPs by setting the -bgcolor by score with a sub > to give a similar image to that from NCBI but for some reason, my > colours are coming out wrong (see attached example) > They seem to be off by one but I can't see why. > > Any ideas? > > I can't be certain but I think it's only started doing this since our > BLAST upgrade to 2.2.17 a few weeks ago. > > Here's the colouring code: > ------------------------------------------------------------------------ > ------- > my $track = $panel->add_track( > -glyph => 'segments', > -label => 1, > -connector => 'dashed', > -bgcolor => sub { > my $feature = shift; > my $score = $feature->score; > return 'red' if $score >= 200; > return 'fuchsia' if $score >= 80; > return 'lime' if $score >= 50; > return 'blue' if $score >= 40; > return 'black'; > }, > -font2color => 'gray', > -sort_order => 'high_score', > -description => sub { > my $feature = shift; > return unless > $feature->has_tag('description'); > my ($description) = > $feature->each_tag_value('description'); > my $score = $feature->score; > "$description, score=$score"; > }, > ); > ------------------------------------------------------------------------ > --------- > > > Thanx, > > Russell Smithies > > > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>

From bernd.web at gmail.com Wed Nov 21 17:38:30 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 21 Nov 2007 18:38:30 +0100
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To: <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
References: <4701AEE6.6070506@web.de> <470215E1.4080901@sheffield.ac.uk> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com>
Message-ID: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com>

Hi,

I have now found that bgcolor is using a $feature->score that comes
directly from the blast report; it is not the bit score.

    -bgcolor => sub { my $feature = shift;
                      my $score = $feature->score;
                      print "$score\n"; }

always prints the score, even if the score is not set in the
Bio::SeqFeature::Generic object. The -description callbacks are
somehow using the score from the SeqFeature object. Does anyone have
an idea why?

Further, is it possible to get the raw score of a hit? $hit->raw_score
actually returns the bit score (without the decimal point).

Bernd

On Nov 21, 2007 5:42 PM, Bernd Web wrote:
> Hi Russell,
>
> I came across your question. At first I thought all was well on my
> system, but indeed I also have these colouring problems.
> I noted that the score in the bgcolor callback gets a different value!
> Printing the score during hit parsing ($hit->raw_score) gives the same
> score as the -description callback (my $score = $feature->score;).
> However, printing the score in the bgcolor sub gives 2573!
> All scores in the bgcolor routine are different from, and higher than,
> the real scores.
Were you able to solve this colouring issue? > > Regards, > Bernd > > > > Hi all, > > I'm using a modified version of Lincoln's tutorial > > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) > > and I'm colouring the HSPs by setting the -bgcolor by score with a sub > > to give a similar image to that from NCBI but for some reason, my > > colours are coming out wrong (see attached example) > > They seem to be off by one but I can't see why. > > > > Any ideas? > > > > I can't be certain but I think it's only started doing this since our > > BLAST upgrade to 2.2.17 a few weeks ago. > > > > Here's the colouring code: > > ------------------------------------------------------------------------ > > ------- > > my $track = $panel->add_track( > > -glyph => 'segments', > > -label => 1, > > -connector => 'dashed', > > -bgcolor => sub { > > my $feature = shift; > > my $score = $feature->score; > > return 'red' if $score >= 200; > > return 'fuchsia' if $score >= 80; > > return 'lime' if $score >= 50; > > return 'blue' if $score >= 40; > > return 'black'; > > }, > > -font2color => 'gray', > > -sort_order => 'high_score', > > -description => sub { > > my $feature = shift; > > return unless > > $feature->has_tag('description'); > > my ($description) = > > $feature->each_tag_value('description'); > > my $score = $feature->score; > > "$description, score=$score"; > > }, > > ); > > ------------------------------------------------------------------------ > > --------- > > > > > > Thanx, > > > > Russell Smithies > > > > > > > > > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. 
Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From sac at bioperl.org Wed Nov 21 18:43:54 2007 From: sac at bioperl.org (Steve Chervitz) Date: Wed, 21 Nov 2007 10:43:54 -0800 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> Message-ID: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> On Nov 21, 2007 9:38 AM, Bernd Web wrote: > [snip] > > Further is is possible to get the raw_score of a hit. $hit->raw_score > actually gets the bitscore (w/o decimal point). Hmmm. raw_score should not be the same as bit score. So given an example blast hit line such as: Score = 60.0 bits (30), Expect = 1e-06 $hit->raw_score() should return 30, not 60, as you seem to be getting. Could you submit a bug report for this? http://www.bioperl.org/wiki/Bugs Thanks, Steve > > On Nov 21, 2007 5:42 PM, Bernd Web wrote: > > Hi Russell, > > > > I came across your question. At first I thought all was well on my > > system, but indeed I also have these colouring problems. > > I noted that scrore in the bgcolor callback gets a different value! 
> > Printing score during hit parsing($hit->raw_score) gives the same > > score as -description > > my $score = $feature->score; However, printing score in the bgcolor > > sub gives 2573! > > All scores in the bgcolor routine all different and higher than the > > real scores. Were you able to solve this colouring issue? > > > > Regards, > > Bernd > > > > > > > Hi all, > > > I'm using a modified version of Lincoln's tutorial > > > (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output) > > > and I'm colouring the HSPs by setting the -bgcolor by score with a sub > > > to give a similar image to that from NCBI but for some reason, my > > > colours are coming out wrong (see attached example) > > > They seem to be off by one but I can't see why. > > > > > > Any ideas? > > > > > > I can't be certain but I think it's only started doing this since our > > > BLAST upgrade to 2.2.17 a few weeks ago. > > > > > > Here's the colouring code: > > > ------------------------------------------------------------------------ > > > ------- > > > my $track = $panel->add_track( > > > -glyph => 'segments', > > > -label => 1, > > > -connector => 'dashed', > > > -bgcolor => sub { > > > my $feature = shift; > > > my $score = $feature->score; > > > return 'red' if $score >= 200; > > > return 'fuchsia' if $score >= 80; > > > return 'lime' if $score >= 50; > > > return 'blue' if $score >= 40; > > > return 'black'; > > > }, > > > -font2color => 'gray', > > > -sort_order => 'high_score', > > > -description => sub { > > > my $feature = shift; > > > return unless > > > $feature->has_tag('description'); > > > my ($description) = > > > $feature->each_tag_value('description'); > > > my $score = $feature->score; > > > "$description, score=$score"; > > > }, > > > ); > > > ------------------------------------------------------------------------ > > > --------- > > > > > > > > > Thanx, > > > > > > Russell Smithies > > > > > > > > > > > > > > > 
======================================================================= > > > Attention: The information contained in this message and/or attachments > > > from AgResearch Limited is intended only for the persons or entities > > > to which it is addressed and may contain confidential and/or privileged > > > material. Any review, retransmission, dissemination or other use of, or > > > taking of any action in reliance upon, this information by persons or > > > entities other than the intended recipients is prohibited by AgResearch > > > Limited. If you have received this message in error, please notify the > > > sender immediately. > > > ======================================================================= > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From binkley at genome.stanford.edu Thu Nov 22 00:35:02 2007 From: binkley at genome.stanford.edu (Jonathan Binkley) Date: Wed, 21 Nov 2007 16:35:02 -0800 Subject: [Bioperl-l] Installing bioperl-ext on Mac Message-ID: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu> Hi, I installed bioperl on a Mac (OS 10.4, Intel) via fink, which put it here: /sw/lib/perl5/5.8.6/Bio/ It seems to work fine, but I need bioperl-ext for Smith-Waterman alignments. So, into which directory should I download bioperl-ext and run the Makefile? Thanks. From dcj at sanger.ac.uk Thu Nov 22 14:47:09 2007 From: dcj at sanger.ac.uk (Daniel Jeffares) Date: Thu, 22 Nov 2007 14:47:09 +0000 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml Message-ID: Hi all, Bio::Tools::Run::Phylo::PAML::Baseml from bioperl-run 1.5.2 seems to be a little 'broken', at least in my hands. 
First, $bml->set_parameter('runmode', 0); does not work (it sets runmode
to -2). Setting runmode to 1 is OK.
Also, $bml->no_param_checks(1); doesn't seem to work.

The result is that the baseml.ctl file created under /tmp is not
runnable by baseml with runmode 0. The phylip file created runs OK
in baseml (with another .ctl file).

My script & baseml.ctl below.

Hope it can be fixed, cheers,

Dan

#!/usr/bin/perl
use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file   => 'test.phy');
my $aln = $alignio->next_aln;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();

#set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();
system "more $tempdir/baseml.ctl";

while( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

The baseml.ctl file produced:

seqfile = /tmp/mtV8uuwTGW/FPS5kwtXSA
outfile = mlb
fix_rho = 1
verbose = 0
noisy = 0
RateAncestor = 1
kappa = 2.5
model = 0
ndata = 5
Small_Diff = 1e-6
runmode = -2
alpha = 0
fix_kappa = 0
rho = 0
nhomo = 0
getSE = 0
cleandata = 1
fix_alpha = 1
clock = 0
Malpha = 0
ncatG = 5
fix_blength = -1
nparK = 0

Regards,
Daniel Jeffares
______________________________
Population and Comparative Genomics
Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Phone: +44(0)1223 834244 x 7297
Fax: +44 (0)1223 494919
www.sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
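
[Editorial sketch, not part of the original message.] The report above checks the generated control file by paging through it with `more`. A script can make the same check by parsing the file and comparing it with the values that were requested. The helper names `parse_ctl` and `check_ctl` are hypothetical. The sketch assumes the one-pair-per-line `key = value` layout shown in the report, and that `*` starts a comment in PAML control files. It uses core Perl only and calls nothing from BioPerl, so it says nothing about why the wrapper wrote -2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse PAML-style control file text into a hash of parameter => value.
# Assumption: one "key = value" pair per line; text after '*' is a comment.
sub parse_ctl {
    my ($text) = @_;
    my %param;
    for my $line (split /\n/, $text) {
        $line =~ s/\*.*$//;    # drop a trailing comment, if any
        next unless $line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/;
        $param{$1} = $2;
    }
    return \%param;
}

# Compare written values with wanted ones (as strings) and return a list
# of human-readable mismatches; an empty list means everything matched.
sub check_ctl {
    my ($written, $wanted) = @_;
    my @problems;
    for my $key (sort keys %$wanted) {
        my $got = exists $written->{$key} ? $written->{$key} : '(missing)';
        push @problems, "$key: wanted $wanted->{$key}, file has $got"
            if $got ne $wanted->{$key};
    }
    return @problems;
}

# Three lines copied from the baseml.ctl quoted in the report above.
my $ctl_text = <<'END_CTL';
outfile = mlb
runmode = -2
model = 0
END_CTL

my $written = parse_ctl($ctl_text);
print "$_\n" for check_ctl($written, { runmode => 0, model => 0 });
```

In a real script the text would presumably be read from "$tempdir/baseml.ctl" after run(), with save_tempfiles(1) set as in the report. Values are compared as strings, so 1e-6 and 0.000001 would count as different.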
From David.Messina at sbc.su.se Thu Nov 22 16:06:16 2007 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 22 Nov 2007 17:06:16 +0100 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml In-Reply-To: References: Message-ID: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> Daniel, I don't have bioperl-run or PAML installed on my system to test it myself, but have you tried the latest version of bioperl-run from CVS? It looks like that code has been worked on since 1.5.2 was released. If that still doesn't work, could you file this as a bug to make sure it gets followed up? Dave You can grab the tarball here: http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl and if necessary file the bug here: BioPerl Bugzilla tracking system From arareko at campus.iztacala.unam.mx Thu Nov 22 16:37:24 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 22 Nov 2007 10:37:24 -0600 Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref table In-Reply-To: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> Message-ID: <4745B044.5090102@campus.iztacala.unam.mx> Hi Peter, In BioPerl, there's no such mapping for db_xref's that I'm aware of. Each parser handles db_xref records on its own. Take a look at the Bio::SeqIO::genbank code, inside the next_seq() method for example: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup Regards, Mauricio. Peter wrote: > Dear all, > > I'm one of the Biopython developers. I've recently got going with > BioSQL and have been getting to grips with the Biopython BioSQL > interface. I'm aware that we need to try and be consistent with > BioPerl and BioJava, so I'd like to pose my first question related to > that. 
> > When loading GenBank records, many features have db_xref qualifiers, > e.g. from a random CDS feature in E. coli K12: > > /db_xref="ASAP:1309" > /db_xref="GI:16128366" > /db_xref="ECOCYC:EG10213" > /db_xref="GeneID:945313" > > Bioython attempts to translate the strings "ASAP", "GI", "ECOCYC", > "GeneID" before using recording these entries in the seqfeature_dbxref > and dbxref tables. For example, "GI" becomes "GeneIndex". > Biopython's current mapping is as follows: > > # Dictionary of database types, keyed by GenBank db_xref abbreviation > db_dict = {'GeneID': 'Entrez', > 'GI': 'GeneIndex', > 'COG': 'COG', > 'CDD': 'CDD', > 'DDBJ': 'DNA Databank of Japan', > 'Entrez': 'Entrez', > 'GeneIndex': 'GeneIndex', > 'PUBMED': 'PubMed', > 'taxon': 'Taxon', > 'ATCC': 'ATCC', > 'ISFinder': 'ISFinder', > 'GOA': 'Gene Ontology Annotation', > 'ASAP': 'ASAP', > 'PSEUDO': 'PSEUDO', > 'InterPro': 'InterPro', > 'GEO': 'Gene Expression Omnibus', > 'EMBL': 'EMBL', > 'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot', > 'ECOCYC': 'EcoCyc', > 'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL' > } > > In my testing, I've found several GenBank db_xref abbreviation for > which we don't have a mapping defined, such as "LocusID", "dbSNP", > "MGD", "MIM", or from an EMBL file, "REMTREMBL". > > I'd like to know if BioPerl and/or BioJava and/or BioRuby define a > similar mapping in their BioSQL code (or GenBank parser), so that > Biopython can follow your example. > > Thank you, > > Peter > > P.S. 
See also Biopython bug 2405 > http://bugzilla.open-bio.org/show_bug.cgi?id=2405 > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From avilella at gmail.com Thu Nov 22 21:55:10 2007 From: avilella at gmail.com (Albert Vilella) Date: Thu, 22 Nov 2007 21:55:10 +0000 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign Message-ID: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> Hi, Am I right in thinking that the '_symbols' hash in SimpleAlign is only used if one calls the symbol_chars method? When I comment out this line: map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if $seq->seq; # line 257 I get a nice speed boost on loading alignments. Can I comment this line out in the CVS HEAD? Cheers, Albert. [init] 5.96046447753906e-06 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta] 0.0022270679473877 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta] 2.14348912239075 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta] 6.91910791397095 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta] 15.8402290344238 secs... avilella at magneto:~$ perl /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ancestral_alleles.pl -dir /home/avilella/ensembl/exoseq/test -verbose [init] 1.21593475341797e-05 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162399.chr1.fasta] 0.00294303894042969 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000158022.chr1.fasta] 0.510555982589722 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000162585.chr1.fasta] 1.6192569732666 secs... 
[loading aln /home/avilella/ensembl/exoseq/test/ENSG00000121957.chr1.fasta] 3.86473417282104 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000203717.chr1.fasta] 6.99602198600769 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000196188.chr1.fasta] 7.26704716682434 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000025800.chr1.fasta] 8.44332504272461 secs... [loading aln /home/avilella/ensembl/exoseq/test/ENSG00000117475.chr1.fasta] 12.103296995163 secs... From cjfields at uiuc.edu Fri Nov 23 00:30:51 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Nov 2007 18:30:51 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> Message-ID: <99440C6C-74C1-4DCC-8C7D-EAABB7CA6B91@uiuc.edu> How are tests affected? It might be worth going through the revision history to see if there was a specific reason this was implemented, but if it passes tests I don't see why we need it. chris On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > Hi, > > Am I right in thinking that the '_symbols' hash in SimpleAlign is only > used if one calls the symbol_chars method? > > When I comment out this line: > > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > $seq->seq; # line 257 > > I get a nice speed boost on loading alignments. > > Can I comment this line out in the CVS HEAD? > > Cheers, > > Albert. > > [init] 5.96046447753906e-06 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.0022270679473877 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 2.14348912239075 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 6.91910791397095 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 15.8402290344238 secs... 
> > avilella at magneto:~$ perl > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > ancestral_alleles.pl > -dir /home/avilella/ensembl/exoseq/test -verbose > [init] 1.21593475341797e-05 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.00294303894042969 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 0.510555982589722 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 1.6192569732666 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 3.86473417282104 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000203717.chr1.fasta] > 6.99602198600769 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000196188.chr1.fasta] > 7.26704716682434 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000025800.chr1.fasta] > 8.44332504272461 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000117475.chr1.fasta] > 12.103296995163 secs... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Nov 23 00:42:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Nov 2007 18:42:12 -0600 Subject: [Bioperl-l] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref table In-Reply-To: <4745B044.5090102@campus.iztacala.unam.mx> References: <320fb6e00711201136i6b3ca41eo8f6718e98f79c531@mail.gmail.com> <4745B044.5090102@campus.iztacala.unam.mx> Message-ID: <47D0EC6F-C34A-4AA8-97EE-478F2A5ADF62@uiuc.edu> I think SeqIO checks the name for parsing reasons only, in cases where the format changes based on the source (such as GenPept DBSOURCE data). 
I don't think we go beyond that in Bioperl, probably b/c modifying or expanding names for data persistence would lead to volatile coding issues (i.e. consistency between parsers, constant updating to cover new crossrefs, etc). I would definitely suggest retaining the original DB as it appears in the dbxref for consistency/sanity; if needed return expanded names using a different method if they are designated. chris On Nov 22, 2007, at 10:37 AM, Mauricio Herrera Cuadra wrote: > Hi Peter, > > In BioPerl, there's no such mapping for db_xref's that I'm aware of. > Each parser handles db_xref records on its own. Take a look at the > Bio::SeqIO::genbank code, inside the next_seq() method for example: > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ > Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup > > Regards, > Mauricio. > > Peter wrote: >> Dear all, >> >> I'm one of the Biopython developers. I've recently got going with >> BioSQL and have been getting to grips with the Biopython BioSQL >> interface. I'm aware that we need to try and be consistent with >> BioPerl and BioJava, so I'd like to pose my first question related to >> that. >> >> When loading GenBank records, many features have db_xref qualifiers, >> e.g. from a random CDS feature in E. coli K12: >> >> /db_xref="ASAP:1309" >> /db_xref="GI:16128366" >> /db_xref="ECOCYC:EG10213" >> /db_xref="GeneID:945313" >> >> Bioython attempts to translate the strings "ASAP", "GI", "ECOCYC", >> "GeneID" before using recording these entries in the >> seqfeature_dbxref >> and dbxref tables. For example, "GI" becomes "GeneIndex". 
>> Biopython's current mapping is as follows: >> >> # Dictionary of database types, keyed by GenBank db_xref abbreviation >> db_dict = {'GeneID': 'Entrez', >> 'GI': 'GeneIndex', >> 'COG': 'COG', >> 'CDD': 'CDD', >> 'DDBJ': 'DNA Databank of Japan', >> 'Entrez': 'Entrez', >> 'GeneIndex': 'GeneIndex', >> 'PUBMED': 'PubMed', >> 'taxon': 'Taxon', >> 'ATCC': 'ATCC', >> 'ISFinder': 'ISFinder', >> 'GOA': 'Gene Ontology Annotation', >> 'ASAP': 'ASAP', >> 'PSEUDO': 'PSEUDO', >> 'InterPro': 'InterPro', >> 'GEO': 'Gene Expression Omnibus', >> 'EMBL': 'EMBL', >> 'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot', >> 'ECOCYC': 'EcoCyc', >> 'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL' >> } >> >> In my testing, I've found several GenBank db_xref abbreviation for >> which we don't have a mapping defined, such as "LocusID", "dbSNP", >> "MGD", "MIM", or from an EMBL file, "REMTREMBL". >> >> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a >> similar mapping in their BioSQL code (or GenBank parser), so that >> Biopython can follow your example. >> >> Thank you, >> >> Peter >> >> P.S. See also Biopython bug 2405 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2405 >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l >> > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Gen?tica > Unidad de Morfofisiolog?a y Funci?n > Facultad de Estudios Superiores Iztacala, UNAM > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Nov 23 00:49:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 22 Nov 2007 18:49:15 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> Message-ID: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> Albert, Found it: http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ SimpleAlign.pm.diff?r1=1.36&r2=1.37 If it slows performance that dramatically, maybe we can move this to a separate AlignUtils method instead. Maybe something to ask Jason about? chris On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > Hi, > > Am I right in thinking that the '_symbols' hash in SimpleAlign is only > used if one calls the symbol_chars method? > > When I comment out this line: > > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > $seq->seq; # line 257 > > I get a nice speed boost on loading alignments. > > Can I comment this line out in the CVS HEAD? > > Cheers, > > Albert. > > [init] 5.96046447753906e-06 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.0022270679473877 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 2.14348912239075 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 6.91910791397095 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 15.8402290344238 secs... > > avilella at magneto:~$ perl > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > ancestral_alleles.pl > -dir /home/avilella/ensembl/exoseq/test -verbose > [init] 1.21593475341797e-05 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162399.chr1.fasta] > 0.00294303894042969 secs... 
> [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000158022.chr1.fasta] > 0.510555982589722 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000162585.chr1.fasta] > 1.6192569732666 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000121957.chr1.fasta] > 3.86473417282104 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000203717.chr1.fasta] > 6.99602198600769 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000196188.chr1.fasta] > 7.26704716682434 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000025800.chr1.fasta] > 8.44332504272461 secs... > [loading aln /home/avilella/ensembl/exoseq/test/ > ENSG00000117475.chr1.fasta] > 12.103296995163 secs... > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Nov 23 12:29:37 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 23 Nov 2007 12:29:37 +0000 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml In-Reply-To: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> Message-ID: <4746C7B1.1010002@sendu.me.uk> Dave Messina wrote: > Daniel, > > I don't have bioperl-run or PAML installed on my system to test it myself, > but have you tried the latest version of bioperl-run from CVS? It looks like > that code has been worked on since 1.5.2 was released. Yes, I fixed it in CVS so it should at least /run/. I don't know about the parsing side of things, though that may also have been fixed recently by someone else. 
From avilella at gmail.com Fri Nov 23 13:08:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Nov 2007 13:08:59 +0000
Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml
In-Reply-To: <4746C7B1.1010002@sendu.me.uk>
References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com>
	<4746C7B1.1010002@sendu.me.uk>
Message-ID: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com>

Just to mention that the new paml4 has a "basemlg" instead of a
"baseml" binary. AFAIK, Jason fixed codeml to make it work for both
paml3.xx and paml4, but I am not sure about baseml.

Also, I think if you set runmode 0, you have to provide a tree:

#!/usr/bin/perl
use Bio::Tools::Run::Phylo::PAML::Baseml;
use Bio::AlignIO;
use Bio::TreeIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
    -file => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.phy');
my $treeio = Bio::TreeIO->new(-format => 'newick',
    -file => '/home/avilella/bioperl/vanilla/bioperl-run/scripts/test.tree');
my $aln = $alignio->next_aln;
my $tree = $treeio->next_tree;

my $bml = Bio::Tools::Run::Phylo::PAML::Baseml->new();
$bml->alignment($aln);
$bml->tree($tree);
$bml->executable("/home/avilella/9_opl/paml/paml3.14/src/baseml");
$bml->save_tempfiles(1);
my $tempdir = $bml->tempdir();

#set the runmode to zero
$bml->set_parameter('runmode', 0);

my ($rc,$parser) = $bml->run();
system "more $tempdir/baseml.ctl";

while ( my $result = $parser->next_result ) {
    my @otus = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();
    $DB::single=1;1;
    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
}
exit;

4 50
Homo_sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
Pan_panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
Gorilla_go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
Pongo_pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

           ACAUUUU-CCUUGCAAAG
           ACAUCAU-CCUUGCAAAG
           ACAUCAUCCCUCGCAGAG
           ACAUCAUCCCUUGCAGAG

(((Homo_sapie,Pan_panisc),Gorilla_go),Pongo_pigm);

On Nov 23, 2007 12:29 PM, Sendu Bala wrote:
> Dave
Messina wrote: > > Daniel, > > > > I don't have bioperl-run or PAML installed on my system to test it myself, > > but have you tried the latest version of bioperl-run from CVS? It looks like > > that code has been worked on since 1.5.2 was released. > > Yes, I fixed it in CVS so it should at least /run/. I don't know about > the parsing side of things, though that may also have been fixed > recently by someone else. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Fri Nov 23 16:24:59 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 23 Nov 2007 10:24:59 -0600 Subject: [Bioperl-l] Porblems with Bio::Tools::Run::Phylo::PAML::Baseml In-Reply-To: <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com> References: <628aabb70711220806l6dc28336ud668b525e982a674@mail.gmail.com> <4746C7B1.1010002@sendu.me.uk> <358f4d650711230508j4cb58279n98fb0e5dc2563f71@mail.gmail.com> Message-ID: <6D4B909E-4B4E-45D4-B9BA-F99431B0EC65@uiuc.edu> I have both 'baseml' and 'basemlg' with paml4 on Mac OS X (not just 'basemlg'), so it would need to work with both. Do we want to put a PAML parser/wrapper overhaul on the TODO list for 1.6? chris On Nov 23, 2007, at 7:08 AM, Albert Vilella wrote: > Just to mention that the new paml4 has a "basemlg" instead of a > "baseml" binary. AFAIK, Jason fixed codeml to make it work both for > paml3.xx a paml4, but I am not sure about baseml. ... From arvindvanam at gmail.com Fri Nov 23 21:26:06 2007 From: arvindvanam at gmail.com (vanam) Date: Fri, 23 Nov 2007 13:26:06 -0800 (PST) Subject: [Bioperl-l] run RNAfold in perl Message-ID: <13918981.post@talk.nabble.com> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? 

my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
my $rnafold = $factory->program('rnafold');
my $job=$rnafold->run(-rnafold =>
'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC');

I installed the Vienna package and then tried using Pise to create an
object for the program, but it gives the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bio::Tools::Run::PiseJob terminated: URL missing
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::PiseJob::terminated
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460
STACK: Bio::Tools::Run::PiseApplication::submit
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416
STACK: Bio::Tools::Run::PiseApplication::run
/usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352
STACK: evaluate.pl:12

How can I make the RNAfold program run from Perl?
Do I need to specify where my RNAfold program is?

Please reply soon.
--
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13918981
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From cjfields at uiuc.edu Fri Nov 23 22:49:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Nov 2007 16:49:43 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13918981.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com>
Message-ID: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu>

The Pise wrappers run the programs remotely; see
Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a
local RNAfold wrapper, I had planned on making Bioperl-based
Vienna/mfold wrappers but haven't done so yet. The Vienna tools do
have a Perl-based (non-BioPerl-based) module included which uses
libRNA, and is well worth a look.
Try 'perldoc RNA' if you have installed the tools locally, or look here for other Perl-based tools: http://www.tbi.univie.ac.at/~ivo/RNA/utils.html chris On Nov 23, 2007, at 3:26 PM, vanam wrote: > > how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? > > my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); > my $rnafold = $factory->program('rnafold'); > my $job=$rnafold->run(-rnafold => > 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); > > I installed Vienna package and then i tried using Pise to create an > object > for the program but its giving the following error > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Bio::Tools::Run::PiseJob terminated: URL missing > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::Tools::Run::PiseJob::terminated > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 > STACK: Bio::Tools::Run::PiseApplication::submit > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 > STACK: Bio::Tools::Run::PiseApplication::run > /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 > STACK: evaluate.pl:12 > > > how to make the program RNAfold run in perl... > IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? > > plz reply soon > -- > View this message in context: http://www.nabble.com/run-RNAfold-in- > perl-tf4863835.html#a13918981 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From arvindvanam at gmail.com Sat Nov 24 07:29:11 2007 From: arvindvanam at gmail.com (vanam) Date: Fri, 23 Nov 2007 23:29:11 -0800 (PST) Subject: [Bioperl-l] run RNAfold in perl In-Reply-To: <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> Message-ID: <13922740.post@talk.nabble.com> i have seen the documentation for Bio::Tools::Run::AnalysisFactory::Pise and i used it exactly as it was mentioned in it. i just want that instead of running its perl version "RNAfold.pl" I can use the functions associated with RNAfold with a perl program without having to call the program using system() command. if you can just tell me how to use these wrapper modules it would b of gr8 help...like while using clustalw or clustalx we define the environment variable for it ..do we have to do the same for RNAfold or Mfold Chris Fields wrote: > > The Pise wrappers run the programs remotely; see > Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a > local RNAfold wrapper, I had planned on making Bioperl-based Vienna/ > mfold wrappers but haven't done so yet. The Vienna tools do have a > Perl-based (non-BioPerl-based) module included which uses libRNA, and > is well worth a look. Try 'perldoc RNA' if you have installed the > tools locally, or look here for other Perl-based tools: > > http://www.tbi.univie.ac.at/~ivo/RNA/utils.html > > chris > > On Nov 23, 2007, at 3:26 PM, vanam wrote: > >> >> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise????? 
>> >> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); >> my $rnafold = $factory->program('rnafold'); >> my $job=$rnafold->run(-rnafold => >> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); >> >> I installed Vienna package and then i tried using Pise to create an >> object >> for the program but its giving the following error >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Bio::Tools::Run::PiseJob terminated: URL missing >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 >> STACK: Bio::Tools::Run::PiseJob::terminated >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 >> STACK: Bio::Tools::Run::PiseApplication::submit >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 >> STACK: Bio::Tools::Run::PiseApplication::run >> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 >> STACK: evaluate.pl:12 >> >> >> how to make the program RNAfold run in perl... >> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? >> >> plz reply soon >> -- >> View this message in context: http://www.nabble.com/run-RNAfold-in- >> perl-tf4863835.html#a13918981 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13922740 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From avilella at gmail.com Sun Nov 25 11:50:42 2007 From: avilella at gmail.com (Albert Vilella) Date: Sun, 25 Nov 2007 11:50:42 +0000 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> Message-ID: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> cvs commited now. it is calculated anyway when calling symbol_chars so... On Nov 23, 2007 12:49 AM, Chris Fields wrote: > Albert, > > Found it: > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ > SimpleAlign.pm.diff?r1=1.36&r2=1.37 > > If it slows performance that dramatically, maybe we can move this to > a separate AlignUtils method instead. Maybe something to ask Jason > about? > > chris > > On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: > > > > Hi, > > > > Am I right in thinking that the '_symbols' hash in SimpleAlign is only > > used if one calls the symbol_chars method? > > > > When I comment out this line: > > > > map { $self->{'_symbols'}->{$_} = 1; } split(//,$seq->seq) if > > $seq->seq; # line 257 > > > > I get a nice speed boost on loading alignments. > > > > Can I comment this line out in the CVS HEAD? > > > > Cheers, > > > > Albert. > > > > [init] 5.96046447753906e-06 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162399.chr1.fasta] > > 0.0022270679473877 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000158022.chr1.fasta] > > 2.14348912239075 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162585.chr1.fasta] > > 6.91910791397095 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000121957.chr1.fasta] > > 15.8402290344238 secs... 
> > > > avilella at magneto:~$ perl > > /home/avilella/src/ensembl_main/ensembl-personal/avilella/exoseq/ > > ancestral_alleles.pl > > -dir /home/avilella/ensembl/exoseq/test -verbose > > [init] 1.21593475341797e-05 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162399.chr1.fasta] > > 0.00294303894042969 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000158022.chr1.fasta] > > 0.510555982589722 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000162585.chr1.fasta] > > 1.6192569732666 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000121957.chr1.fasta] > > 3.86473417282104 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000203717.chr1.fasta] > > 6.99602198600769 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000196188.chr1.fasta] > > 7.26704716682434 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000025800.chr1.fasta] > > 8.44332504272461 secs... > > [loading aln /home/avilella/ensembl/exoseq/test/ > > ENSG00000117475.chr1.fasta] > > 12.103296995163 secs... > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From cjfields at uiuc.edu Sun Nov 25 15:05:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 09:05:27 -0600 Subject: [Bioperl-l] run RNAfold in perl In-Reply-To: <13922740.post@talk.nabble.com> References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> Message-ID: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu> Again, these wrappers are for submitting data to a Pise server for the corresponding programs (run on a remote server). 
There are no wrappers for running RNAfold on your computer (i.e. locally), with or w/o a set env. variable. You can try installing Pise locally and setting the location() as shown in POD to localhost; however, I don't know how stable these modules are with newer versions of Pise. These haven't been updated in a few years, apart from getting tests to work.

Another option is installing EMBOSS along with the EMBASSY version of RNAfold; this could conceivably be run through Bio::Factory::EMBOSS.

chris

On Nov 24, 2007, at 1:29 AM, vanam wrote:

>
> i have seen the documentation for
> Bio::Tools::Run::AnalysisFactory::Pise and
> i used it exactly as it was mentioned in it.
>
> i just want that instead of running its perl version "RNAfold.pl" I
> can use
> the functions associated with RNAfold with a perl program without
> having to
> call the program using system() command.
>
> if you can just tell me how to use these wrapper modules it would b
> of gr8
> help...like while using clustalw or clustalx we define the environment
> variable for it ..do we have to do the same for RNAfold or Mfold
>
>
>
>
> Chris Fields wrote:
>>
>> The Pise wrappers run the programs remotely; see
>> Bio::Tools::Run::AnalysisFactory::Pise on how to run it. As for a
>> local RNAfold wrapper, I had planned on making Bioperl-based Vienna/
>> mfold wrappers but haven't done so yet. The Vienna tools do have a
>> Perl-based (non-BioPerl-based) module included which uses libRNA, and
>> is well worth a look. Try 'perldoc RNA' if you have installed the
>> tools locally, or look here for other Perl-based tools:
>>
>> http://www.tbi.univie.ac.at/~ivo/RNA/utils.html
>>
>> chris
>>
>> On Nov 23, 2007, at 3:26 PM, vanam wrote:
>>
>>>
>>> how to run RNAfold using Bio::Tools::Run::AnalysisFactory::Pise?????
>>> >>> my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(); >>> my $rnafold = $factory->program('rnafold'); >>> my $job=$rnafold->run(-rnafold => >>> 'UUUGACGACAGACGACUCAAUGUCAGCUAGCUAGUACGAUCGAUC'); >>> >>> I installed Vienna package and then i tried using Pise to create an >>> object >>> for the program but its giving the following error >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: Bio::Tools::Run::PiseJob terminated: URL missing >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw >>> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:359 >>> STACK: Bio::Tools::Run::PiseJob::terminated >>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseJob.pm:460 >>> STACK: Bio::Tools::Run::PiseApplication::submit >>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:416 >>> STACK: Bio::Tools::Run::PiseApplication::run >>> /usr/local/share/perl/5.8.8/Bio/Tools/Run/PiseApplication.pm:352 >>> STACK: evaluate.pl:12 >>> >>> >>> how to make the program RNAfold run in perl... >>> IS THERE ANY NEED TO SPECIFY WHERE MY rnafold program is??? >>> >>> plz reply soon >>> -- >>> View this message in context: http://www.nabble.com/run-RNAfold-in- >>> perl-tf4863835.html#a13918981 >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/run-RNAfold-in- > perl-tf4863835.html#a13922740 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Nov 25 15:38:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 09:38:40 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> Message-ID: Albert, I was getting a single AlignIO.t fail which appeared to be related to this: ... ok 122 - The object isa Bio::Align::AlignI ok 123 - consensus_string on metafasta not ok 124 - symbol_chars() using metafasta # Failed test 'symbol_chars() using metafasta' # in t/AlignIO.t at line 346. # got: '0' # expected: '23' It was b/c the symbol hash was initialized in the constructor (so it was present, just empty). I have changed that in CVS; all tests pass now. chris On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: > cvs commited now. it is calculated anyway when calling symbol_chars > so... > > On Nov 23, 2007 12:49 AM, Chris Fields wrote: >> Albert, >> >> Found it: >> >> http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ >> Bio/ >> SimpleAlign.pm.diff?r1=1.36&r2=1.37 >> >> If it slows performance that dramatically, maybe we can move this to >> a separate AlignUtils method instead. Maybe something to ask Jason >> about? >> >> chris >> >> On Nov 22, 2007, at 3:55 PM, Albert Vilella wrote: >> >> >>> Hi, >>> >>> Am I right in thinking that the '_symbols' hash in SimpleAlign is >>> only >>> used if one calls the symbol_chars method? 
>>> >>> [snip]
>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bernd.web at gmail.com Sun Nov 25 16:13:44 2007 From: bernd.web at gmail.com (Bernd Web) Date: Sun, 25 Nov 2007 17:13:44 +0100 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> Message-ID: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> Hi, I am not sure if this is related, but I remember SimpleAlign was adapted to cope with more gap symbols that can occur in alignments/FastA sequences, as: . _ - = Previous versions would throw an error on 'illegal' gap characters, Regards, Bernd On Nov 25, 2007 4:38 PM, Chris Fields wrote: > Albert, > > I was getting a single AlignIO.t fail which appeared to be related to > this: > > ... > ok 122 - The object isa Bio::Align::AlignI > ok 123 - consensus_string on metafasta > > not ok 124 - symbol_chars() using metafasta > # Failed test 'symbol_chars() using metafasta' > # in t/AlignIO.t at line 346. > # got: '0' > # expected: '23' > > It was b/c the symbol hash was initialized in the constructor (so it > was present, just empty). I have changed that in CVS; all tests pass > now. > > chris > > > On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: > > > cvs commited now. it is calculated anyway when calling symbol_chars > > so... 
> > [snip] > > Christopher Fields > Postdoctoral Researcher > Lab of Dr.
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Sun Nov 25 16:39:01 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Nov 2007 10:39:01 -0600 Subject: [Bioperl-l] proposed change -- symbols SimpleAlign In-Reply-To: <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> References: <358f4d650711221355s655e0db5w22c0d0589b248d6e@mail.gmail.com> <6754677D-11BF-45AD-B510-68669D1C1016@uiuc.edu> <358f4d650711250350r60d31a9elf09e7fd4513fc618@mail.gmail.com> <716af09c0711250813x2cd851d3i5345c3161d87d928@mail.gmail.com> Message-ID: Bernd, That would be when generating Bio::LocatableSeq instances for building a Bio::SimpleAlign object. Judging by test suite results that doesn't appear to be affected. chris On Nov 25, 2007, at 10:13 AM, Bernd Web wrote: > Hi, > > I am not sure if this is related, but I remember SimpleAlign was > adapted to cope with more gap symbols that can occur in > alignments/FastA sequences, as: . _ - = > Previous versions would throw an error on 'illegal' gap characters, > > Regards, > Bernd > > On Nov 25, 2007 4:38 PM, Chris Fields wrote: >> Albert, >> >> I was getting a single AlignIO.t fail which appeared to be related to >> this: >> >> ... >> ok 122 - The object isa Bio::Align::AlignI >> ok 123 - consensus_string on metafasta >> >> not ok 124 - symbol_chars() using metafasta >> # Failed test 'symbol_chars() using metafasta' >> # in t/AlignIO.t at line 346. >> # got: '0' >> # expected: '23' >> >> It was b/c the symbol hash was initialized in the constructor (so it >> was present, just empty). I have changed that in CVS; all tests pass >> now. >> >> chris >> >> >> On Nov 25, 2007, at 5:50 AM, Albert Vilella wrote: >> >>> cvs commited now. it is calculated anyway when calling symbol_chars >>> so... 
>>> [snip] Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Sun Nov 25 18:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Nov 2007 12:51:42 -0600
Subject: [Bioperl-l] [ANNOUNCE] bioperl-ext updates and bioperl-live
Message-ID: <32B25A3B-0F04-43CB-8A66-1019EFD3BEB0@uiuc.edu>

I have been making some significant changes to Bio::SeqIO::staden::read over the last few months which incorporate code from Bugzilla (bugs 2074 and 2329, very kindly donated by Chris Bailey and Joel Martin, cheers!).

Significant Changes:

* All Inline code in staden::read is now XS-based
* A new method has been added to Bio::SeqIO::staden::read for optionally getting trace data (i.e. for drawing graphs). The method is now implemented in Bio::SeqIO::abi, with example code in examples/quality/svgtrace.pl.

These changes should allow newer versions of Staden io_lib as well (the code is tested with io_lib 1.9.2), though they haven't been tested extensively as I am having problems compiling newer io_lib versions on my MacBook. It's very likely more changes will need to be made along the way; some issues were found with XS compilation which appear harmless but need to be investigated, and trace data from other formats need to be evaluated.

The possibility exists that many of these changes break backward compatibility with older bioperl releases, though tests passed with bioperl 1.5.2. Any feedback re: platform issues, test results using newer io_lib versions, older bioperl versions, etc. would be greatly appreciated.

I'm hoping this will stimulate more interest in getting other bioperl-ext modules up-to-date with bioperl-live.
chris From cjfields at uiuc.edu Mon Nov 26 18:59:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Nov 2007 12:59:23 -0600 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> References: <4701AEE6.6070506@web.de> <47022278.7010700@web.de> <47025AD9.1090105@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> Message-ID: Steve, Bernd, (and Jason, since you may have some input on this as well), I am now looking into the bug Bernd submitted and it seems there is a serious discrepancy with the way the hit raw_score, bits, and significance is determined for Hit objects. Unless I am mistaken these should always come from the best HSP when they are present, falling back to the hit table data only when no HSP alignments are present. Under the latter conditions a minimal Hit object is made from data in the hit table, which reports the rounded bit score, not the raw score, so in those cases the raw score would be undefined (and you probably should get a nasty warning indicating there are no HSPs present to get the data from). What is occurring now, though, is the raw_score and significance is explicitly set from the hit table in the BLAST parser for the Hit object at all times, while the bits are correctly derived from the best HSP (no fallback to the hit table). Changing to the behavior above results in several tests failing via SearchIO.t, with each failed test reporting the expected (read:correct) raw score. I'll look through the tests just in case, but I am planning on committing changes to the BLAST parsers, GenericHit, and SearchIO.t (to reflect the correct expected data) in the next day or two unless there are any objections. 
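The rule described above (take raw score, bits, and significance from the best HSP when alignments exist, otherwise fall back to the hit table) can be sketched in plain Perl. This is only an illustration, not the actual GenericHit or SearchIO code: the hash keys (hsps, table_bits, table_evalue), the hit_scores helper, and the choice of "highest bit score" as the best HSP are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT BioPerl's implementation: decide which scores
# a hit should report. Plain hashes stand in for Hit and HSP objects.
sub hit_scores {
    my ($hit) = @_;
    my @hsps = @{ $hit->{hsps} || [] };

    if (@hsps) {
        # Assumption: the "best" HSP is the one with the highest bit score.
        my ($best) = sort { $b->{bits} <=> $a->{bits} } @hsps;
        return {
            raw_score    => $best->{raw_score},
            bits         => $best->{bits},
            significance => $best->{evalue},
        };
    }

    # No alignments: fall back to the hit table. For NCBI BLAST the table
    # only carries the rounded bit score, so the raw score stays undefined.
    warn "No HSPs for $hit->{name}; raw score is unavailable\n";
    return {
        raw_score    => undef,
        bits         => $hit->{table_bits},
        significance => $hit->{table_evalue},
    };
}

# A hit with alignments: values come from the best HSP.
my $with_hsps = hit_scores({
    name => 'hitA', table_bits => 60, table_evalue => 1e-06,
    hsps => [
        { raw_score => 30, bits => 60.0, evalue => 1e-06 },
        { raw_score => 12, bits => 28.1, evalue => 0.5 },
    ],
});

# A hit known only from the hit table: bits and E-value only.
my $table_only = hit_scores({
    name => 'hitB', table_bits => 45, table_evalue => 2e-04, hsps => [],
});

printf "hitA: raw=%s bits=%s\n", $with_hsps->{raw_score}, $with_hsps->{bits};
printf "hitB: raw=%s bits=%s\n",
    defined $table_only->{raw_score} ? $table_only->{raw_score} : 'undef',
    $table_only->{bits};
```

Returning undef for the raw score in the table-only case, with a warning, mirrors the behaviour proposed in the message; whether the real parser should warn or stay silent is a design choice left open here.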
chris On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote: > On Nov 21, 2007 9:38 AM, Bernd Web wrote: >> [snip] >> >> Further is is possible to get the raw_score of a hit. $hit->raw_score >> actually gets the bitscore (w/o decimal point). > > Hmmm. raw_score should not be the same as bit score. So given an > example blast hit line such as: > > Score = 60.0 bits (30), Expect = 1e-06 > > $hit->raw_score() should return 30, not 60, as you seem to be getting. > > Could you submit a bug report for this? http://www.bioperl.org/ > wiki/Bugs > > Thanks, > Steve > >> >> On Nov 21, 2007 5:42 PM, Bernd Web wrote: >>> Hi Russell, >>> >>> I came across your question. At first I thought all was well on my >>> system, but indeed I also have these colouring problems. >>> I noted that scrore in the bgcolor callback gets a different value! >>> Printing score during hit parsing($hit->raw_score) gives the same >>> score as -description >>> my $score = $feature->score; However, printing score in the bgcolor >>> sub gives 2573! >>> All scores in the bgcolor routine all different and higher than the >>> real scores. Were you able to solve this colouring issue? >>> >>> Regards, >>> Bernd >>> >>> >>>> Hi all, >>>> I'm using a modified version of Lincoln's tutorial >>>> (http://www.bioperl.org/wiki/ >>>> HOWTO:Graphics#Parsing_Real_BLAST_Output) >>>> and I'm colouring the HSPs by setting the -bgcolor by score with >>>> a sub >>>> to give a similar image to that from NCBI but for some reason, my >>>> colours are coming out wrong (see attached example) >>>> They seem to be off by one but I can't see why. >>>> >>>> Any ideas? >>>> >>>> I can't be certain but I think it's only started doing this >>>> since our >>>> BLAST upgrade to 2.2.17 a few weeks ago. 
>>>>
>>>> Here's the colouring code:
>>>>
>>>> my $track = $panel->add_track(
>>>>     -glyph       => 'segments',
>>>>     -label       => 1,
>>>>     -connector   => 'dashed',
>>>>     -bgcolor     => sub {
>>>>         my $feature = shift;
>>>>         my $score   = $feature->score;
>>>>         return 'red'     if $score >= 200;
>>>>         return 'fuchsia' if $score >= 80;
>>>>         return 'lime'    if $score >= 50;
>>>>         return 'blue'    if $score >= 40;
>>>>         return 'black';
>>>>     },
>>>>     -font2color  => 'gray',
>>>>     -sort_order  => 'high_score',
>>>>     -description => sub {
>>>>         my $feature = shift;
>>>>         return unless $feature->has_tag('description');
>>>>         my ($description) = $feature->each_tag_value('description');
>>>>         my $score = $feature->score;
>>>>         "$description, score=$score";
>>>>     },
>>>> );
>>>>
>>>> Thanx,
>>>>
>>>> Russell Smithies

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From arvindvanam at gmail.com Mon Nov 26 19:08:41 2007
From: arvindvanam at gmail.com (vanam)
Date: Mon, 26 Nov 2007 11:08:41 -0800 (PST)
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu>
Message-ID: <13955209.post@talk.nabble.com>

I searched for the EMBASSY version of RNAfold (I guess it's vrnafold), but I am unable to find a downloadable version; all there is is a web interface for it. Can you tell me where to download it?

Or can you just tell me how to set the location in PiseApplication to localhost, and what to enter in the $email variable?

--
View this message in context: http://www.nabble.com/run-RNAfold-in-perl-tf4863835.html#a13955209
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From cjfields at uiuc.edu Mon Nov 26 20:08:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Nov 2007 14:08:24 -0600
Subject: [Bioperl-l] run RNAfold in perl
In-Reply-To: <13955209.post@talk.nabble.com>
References: <13918981.post@talk.nabble.com> <214F1D57-E2F0-4D48-8DBA-89B0E28E4145@uiuc.edu> <13922740.post@talk.nabble.com> <1C24BBCD-88E3-4EA4-B79D-1F7DDAEDE3DE@uiuc.edu> <13955209.post@talk.nabble.com>
Message-ID: <8F0B3E56-BC46-4794-9A30-12688A358CAD@uiuc.edu>

On Nov 26, 2007, at 1:08 PM, vanam wrote:

> I searched for the EMBASSY version of RNAfold (I guess it's vrnafold),
> but I am unable to find a downloadable version; all there is is a web
> interface for it. Can you tell me where to download it?

You will need to install EMBOSS as well as the EMBASSY version of VIENNA (something which is documented in the docs that come along with the distributions and I will not go into detail on):

ftp://emboss.open-bio.org/pub/EMBOSS/

This would be your best bet. Understand that there is no specific class framework for dealing with RNA secondary structure in BioPerl, so you will be on your own for now.

My suggestion for using Pise had the very important caveats that (1) it very well may not work, (2) I have no experience with Pise, let alone setting it up locally, therefore (3) I haven't tested it (and don't intend to, as I don't have the time).

> Or can you just tell me how to set the location in PiseApplication to
> localhost, and what to enter in the $email variable?

I have pointed out documentation previously which comes with the modules in question. Remember perldoc is your friend; consulting it saves me (and everyone else) time.

From 'perldoc Bio::Tools::Run::AnalysisFactory::Pise':

----------------------------------------------
DESCRIPTION
    Bio::Tools::Run::AnalysisFactory::Pise is a class to create Pise
    application objects, that let you submit jobs on a Pise server.

      my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                                            -email => 'me at myhome');

    The email is optional (there is default one). It can be useful,
    though. Your program might enter infinite loops, or just run many
    jobs: the Pise server maintainer needs a contact (s/he could of
    course cancel any requests from your address...). And if you plan to
    run a lot of heavy jobs, or to do a course with many students, please
    ask the maintainer before.

    The location parameter stands for the actual CGI location, except
    when set at the factory creation step, where it is rather the root of
    all CGI. There are default values for most of Pise programs.

    You can either set location at:

    1 factory creation:
          my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new(
                            -location => 'http://somewhere/Pise/cgi-bin',
                            -email    => 'me at myhome');

    2 program creation:
          my $program = $factory->program('water',
                  -location => 'http://somewhere/Pise/cgi-bin/water.pl'
                  );

    3 any time before running:
          $program->location('http://somewhere/Pise/cgi-bin/water.pl');
          $job = $program->run();

    4 when running:
          $job = $program->run(-location => 'http://somewhere/Pise/cgi-bin/water.pl');

    You can also retrieve a previous job results by providing its url:

      $job = $factory->job($url);

    You get the url of a job by:

      $job->jobid;
----------------------------------------------

chris

From sac at bioperl.org Tue Nov 27 01:41:59 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 17:41:59 -0800
Subject: [Bioperl-l] coloring of HSPs in blast panel
In-Reply-To:
References: <4701AEE6.6070506@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com>
Message-ID: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com>

Chris,

Good catch.
You're on track here with one exception: WU blast and NCBI blast behave differently in what they report in the hit table: WU blast puts the raw score in the table not the bit score as NCBI blast does (see example below for reference). WU blast also swaps their location in the HSP header relative to how NCBI reports it. It would be good to verify that the blast parser isn't befuddled by this.

A quick look at SearchIO::blast and it appears that data from the hit table is always getting stored as score, not bits for WU blast. Not sure if the HSP section data are parsed correctly. I'd recommend looking into these things when you do your fixes.

So in the end, WU blast HSPs that are built from the hit table should report a value for raw_score and punt on bits, but NCBI HSPs so constructed should do the opposite. The downside to this arrangement is that code that works for NCBI blast hits will need modification to work for WU blast hits, but that is just the nature of the data. It shouldn't be an issue for the majority of users that stick with one flavor of blast and don't switch back and forth, or for users that get their HSP scoring data from HSP sections rather than relying on the hit table.

Ideally, the HSP object would know whether it was NCBI or WU-based and issue an informative warning when attempting to access data it doesn't have. One solution might be for the parser to put a 'WU-' in front of the algorithm name for WU blast reports, so it would then be available for the contained hit/hsp objects. This could break anything dependent on algorithm name, so it would need some testing.

Steve

Example WU blast table and HSP header:

                                                                    Smallest
                                                                    Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)     N

gb|AAC73113.1| (AE000111) aspartokinase I, homoserine deh...  4141  0.0      1
gb|AAC76922.1| (AE000468) aspartokinase II and homoserine...   844  3.1e-86  1
gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensi...   483  2.8e-47  1
gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia c...    97  0.0010   1

>gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I
            [Escherichia coli]
            Length = 820

 Score = 4141 (1462.8 bits), Expect = 0.0, P = 0.0
 Identities = 820/820 (100%), Positives = 820/820 (100%)

Example NCBI blast table and HSP header:

                                                                    Score    E
Sequences producing significant alignments:                         (bits)  Value

ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1 gene:E...   120   3e-27
ENSP00000327738 pep:known-ccds chromosome:NCBI36:4:189297592:189...   115   8e-26

>ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1
            gene:ENSG00000137397 transcript:ENST00000357569
            Length = 425

 Score = 120 bits (301), Expect = 3e-27
 Identities = 76/261 (29%), Positives = 140/261 (53%), Gaps = 21/261 (8%)

On Nov 26, 2007 10:59 AM, Chris Fields wrote:
> Steve, Bernd, (and Jason, since you may have some input on this as
> well),
>
> I am now looking into the bug Bernd submitted and it seems there is a
> serious discrepancy with the way the hit raw_score, bits, and
> significance is determined for Hit objects. Unless I am mistaken
> these should always come from the best HSP when they are present,
> falling back to the hit table data only when no HSP alignments are
> present. Under the latter conditions a minimal Hit object is made
> from data in the hit table, which reports the rounded bit score, not
> the raw score, so in those cases the raw score would be undefined
> (and you probably should get a nasty warning indicating there are no
> HSPs present to get the data from).
>
> What is occurring now, though, is the raw_score and significance is
> explicitly set from the hit table in the BLAST parser for the Hit
> object at all times, while the bits are correctly derived from the
> best HSP (no fallback to the hit table).
Changing to the behavior
> above results in several tests failing via SearchIO.t, with each
> failed test reporting the expected (read: correct) raw score.
>
> I'll look through the tests just in case, but I am planning on
> committing changes to the BLAST parsers, GenericHit, and SearchIO.t
> (to reflect the correct expected data) in the next day or two unless
> there are any objections.
>
> chris
>
>
> On Nov 21, 2007, at 12:43 PM, Steve Chervitz wrote:
>
> > On Nov 21, 2007 9:38 AM, Bernd Web wrote:
> >> [snip]
> >>
> >> Further, is it possible to get the raw_score of a hit? $hit->raw_score
> >> actually gets the bitscore (w/o decimal point).
> >
> > Hmmm. raw_score should not be the same as bit score. So given an
> > example blast hit line such as:
> >
> > Score = 60.0 bits (30), Expect = 1e-06
> >
> > $hit->raw_score() should return 30, not 60, as you seem to be getting.
> >
> > Could you submit a bug report for this?
> > http://www.bioperl.org/wiki/Bugs
> >
> > Thanks,
> > Steve
> >
> >>
> >> On Nov 21, 2007 5:42 PM, Bernd Web wrote:
> >>> Hi Russell,
> >>>
> >>> I came across your question. At first I thought all was well on my
> >>> system, but indeed I also have these colouring problems.
> >>> I noted that score in the bgcolor callback gets a different value!
> >>> Printing score during hit parsing ($hit->raw_score) gives the same
> >>> score as -description
> >>> my $score = $feature->score; However, printing score in the bgcolor
> >>> sub gives 2573!
> >>> All scores in the bgcolor routine are different and higher than the
> >>> real scores. Were you able to solve this colouring issue?
> >>>
> >>> Regards,
> >>> Bernd
> >>>
> >>>
> >>>> Hi all,
> >>>> I'm using a modified version of Lincoln's tutorial
> >>>> (http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output)
> >>>> and I'm colouring the HSPs by setting the -bgcolor by score with
> >>>> a sub
> >>>> to give a similar image to that from NCBI but for some reason, my
> >>>> colours are coming out wrong (see attached example)
> >>>> They seem to be off by one but I can't see why.
> >>>>
> >>>> Any ideas?
> >>>>
> >>>> I can't be certain but I think it's only started doing this
> >>>> since our
> >>>> BLAST upgrade to 2.2.17 a few weeks ago.
> >>>>
> >>>> Here's the colouring code:
> >>>> ------------------------------------------------------------------------
> >>>> my $track = $panel->add_track(
> >>>>     -glyph       => 'segments',
> >>>>     -label       => 1,
> >>>>     -connector   => 'dashed',
> >>>>     -bgcolor     => sub {
> >>>>         my $feature = shift;
> >>>>         my $score = $feature->score;
> >>>>         return 'red'     if $score >= 200;
> >>>>         return 'fuchsia' if $score >= 80;
> >>>>         return 'lime'    if $score >= 50;
> >>>>         return 'blue'    if $score >= 40;
> >>>>         return 'black';
> >>>>     },
> >>>>     -font2color  => 'gray',
> >>>>     -sort_order  => 'high_score',
> >>>>     -description => sub {
> >>>>         my $feature = shift;
> >>>>         return unless $feature->has_tag('description');
> >>>>         my ($description) = $feature->each_tag_value('description');
> >>>>         my $score = $feature->score;
> >>>>         "$description, score=$score";
> >>>>     },
> >>>> );
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>>
> >>>> Thanx,
> >>>>
> >>>> Russell Smithies
> >>>>
> >>>
> >>
> >
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

From sac at bioperl.org Tue Nov 27 03:27:09 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 26 Nov 2007 19:27:09 -0800
Subject: [Bioperl-l] Installing bioperl-ext on Mac
In-Reply-To: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
References: <4DE80AE8-89A8-4C71-A36E-E7245AF28B63@genome.stanford.edu>
Message-ID: <8f200b4c0711261927h7ed8887ay8ab788f4f70fa197@mail.gmail.com>

Hi Jon,

I'd recommend downloading it into a separate location of your choosing
(~/lib/bioperl-ext for example) and running the installer as specified
in the docs that come with the download. Then you can include the
location you installed it into via a
'use lib "$ENV{HOME}/lib/bioperl-ext";' statement at the top of your
script (spell out the home directory, since Perl does not expand "~"
in a use lib path).
It may be handy to install it as a non-root user so that you don't alter the main perl installation. This way your ext install will stay separate from your main bioperl and perl installations. There are some docs about the ext packages you might want to check out at http://www.bioperl.org/wiki/Ext_package. Steve On Nov 21, 2007 4:35 PM, Jonathan Binkley wrote: > Hi, > > I installed bioperl on a Mac (OS 10.4, Intel) via fink, > which put it here: > > /sw/lib/perl5/5.8.6/Bio/ > > It seems to work fine, but I need bioperl-ext for > Smith-Waterman alignments. > > So, into which directory should I download bioperl-ext and > run the Makefile? > > Thanks. > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From a_arya2000 at yahoo.com Tue Nov 27 14:51:41 2007 From: a_arya2000 at yahoo.com (a_arya2000) Date: Tue, 27 Nov 2007 06:51:41 -0800 (PST) Subject: [Bioperl-l] Bioperl-ext test fails Message-ID: <615478.1036.qm@web60113.mail.yahoo.com> Hello, I downloaded latest bioperl-ext from bioperl website, and I have io_lib v1.8.11 installed, and I was trying to install Bio::SeqIO::staden::read (of bioperl-ext). It compiled fine without any error but when I run make test I got following output. 
PERL_DL_NONLAZY=1 perl-5.8.8/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/staden_read....ok 3/94# Test 7 got: "0" (t/staden_read.t at line 110 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
#  t/staden_read.t line 110 is: ok(0, undef, "We don't have the ability to write files for $format format") for 1..7;
# Test 8 got: "0" (t/staden_read.t at line 110 fail #2 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 9 got: "0" (t/staden_read.t at line 110 fail #3 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 10 got: "0" (t/staden_read.t at line 110 fail #4 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 11 got: "0" (t/staden_read.t at line 110 fail #5 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 12 got: "0" (t/staden_read.t at line 110 fail #6 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 13 got: "0" (t/staden_read.t at line 110 fail #7 *TODO*)
#   Expected: (We don't have the ability to write files for abi format)
# Test 14 got: "0" (t/staden_read.t at line 62 *TODO*)
#   Expected: (Still missing test files for alf format)
#  t/staden_read.t line 62 is: ok(0, undef, "Still missing test files for $format format") for (1..$formatlooptests);
# Test 15 got: "0" (t/staden_read.t at line 62 fail #2 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 16 got: "0" (t/staden_read.t at line 62 fail #3 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 17 got: "0" (t/staden_read.t at line 62 fail #4 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 18 got: "0" (t/staden_read.t at line 62 fail #5 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 19 got: "0" (t/staden_read.t at line 62 fail #6 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 20 got: "0" (t/staden_read.t at line 62 fail #7 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 21 got: "0" (t/staden_read.t at line 62 fail #8 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 22 got: "0" (t/staden_read.t at line 62 fail #9 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 23 got: "0" (t/staden_read.t at line 62 fail #10 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 24 got: "0" (t/staden_read.t at line 62 fail #11 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 25 got: "0" (t/staden_read.t at line 62 fail #12 *TODO*)
#   Expected: (Still missing test files for alf format)
# Test 31 got: "0" (t/staden_read.t at line 107 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
#  t/staden_read.t line 107 is: ok(0, undef, "Can't write valid ctf files until we have a trace object") for 1..7;
# Test 32 got: "0" (t/staden_read.t at line 107 fail #2 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 33 got: "0" (t/staden_read.t at line 107 fail #3 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 34 got: "0" (t/staden_read.t at line 107 fail #4 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 35 got: "0" (t/staden_read.t at line 107 fail #5 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 36 got: "0" (t/staden_read.t at line 107 fail #6 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
# Test 37 got: "0" (t/staden_read.t at line 107 fail #7 *TODO*)
#   Expected: (Can't write valid ctf files until we have a trace object)
t/staden_read....ok
All tests successful.
Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr + 0.15 csys = 1.71 CPU)

Anyone has any idea what might be going wrong here? By the way, my OS
is Linux. Thank you very much.
Arya

From bix at sendu.me.uk Tue Nov 27 15:41:38 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 15:41:38 +0000
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <615478.1036.qm@web60113.mail.yahoo.com>
References: <615478.1036.qm@web60113.mail.yahoo.com>
Message-ID: <474C3AB2.5050208@sendu.me.uk>

a_arya2000 wrote:
> Hello,
> I downloaded latest bioperl-ext from bioperl website,
> and I have io_lib v1.8.11 installed, and I was trying
> to install Bio::SeqIO::staden::read (of bioperl-ext).
> It compiled fine without any error but when I run make
> test I got following output.
[...]
> All tests successful.
> Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr +
> 0.15 csys = 1.71 CPU)
>
>
> Anyone has any idea what might be going wrong here? By
> the way, my OS is Linux. Thank you very much.

Not being familiar with the test script or ext, I can at least say that
nothing actually went wrong: 'All tests successful'. Apparently there
are some things in the test script that are known by the author to not
work quite right, so he marked them as 'todo'. The problems seem
harmless in any case, with things returning 0 instead of undef.

So, unless you've reason to believe there is something significant going
on, all is well.

From alison.waller at utoronto.ca Mon Nov 26 21:06:35 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Mon, 26 Nov 2007 16:06:35 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
Message-ID: <005a01c83070$3a814580$d81efea9@AWALL>

Hello all,

It's the usual story, I'm an engineer turned biologist who now needs
help with bioinformatics so I can analyze huge amounts of data to
finish my thesis.
I am trying to write a script that will parse large blast files (usually
blastx) I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want to
have the flexibility to obtain information on say the top5. I am pretty
sure I have done this wrong, any advice on how to correct my script to do
this, would be great.

Thanks so much,

Alison

#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

# to run type: blast_parse_aw.pl input.txt #of hits

my $infile =shift(@ARGV);
my $outfile ="$ARGV[0].parsed";
my $tophit = $ARGV[1]; # I want to specify in the command line how many hits to report for each query

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";
open (OUT,">$outfile");

$report = new Bio::SearchIO(
         -file=>"$inFile",
              -format => "blast");

print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while($result = $report->next_result)
{
   if ($top_hit=$result->next_hit) # this might be wrong - I want to specify how many hits to print results for
   # Print some tab-delimited data about this hit
   {
      print $result->query_name, "\t";
      print $hit->description, "\t";
      print $hit->significance, "\t";
      print $hit->bits,"\t";
      print $hsp->evalue, "\t";
      print $hsp->percent_identity, "\t";
      print $hsp->length('total'),"\t";
      print $hsp->num_identical,"\t";
      print $hsp->gaps,"\t";
      print $hsp->strand('query'),"\t";
      print $hsp->strand('hit'), "\n";
   }
   else print "no hits\n";
}

******************************************
Alison S. Waller M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON M5S 3E5

From bix at sendu.me.uk Tue Nov 27 17:01:36 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Nov 2007 17:01:36 +0000
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL>
References: <005a01c83070$3a814580$d81efea9@AWALL>
Message-ID: <474C4D70.2010206@sendu.me.uk>

alison waller wrote:
> I am trying to write a script that will parse large blast files (usually
> blastx) I also want to be able to specify how many hits I want to report
> information on.
>
> Most of the time I will only want information on the top hit, but I want to
> have the flexibility to obtain information on say the top5. I am pretty
> sure I have done this wrong, any advice on how to correct my script to do
> this, would be great.
[snip]
> if ($top_hit=$result->next_hit) # this might be wrong - I want to
> specify how many hits to print results for

I didn't really pay attention to the rest of your code, but assuming it
all works except for only ever giving you info for the top hit, you just
need to change this 'if' to a loop of some kind.

# ...
my $hits = 0;
while (my $hit = $result->next_hit) {
    $hits++;
    last if $hits > $tophit;
    # ...
}

From David.Messina at sbc.su.se Tue Nov 27 17:55:44 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 27 Nov 2007 18:55:44 +0100
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <474C4D70.2010206@sendu.me.uk>
References: <005a01c83070$3a814580$d81efea9@AWALL>
	<474C4D70.2010206@sendu.me.uk>
Message-ID: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>

Hi Alison,

As Sendu mentioned, the key bit is adding a condition to the hit loop to
limit the number of hits that are printed. I didn't test the below
extensively, but give it a try...
Dave


#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile  = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit  = $ARGV[1];    # to specify in the command line how many hits
                           # to report for each query

#open( IN, $infile ) || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $!\n";

my $report = new Bio::SearchIO(
    -file   => "$infile",
    -format => "blast"
);

print OUT
"Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\tQstrand\tHstrand\n";

# Go through BLAST reports one by one
while ( my $result = $report->next_result ) {
    my $i = 0;    # number of hits reported so far for this query
    # Count a hit only once one has actually been fetched, so that
    # $i is still 0 after the loop when the query had no hits at all.
    while ( ( $i < $tophit ) && ( my $hit = $result->next_hit ) ) {
        $i++;
        while ( my $hsp = $hit->next_hsp ) {

            # Print some tab-delimited data about this hit
            print OUT $result->query_name, "\t";
            print OUT $hit->name, "\t";
            print OUT $hit->significance, "\t";
            print OUT $hit->bits, "\t";
            print OUT $hsp->evalue, "\t";
            print OUT $hsp->percent_identity, "\t";
            print OUT $hsp->length('total'), "\t";
            print OUT $hsp->num_identical, "\t";
            print OUT $hsp->gaps, "\t";
            print OUT $hsp->strand('query'), "\t";
            print OUT $hsp->strand('hit'), "\n";
        }
    }

    if ($i == 0) { print OUT "no hits\n"; }
}

From Russell.Smithies at agresearch.co.nz Tue Nov 27 19:31:29 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 28 Nov 2007 08:31:29 +1300
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID:

Do the hits need to be sorted first or is this done automagicly?
I ask this as I know Megablast doesn't provide sorted output for most of it's formats. Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open- > bio.org] On Behalf Of Dave Messina > Sent: Wednesday, 28 November 2007 6:56 a.m. > To: alison waller > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results > > Hi Alison, > As Sendu mentioned, the key bit is adding a condition to the hit loop to > limit the number of hits that are printed. I didn't test the below > extensively, but give it a try... > > > Dave > > > > #!/usr/local/bin/perl -w > > # Parsing BLAST reports with BioPerl's Bio::SearchIO module > # alison waller November 2007 > > use strict; > use warnings; > use Bio::SearchIO; > > my $usage = "to run type: blast_parse_aw.pl <# of hits>\n"; > if (@ARGV != 2) { die $usage; } > > my $infile = $ARGV[0]; > my $outfile = $infile . '.parsed'; > my $tophit = $ARGV[1]; # to specify in the command line how many hits > # to report for each query > > #open( IN, $infile ) || die "Can't open inputfile $infile! $!\n"; > open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
$!\n"; > > my $report = new Bio::SearchIO( > -file => "$infile", > -format => "blast" > ); > > print OUT > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tga ps\t > Qstrand\tHstrand\n"; > > # Go through BLAST reports one by one > while ( my $result = $report->next_result ) { > my $i = 0; > while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) { > while ( my $hsp = $hit->next_hsp ) { > > # Print some tab-delimited data about this hit > print OUT $result->query_name, "\t"; > print OUT $hit->name, "\t"; > print OUT $hit->significance, "\t"; > print OUT $hit->bits, "\t"; > print OUT $hsp->evalue, "\t"; > print OUT $hsp->percent_identity, "\t"; > print OUT $hsp->length('total'), "\t"; > print OUT $hsp->num_identical, "\t"; > print OUT $hsp->gaps, "\t"; > print OUT $hsp->strand('query'), "\t"; > print OUT $hsp->strand('hit'), "\n"; > } > } > > if ($i == 0) { print OUT "no hits\n"; } > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From cjfields at uiuc.edu Tue Nov 27 21:09:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:09:43 -0600
Subject: [Bioperl-l] Bioperl-ext test fails
In-Reply-To: <474C3AB2.5050208@sendu.me.uk>
References: <615478.1036.qm@web60113.mail.yahoo.com>
	<474C3AB2.5050208@sendu.me.uk>
Message-ID: <3B8DD37B-F856-4365-86F0-038A00E26766@uiuc.edu>

You can always test it within the bioperl suite after it's installed;
several tests (abi.t, ztr.t) use Bio::SeqIO::staden::read. In general
though if it's passing tests it should be fine.

chris

On Nov 27, 2007, at 9:41 AM, Sendu Bala wrote:

> a_arya2000 wrote:
>> Hello,
>> I downloaded latest bioperl-ext from bioperl website,
>> and I have io_lib v1.8.11 installed, and I was trying
>> to install Bio::SeqIO::staden::read (of bioperl-ext).
>> It compiled fine without any error but when I run make
>> test I got following output.
> [...]
>> All tests successful.
>> Files=1, Tests=94, 2 wallclock secs ( 1.56 cusr +
>> 0.15 csys = 1.71 CPU)
>>
>>
>> Anyone has any idea what might be going wrong here? By
>> the way, my OS is Linux. Thank you very much.
>
> Not being familiar with the test script or ext, I can at least say
> that
> nothing actually went wrong: 'All tests successful'. Apparently there
> are some things in the test script that are known by the author to not
> work quite right, so he marked them as 'todo'. The problems seem
> harmless in any case, with things returning 0 instead of undef.
>
> So, unless you've reason to believe there is something significant
> going
> on, all is well.
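
For anyone puzzled by the same kind of output, here is a minimal sketch
of how a TODO test behaves. It assumes Test::More, whereas the
staden_read.t output quoted in this thread looks like it comes from the
older Test module, whose syntax differs; the test names are made up for
illustration. A failing test inside a TODO block is reported as
"not ok ... # TODO" but is not counted as a failure, which would
explain why the harness still says 'All tests successful'.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# An ordinary test: this one has to pass.
ok( 1, 'can read a trace file' );

# A known gap: the author expects this to fail for now, so it is
# wrapped in a TODO block. The failure is reported but not counted.
TODO: {
    local $TODO = 'writing abi files is not implemented yet';
    ok( 0, 'can write an abi file' );
}
```

Run on its own, the script exits with status 0 even though the second
test fails, because the failure is marked as expected.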

From cjfields at uiuc.edu Tue Nov 27 21:00:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Nov 2007 15:00:33 -0600
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To:
References: <005a01c83070$3a814580$d81efea9@AWALL><474C4D70.2010206@sendu.me.uk>
	<628aabb70711270955w2b04c4eaqf2d1ec2804d166cf@mail.gmail.com>
Message-ID: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>

The hits/HSPs are generally in the order they appear in the report.
If you are looking for best/worst HSP after parsing you can use the
$hit->hsp() method:

# best and worst
my $best = $hit->hsp('best');   # also 'first'
my $worst = $hit->hsp('worst'); # also last

The SearchIO text BLAST parser also has several options implemented
for finer control:

  -inclusion_threshold => e-value threshold for inclusion in the
                          PSI-BLAST score matrix model (blastpgp)
  -signif      => float or scientific notation number to be used
                  as a P- or Expect value cutoff
  -score       => integer or scientific notation number to be used
                  as a blast score value cutoff
  -bits        => integer or scientific notation number to be used
                  as a bit score value cutoff
  -hit_filter  => reference to a function to be used for
                  filtering hits based on arbitrary criteria.
                  All hits of each BLAST report must satisfy
                  this criteria to be retained.
                  If a hit fails this test, it is ignored.
                  This function should take a
                  Bio::Search::Hit::BlastHit.pm object as its first
                  argument and return true
                  if the hit should be retained.
                  Sample filter function:
                     -hit_filter => sub { $hit = shift;
                                          $hit->gaps == 0; },
                  (Note: -filt_func is synonymous with -hit_filter)
  -overlap     => integer. The amount of overlap to permit between
                  adjacent HSPs when tiling HSPs. A reasonable value is 2.
                  Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagicly?
> I ask this as I know Megablast doesn't provide sorted output for
> most of
> it's formats.
> > Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of Dave Messina >> Sent: Wednesday, 28 November 2007 6:56 a.m. >> To: alison waller >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results >> >> Hi Alison, >> As Sendu mentioned, the key bit is adding a condition to the hit loop > to >> limit the number of hits that are printed. I didn't test the below >> extensively, but give it a try... >> >> >> Dave >> >> >> >> #!/usr/local/bin/perl -w >> >> # Parsing BLAST reports with BioPerl's Bio::SearchIO module >> # alison waller November 2007 >> >> use strict; >> use warnings; >> use Bio::SearchIO; >> >> my $usage = "to run type: blast_parse_aw.pl <# of > hits>\n"; >> if (@ARGV != 2) { die $usage; } >> >> my $infile = $ARGV[0]; >> my $outfile = $infile . '.parsed'; >> my $tophit = $ARGV[1]; # to specify in the command line how many >> hits >> # to report for each query >> >> #open( IN, $infile ) || die "Can't open inputfile $infile! $! >> \n"; >> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
> $!\n"; >> >> my $report = new Bio::SearchIO( >> -file => "$infile", >> -format => "blast" >> ); >> >> print OUT >> > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent > \tga > ps\t >> Qstrand\tHstrand\n"; >> >> # Go through BLAST reports one by one >> while ( my $result = $report->next_result ) { >> my $i = 0; >> while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) { >> while ( my $hsp = $hit->next_hsp ) { >> >> # Print some tab-delimited data about this hit >> print OUT $result->query_name, "\t"; >> print OUT $hit->name, "\t"; >> print OUT $hit->significance, "\t"; >> print OUT $hit->bits, "\t"; >> print OUT $hsp->evalue, "\t"; >> print OUT $hsp->percent_identity, "\t"; >> print OUT $hsp->length('total'), "\t"; >> print OUT $hsp->num_identical, "\t"; >> print OUT $hsp->gaps, "\t"; >> print OUT $hsp->strand('query'), "\t"; >> print OUT $hsp->strand('hit'), "\n"; >> } >> } >> >> if ($i == 0) { print OUT "no hits\n"; } >> } >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =
> ======================================================================
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From johnston at biochem.ucl.ac.uk Wed Nov 28 01:06:30 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Nov 2007 01:06:30 +0000 (GMT)
Subject: [Bioperl-l] Bio::Tools::Run::Primer3
Message-ID:

Hello,

I was playing around with Primer3, and I hit a problem. Not sure if it's a
bug or if I was doing something I wasn't supposed to, but if it's the
latter, I thought it might save someone else half an hour of banging their
head off a keyboard if I mentioned it:

What I was doing was roughly:

# create a primer3 obj
my $p3 = ...Primer3->new();

# loop through some sequences generating primers for
# each of them using the same primer3 obj
while (@some_bio_seqs){
    my $res = $p3->run;
    ...
}

This worked fine for a while, but broke when I tried to set PRIMER_MIN_GC,
at which point it worked for a few sequences then I got a "can't place
primer on sequence" error.

After a bit of faffing about, I think the problem occurs when no primers
are found, in which case $p3 still has the primers from the previous run,
which don't come from the current sequence, so can't be placed on it. I
tried calling $p3->cleanup in the loop, but that didn't work either.
Creating a new $p3 every time works fine.

Are you supposed to create a new Primer3 object for every sequence?
(Apologies if I missed the relevant bit of the docs).
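
The suspected failure mode can be modelled without Primer3 at all. The
sketch below is only an illustration under stated assumptions:
ToyDesigner is a made-up stand-in that keeps its last result when a
run finds nothing, which is what the wrapper appears to be doing here;
it is not Bio::Tools::Run::Primer3 and says nothing about how that
module is actually implemented. It shows why reusing one object can
leak results between sequences, and why a fresh object per sequence
avoids it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a wrapper object that caches its last result.
# This is NOT Bio::Tools::Run::Primer3; it only models the suspected behaviour.
package ToyDesigner;

sub new {
    my ( $class, %args ) = @_;
    return bless { seq => $args{seq}, primers => [] }, $class;
}

# Get or set the sequence the object works on.
sub seq {
    my ( $self, $seq ) = @_;
    $self->{seq} = $seq if defined $seq;
    return $self->{seq};
}

# Pretend a primer is found only when the sequence contains 'GC'.
# The modelled bug: when nothing is found, the old primers stay in place.
sub run {
    my ($self) = @_;
    if ( $self->{seq} =~ /GC/ ) {
        $self->{primers} = [ substr( $self->{seq}, 0, 4 ) ];
    }
    return $self->{primers};
}

package main;

my @seqs = ( 'ATGCATGC', 'ATATATAT' );

# Reusing one object: the second sequence inherits the first one's primer.
my $shared = ToyDesigner->new( seq => $seqs[0] );
my @reused;
for my $seq (@seqs) {
    $shared->seq($seq);
    push @reused, scalar @{ $shared->run };
}

# Fresh object per sequence: nothing is carried over.
my @fresh;
for my $seq (@seqs) {
    my $designer = ToyDesigner->new( seq => $seq );
    push @fresh, scalar @{ $designer->run };
}

print "reused: @reused\n";    # reused: 1 1
print "fresh:  @fresh\n";     # fresh:  1 0
```

With the shared object the second sequence reports one primer that was
never designed for it; with a fresh object it correctly reports none.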

Cheers,
Cass xx

From alison.waller at utoronto.ca Tue Nov 27 21:32:07 2007
From: alison.waller at utoronto.ca (alison waller)
Date: Tue, 27 Nov 2007 16:32:07 -0500
Subject: [Bioperl-l] help using SEARCH IO to parse blast results
In-Reply-To: <5888BAD6-AF81-4843-B791-9666E6DABF08@uiuc.edu>
Message-ID: <003f01c8313c$f69b22a0$6f00a8c0@AWALL>

Thanks Everyone,

Your edits worked Dave, however after looking at the output I realized that
I only want information on the top hsp per query returned. For example, for
some of the queries the top hit has two hsps so it returned both.

I tried to further edit it, but after 3 attempts they are all failing, I
think due to me using the loops wrong.

I also have another problem: I also want to retrieve the gi, and this
doesn't seem to be as straightforward as it should be. I found another
method _get_seq_identifiers, but this looks awkward, isn't there an object
for the gi?

I've pasted my non-working script below if there are any suggestions on how
to get it to print out just the first hsp per hit, that would be great.

Thanks,

#!/usr/local/bin/perl -w

# Parsing BLAST reports with BioPerl's Bio::SearchIO module
# alison waller November 2007

use strict;
use warnings;
use Bio::SearchIO;

my $usage = "to run type: blast_parse_aw.pl <# of hits>\n";
if (@ARGV != 2) { die $usage; }

my $infile = $ARGV[0];
my $outfile = $infile . '.parsed';
my $tophit = $ARGV[1]; # to specify in the command line how many hits
                       # to report for each query

#open( IN, $infile ) || die "Can't open inputfile $infile! $!\n";
open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile!
$!\n"; my $report = new Bio::SearchIO( -file => "$infile", -format => "blast" ); print OUT "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\t strand\tHstrand\n"; # Go through BLAST reports one by one while (my $result = $report->next_result) { my $i=0; while( ( $i++<$tophit) && (my $hit = $result->next_hit)){ while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) { # Print some tab-delimited data about this hit print OUT $result->query_name, "\t"; print OUT $hit->name, "\t"; print OUT $hit->significance, "\t"; print OUT $hit->bits, "\t"; print OUT $hsp->evalue, "\t"; print OUT $hsp->percent_identity, "\t"; print OUT $hsp->length('total'), "\t"; print OUT $hsp->num_identical, "\t"; print OUT $hsp->gaps, "\t"; print OUT $hsp->strand('query'), "\t"; print OUT $hsp->strand('hit'), "\n"; } } if ($i == 0) { print OUT "no hits\n"; } } -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Tuesday, November 27, 2007 4:01 PM To: Smithies, Russell Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results The hits/HSPs are generally in the order they appear in the report. If you are looking for best/worst HSP after parsing you can use the $hit->hsp() method: # best and worst my $best = $hit->hsp('best'); # also 'first' my $worst = $hit->hsp('worst'); # also last The SearchIO text BLAST parser also has several options implemented for finer control: -inclusion_threshold => e-value threshold for inclusion in the PSI-BLAST score matrix model (blastpgp) -signif => float or scientific notation number to be used as a P- or Expect value cutoff -score => integer or scientific notation number to be used as a blast score value cutoff -bits => integer or scientific notation number to be used as a bit score value cutoff -hit_filter => reference to a function to be used for filtering hits based on arbitrary criteria. 
All hits of each BLAST report must satisfy this criteria to be retained. If a hit fails this test, it is ignored. This function should take a Bio::Search::Hit::BlastHit.pm object as its first argument and return true if the hit should be retained. Sample filter function: -hit_filter => sub { $hit = shift; $hit->gaps == 0; }, (Note: -filt_func is synonymous with -hit_filter) -overlap => integer. The amount of overlap to permit between adjacent HSPs when tiling HSPs. A reasonable value is 2. Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP. chris On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote: > Do the hits need to be sorted first or is this done automagicly? > I ask this as I know Megablast doesn't provide sorted output for > most of > it's formats. > > Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open- >> bio.org] On Behalf Of Dave Messina >> Sent: Wednesday, 28 November 2007 6:56 a.m. >> To: alison waller >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results >> >> Hi Alison, >> As Sendu mentioned, the key bit is adding a condition to the hit loop > to >> limit the number of hits that are printed. I didn't test the below >> extensively, but give it a try... >> >> >> Dave >> >> >> >> #!/usr/local/bin/perl -w >> >> # Parsing BLAST reports with BioPerl's Bio::SearchIO module >> # alison waller November 2007 >> >> use strict; >> use warnings; >> use Bio::SearchIO; >> >> my $usage = "to run type: blast_parse_aw.pl <# of > hits>\n"; >> if (@ARGV != 2) { die $usage; } >> >> my $infile = $ARGV[0]; >> my $outfile = $infile . '.parsed'; >> my $tophit = $ARGV[1]; # to specify in the command line how many >> hits >> # to report for each query >> >> #open( IN, $infile ) || die "Can't open inputfile $infile! $! >> \n"; >> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
> $!\n"; >> >> my $report = new Bio::SearchIO( >> -file => "$infile", >> -format => "blast" >> ); >> >> print OUT >> > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent > \tga > ps\t >> Qstrand\tHstrand\n"; >> >> # Go through BLAST reports one by one >> while ( my $result = $report->next_result ) { >> my $i = 0; >> while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) { >> while ( my $hsp = $hit->next_hsp ) { >> >> # Print some tab-delimited data about this hit >> print OUT $result->query_name, "\t"; >> print OUT $hit->name, "\t"; >> print OUT $hit->significance, "\t"; >> print OUT $hit->bits, "\t"; >> print OUT $hsp->evalue, "\t"; >> print OUT $hsp->percent_identity, "\t"; >> print OUT $hsp->length('total'), "\t"; >> print OUT $hsp->num_identical, "\t"; >> print OUT $hsp->gaps, "\t"; >> print OUT $hsp->strand('query'), "\t"; >> print OUT $hsp->strand('hit'), "\n"; >> } >> } >> >> if ($i == 0) { print OUT "no hits\n"; } >> } >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dennis.prickett at bbsrc.ac.uk Wed Nov 28 10:18:26 2007 From: dennis.prickett at bbsrc.ac.uk (dennis prickett (IAH-C)) Date: Wed, 28 Nov 2007 10:18:26 -0000 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <005a01c83070$3a814580$d81efea9@AWALL> References: <005a01c83070$3a814580$d81efea9@AWALL> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9504751EF0@iahce2ksrv1.iah.bbsrc.ac.uk> Dear Alison Or, if you are absolutely only interested in the top hit you could limit it to that in the blast command by adding the parameters " -b 1 ". This will truncate the report to 1 hsp per query (or -b 5 for 5 hsps, etc). Your blasts run faster and then you won't have to worry about how to parse out the top blast hit(s). However, if there are any caveats for using this parameter that I am not aware of please let us know. Dennis Prickett Institute of Animal Health Compton, nr Newbury RG2 9FS United Kingdom -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of alison waller Sent: 26 November 2007 21:07 To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] help using SEARCH IO to parse blast results Hello all, It's the usual story, I'm an engineer turned biologist who now needs help with bioinformatics so I can analyze huge amounts of data to finish my thesis. I am trying to write a script that will parse large blast files (usually blastx) I also want to be able to specify how many hits I want to report information on. 
Most of the time I will only want information on the top hit, but I want to have the flexibility to obtain information on say the top5. I am pretty sure I have done this wrong, any advice on how to correct my script to do this, would be great. Thanks so much, Alison #!/usr/local/bin/perl -w # Parsing BLAST reports with BioPerl's Bio::SearchIO module # alison waller November 2007 use strict; use warnings; use Bio::SearchIO; # to run type: blast_parse_aw.pl input.txt #of hits my $infile =shift(@ARGV); my $outfile ="$ARGV[0].parsed"; my $tophit = $ARGV[1]; # I want to specify in the command line how many hits to report for each query open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n"; open (OUT,">$outfile"); $report = new Bio::SearchIO( -file=>"$inFile", -format => "blast"); print "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tga ps\t Qstrand\tHstrand\n"; # Go through BLAST reports one by one while($result = $report->next_result) { if ($top_hit=$result->next_hit) # this might be wrong - I want to specify how many hits to print results for # Print some tab-delimited data about this hit { print $result->query_name, "\t"; print $hit->description, "\t"; print $hit->significance, "\t"; print $hit->bits,"\t"; print $hsp->evalue, "\t"; print $hsp->percent_identity, "\t"; print $hsp->length('total'),"\t"; print $hsp->num_identical,"\t"; print $hsp->gaps,"\t"; print $hsp->strand('query'),"\t"; print $hsp->strand('hit'), "\n"; } else print "no hits\n"; } ****************************************** Alison S. Waller M.A.Sc. Doctoral Candidate awaller at chem-eng.utoronto.ca 416-978-4222 (lab) Department of Chemical Engineering Wallberg Building 200 College st. 
Toronto, ON M5S 3E5 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From t.nugent at cs.ucl.ac.uk Wed Nov 28 13:10:41 2007 From: t.nugent at cs.ucl.ac.uk (Tim Nugent) Date: Wed, 28 Nov 2007 13:10:41 +0000 Subject: [Bioperl-l] Helical Wheel module Message-ID: <474D68D1.3080602@cs.ucl.ac.uk> Hi everyone, I've been drawing a lot of helical wheels recently so put all my code into a module. I don't think there's anything in bioperl to do this yet though there are a few programs written in perl and flash on the web to do the same thing. I was thinking this could fit into biographics. Has lots of options to adjust labels, colours, ttf fonts and can output to png & svg. Tim ... Here's the output, converted to jpg from svg: http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg Module: http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz Works like this: use DrawHelicalWheel; my $im = DrawHelicalWheel->new(-title=>$title, -sequence=>$sequence, -helices=>\@helices, -ttf_font=>$font); open(OUTPUT, ">$svg"); binmode OUTPUT; print OUTPUT $im->svg; close OUTPUT; -- Tim Nugent (MRes) Research Student Bioinformatics Unit Department of Computer Science University College London Gower Street London WC1E 6BT Tel: 020-7679-0410 t.nugent at ucl.ac.uk http://www.cs.ucl.ac.uk/staff/T.Nugent From tristan.lefebure at gmail.com Wed Nov 28 15:46:11 2007 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 28 Nov 2007 10:46:11 -0500 Subject: [Bioperl-l] Remove sites of an alignment Message-ID: <200711281046.11146.tnl7@cornell.edu> Hello! I was wondering if there was a function to remove sites/columns of an alignment. Something like: $aln->remove_sites(@sites_to_remove) I looked around Bio::SimpleAlign but did not find exactly that. There is remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. 
I could recycle the '_remove_col' sub-function of 'remove_columns' to do so (it splits the alignment into sequence objects, removes the sites, and then regenerates an alignment object), but I would be surprised if there was nothing already doing the job... Thanks -Tristan From bix at sendu.me.uk Wed Nov 28 16:19:36 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Nov 2007 16:19:36 +0000 Subject: [Bioperl-l] Remove sites of an alignment In-Reply-To: <200711281046.11146.tnl7@cornell.edu> References: <200711281046.11146.tnl7@cornell.edu> Message-ID: <474D9518.7010201@sendu.me.uk> Tristan Lefebure wrote: > Hello! > > I was wondering if there was a function to remove sites/columns of an > alignment. Something like: $aln->remove_sites(@sites_to_remove) > I looked around Bio::SimpleAlign but did not find exactly that. There is > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. You might want to take a second look at the docs. You can supply column number ranges to remove_columns(), so it does exactly what you want. From tnl7 at cornell.edu Wed Nov 28 15:44:17 2007 From: tnl7 at cornell.edu (Tristan Lefebure) Date: Wed, 28 Nov 2007 10:44:17 -0500 Subject: [Bioperl-l] Remove sites of an alignment Message-ID: <200711281044.17770.tnl7@cornell.edu> Hello! I was wondering if there was a function to remove sites/columns of an alignment. Something like: $aln->remove_sites(@sites_to_remove) I looked around Bio::SimpleAlign but did not find exactly that. There is remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' criteria. I could recycle the '_remove_col' sub-function of 'remove_columns' to do so (it splits the alignment into sequence objects, removes the sites, and then regenerates an alignment object), but I would be surprised if there was nothing already doing the job... 
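For anyone finding this thread later, the column-range form of remove_columns() that Sendu points to can be sketched roughly as below. This is only a minimal illustration, assuming a BioPerl release newer than 1.4 (the 1.4 docs list only the type-based criteria); the toy sequences and ids are made up, and the column numbers are taken to be zero-based as the later docs describe.

```perl
use strict;
use warnings;
use Bio::SimpleAlign;
use Bio::LocatableSeq;

# Build a tiny two-sequence alignment in memory (made-up data).
my $aln = Bio::SimpleAlign->new();
for my $row ( [ seq1 => 'ACGTACGTAC' ], [ seq2 => 'ACGAACGTTC' ] ) {
    my ( $id, $str ) = @$row;
    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id    => $id,
            -seq   => $str,
            -start => 1,
            -end   => length $str,
        )
    );
}

# Remove one range of columns, given as a [first, last] pair.
# remove_columns() returns a new alignment object.
my $trimmed = $aln->remove_columns( [ 2, 4 ] );

print "columns before: ", $aln->length,     "\n";
print "columns after:  ", $trimmed->length, "\n";
```

Several ranges can reportedly be passed as separate array refs in the same call; check the docs for your installed version before relying on that.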
Thanks -Tristan From cjfields at uiuc.edu Wed Nov 28 13:57:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 07:57:27 -0600 Subject: [Bioperl-l] help using SEARCH IO to parse blast results In-Reply-To: <003f01c8313c$f69b22a0$6f00a8c0@AWALL> References: <003f01c8313c$f69b22a0$6f00a8c0@AWALL> Message-ID: I had some code which does this which I committed yesterday to CVS; it catches the GI for the query and the hits: $result->query_gi; $hit->ncbi_gi; I am in the midst of fixing additional problems with WU-BLAST parsing but you are more than welcome to try it. chris On Nov 27, 2007, at 3:32 PM, alison waller wrote: > Thanks Everyone, > > Your edits worked Dave, however after looking at the output I > realized that > I only want information on the top hsp per query returned. For > example some > of the querys the top hit has two hsps so it returned both. > > I tried to further edit it, but after 3 attempts they are all > failing, I > think due to me using the loops wrong. > > I also have another problem, I also want to retrieve the gi, this > doesn't > seem to be straight forward as it should. I found another method > _get_seq_identifiers, but this looks awkward, isn't there and object > for the > gi? > > I've pasted my non-working script below if there are any suggestions > on how > to get it to print out just the first hsp per hit, that would be > great. > > Thanks, > > > #!/usr/local/bin/perl -w > > > # Parsing BLAST reports with BioPerl's Bio::SearchIO module > # alison waller November 2007 > > > use strict; > use warnings; > use Bio::SearchIO; > > > my $usage = "to run type: blast_parse_aw.pl <# of > hits>\n"; > if (@ARGV != 2) { die $usage; } > > > my $infile = $ARGV[0]; > my $outfile = $infile . '.parsed'; > my $tophit = $ARGV[1]; # to specify in the command line how many hits > # to report for each query > > > #open( IN, $infile ) || die "Can't open inputfile $infile! 
$!\n"; > open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! > \n"; > > > my $report = new Bio::SearchIO( > -file => "$infile", > -format => "blast" > ); > > > print OUT > > "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent > \tgaps\t > strand\tHstrand\n"; > > > # Go through BLAST reports one by one > while (my $result = $report->next_result) { > my $i=0; > while( ( $i++<$tophit) && (my $hit = $result->next_hit)){ > while ( ( $i++ < $tophit ) && (my $hsp = $hit->next_hsp) ) { > > > # Print some tab-delimited data about this hit > print OUT $result->query_name, "\t"; > print OUT $hit->name, "\t"; > print OUT $hit->significance, "\t"; > print OUT $hit->bits, "\t"; > print OUT $hsp->evalue, "\t"; > print OUT $hsp->percent_identity, "\t"; > print OUT $hsp->length('total'), "\t"; > print OUT $hsp->num_identical, "\t"; > print OUT $hsp->gaps, "\t"; > print OUT $hsp->strand('query'), "\t"; > print OUT $hsp->strand('hit'), "\n"; > } > } > if ($i == 0) { print OUT "no hits\n"; } > > } > > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, November 27, 2007 4:01 PM > To: Smithies, Russell > Cc: Dave Messina; alison waller; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results > > The hits/HSPs are generally in the order they appear in the report. 
> > If you are looking for best/worst HSP after parsing you can use the > $hit->hsp() method: > > # best and worst > my $best = $hit->hsp('best'); # also 'first' > my $worst = $hit->hsp('worst'); # also last > > The SearchIO text BLAST parser also has several options implemented > for finer control: > > -inclusion_threshold => e-value threshold for inclusion in the > PSI-BLAST score matrix model (blastpgp) > -signif => float or scientific notation number to be used > as a P- or Expect value cutoff > -score => integer or scientific notation number to be used > as a blast score value cutoff > -bits => integer or scientific notation number to be used > as a bit score value cutoff > -hit_filter => reference to a function to be used for > filtering hits based on arbitrary criteria. > All hits of each BLAST report must satisfy > this criteria to be retained. > If a hit fails this test, it is ignored. > This function should take a > Bio::Search::Hit::BlastHit.pm object as its first > argument and return true > if the hit should be retained. > Sample filter function: > -hit_filter => sub { $hit = shift; > $hit->gaps == 0; }, > (Note: -filt_func is synonymous with -hit_filter) > -overlap => integer. The amount of overlap to permit between > adjacent HSPs when tiling HSPs. A reasonable > value is 2. > Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP. > > chris > > On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote: > >> Do the hits need to be sorted first or is this done automagicly? >> I ask this as I know Megablast doesn't provide sorted output for >> most of >> it's formats. >> >> Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open- >>> bio.org] On Behalf Of Dave Messina >>> Sent: Wednesday, 28 November 2007 6:56 a.m. 
>>> To: alison waller >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results >>> >>> Hi Alison, >>> As Sendu mentioned, the key bit is adding a condition to the hit >>> loop >> to >>> limit the number of hits that are printed. I didn't test the below >>> extensively, but give it a try... >>> >>> >>> Dave >>> >>> >>> >>> #!/usr/local/bin/perl -w >>> >>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module >>> # alison waller November 2007 >>> >>> use strict; >>> use warnings; >>> use Bio::SearchIO; >>> >>> my $usage = "to run type: blast_parse_aw.pl <# of >> hits>\n"; >>> if (@ARGV != 2) { die $usage; } >>> >>> my $infile = $ARGV[0]; >>> my $outfile = $infile . '.parsed'; >>> my $tophit = $ARGV[1]; # to specify in the command line how many >>> hits >>> # to report for each query >>> >>> #open( IN, $infile ) || die "Can't open inputfile $infile! $! >>> \n"; >>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! 
>> $!\n"; >>> >>> my $report = new Bio::SearchIO( >>> -file => "$infile", >>> -format => "blast" >>> ); >>> >>> print OUT >>> >> "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent >> \tga >> ps\t >>> Qstrand\tHstrand\n"; >>> >>> # Go through BLAST reports one by one >>> while ( my $result = $report->next_result ) { >>> my $i = 0; >>> while ( ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) { >>> while ( my $hsp = $hit->next_hsp ) { >>> >>> # Print some tab-delimited data about this hit >>> print OUT $result->query_name, "\t"; >>> print OUT $hit->name, "\t"; >>> print OUT $hit->significance, "\t"; >>> print OUT $hit->bits, "\t"; >>> print OUT $hsp->evalue, "\t"; >>> print OUT $hsp->percent_identity, "\t"; >>> print OUT $hsp->length('total'), "\t"; >>> print OUT $hsp->num_identical, "\t"; >>> print OUT $hsp->gaps, "\t"; >>> print OUT $hsp->strand('query'), "\t"; >>> print OUT $hsp->strand('hit'), "\n"; >>> } >>> } >>> >>> if ($i == 0) { print OUT "no hits\n"; } >>> } >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use of, >> or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. 
>> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 28 13:54:39 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 07:54:39 -0600 Subject: [Bioperl-l] Helical Wheel module In-Reply-To: <474D68D1.3080602@cs.ucl.ac.uk> References: <474D68D1.3080602@cs.ucl.ac.uk> Message-ID: <053F7A0E-E0C3-4E86-AF7A-8F6F7A57DA37@uiuc.edu> Looks good! We recently added in your transmembrane module, so we could definitely add this in. chris On Nov 28, 2007, at 7:10 AM, Tim Nugent wrote: > Hi everyone, > > I've been drawing a lot of helical wheels recently so put all my code > into a module. I don't think there's anything in bioperl to do this > yet > though there are a few programs written in perl and flash on the web > to > do the same thing. I was thinking this could fit into biographics. Has > lots of options to adjust labels, colours, ttf fonts and can output to > png & svg. > > Tim > > ... 
> > Here's the output, converted to jpg from svg: > http://www.cs.ucl.ac.uk/staff/T.Nugent/images/2A79_B_helices.jpg > > Module: > http://www.cs.ucl.ac.uk/staff/T.Nugent/downloads/DrawHelicalWheel.tar.gz > > Works like this: > > use DrawHelicalWheel; > > my $im = DrawHelicalWheel->new(-title=>$title, > -sequence=>$sequence, > -helices=>\@helices, > -ttf_font=>$font); > open(OUTPUT, ">$svg"); > binmode OUTPUT; > print OUTPUT $im->svg; > close OUTPUT; > > -- > Tim Nugent (MRes) > Research Student > Bioinformatics Unit > Department of Computer Science > University College London > Gower Street > London WC1E 6BT > Tel: 020-7679-0410 > t.nugent at ucl.ac.uk > http://www.cs.ucl.ac.uk/staff/T.Nugent > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Nov 28 18:43:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Nov 2007 12:43:58 -0600 Subject: [Bioperl-l] coloring of HSPs in blast panel In-Reply-To: <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com> References: <4701AEE6.6070506@web.de> <4702BC5B.7040407@web.de> <62e9dabc0710021646h479d3104wca106ebc99a38b0e@mail.gmail.com> <716af09c0711210842w7fb49434hbcf083ea0ab23079@mail.gmail.com> <716af09c0711210938q1cca6bcdm43484501f008c34f@mail.gmail.com> <8f200b4c0711211043s4478a2a8oea81df4de2aaf979@mail.gmail.com> <8f200b4c0711261741v50147ce9k5562a7e833d3c3d9@mail.gmail.com> Message-ID: <55479E91-59AF-42B2-B15F-C4939531BC4D@uiuc.edu> On Nov 26, 2007, at 7:41 PM, Steve Chervitz wrote: > Chris, > > Cood catch. 
You're on track here with one exception: WU blast and NCBI > blast behave differently in what they report in the hit table: WU > blast puts the raw score in the table not the bit score as NCBI blast > does (see example below for reference). WU blast also swaps their > location in the HSP header relative to how NCBI reports it. It would > be good to verify that the blast parser isn't befuddled by this. A > quick look at SearchIO::blast and it appears that data from the hit > table is always getting stored as score, not bits for WU blast. Not > sure if the HSP section data are parsed correctly. I'd recommend > looking into these things when you do your fixes. What I have now after commits is: GenericHit - use the best HSP when possible for bits, score/raw_score, significance. When there is no HSP, construct a minimal Hit object using hit table data (WUBLAST maps the score to raw_score, NCBI BLAST maps to bits(), both map evalue/pvalue to significance). HSP mapping seems to be correct. One issue that has popped up is GenericHit::significance preferentially uses the best HSP. However, GenericHSP::significance uses evalues preferentially over pvalues; both Expect and P appear to be parsed for WU-BLAST HSPs now (so the evalue is reported); this apparently wasn't always the case if I read the GenericHit docs correctly. As NCBI BLAST doesn't report pvalues we could change that so it preferentially returns a pvalue if present, falling back to an evalue. This would match what is found hit table more closely and resembles what is documented for the method (for significance(), WU- BLAST gets pvalues, NCBI BLAST gets evalues). > So in the end, WU blast HSPs that are built from the hit table should > report a value for raw_score and punt on bits, but NCBI HSPs so > constructed should do the opposite. The downside to this arrangement > is that code that works for NCBI blast hits will need modification to > work for WU blast hits, but that is just the nature of the data. 
It > shouldn't be an issue for the majority of users that stick with one > flavor of blast and don't switch back and forth, or for users that get > their HSP scoring data from HSP sections rather than relying on the > hit table. In general I get my data from the HSPs, so this shouldn't be a significant issue (bad pun). I did find that changing it so that Hit objects use HSP data pointed out issues with test data; hit table raw/ bit scores were rounded from the HSP score data or vice versa since all data came from the hit table, so tests flunked. I think changing the way minimal hit objects report data (particularly for NCBI BLAST) will lead to a lot of confusion unless we clarify warnings when one or the other is missing (as you also indicated). I'm working on that now. > Ideally, the HSP object would know whether it was NCBI or WU-based and > issue an informative warning when attempting to access data it doesn't > have. One solution might be for the parser to put a 'WU-' in front of > the algorithm name for WU blast reports, so it would then be available > for the contained hit/hsp objects. This could break anything dependent > on algorithm name, so it would need some testing. > > Steve I can probably work around as noted above that unless you think it's warranted to add a 'WU' designation (the version info in the Result object has 'WashU' attached, so one could feasibly use that for distinguishing the two report types). Anyway, I'm committing my first batch of fixes, the significance test will fail for at least a day until I can look into it more. chris From tristan.lefebure at gmail.com Wed Nov 28 19:03:44 2007 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 28 Nov 2007 14:03:44 -0500 Subject: [Bioperl-l] Remove sites of an alignment In-Reply-To: <474D9518.7010201@sendu.me.uk> References: <200711281046.11146.tnl7@cornell.edu> <474D9518.7010201@sendu.me.uk> Message-ID: Hoops. I was reading the BioPerl 1.4 documentation. 
Actually, http://bioperl.org/wiki/Module:Bio::SimpleAlign send you to http://search.cpan.org/perldoc?Bio::SimpleAlign, which ends up to be the 1.4documentation... Thank you, it works great. On Nov 28, 2007 11:19 AM, Sendu Bala wrote: > Tristan Lefebure wrote: > > Hello! > > > > I was wondering if there was a function to remove sites/columns of an > > alignment. Something like: $aln->remove_sites(@sites_to_remove) > > I looked around Bio::SimpleAlign but did not find exactly that. There is > > remove_columns, but it works on 'match'|'weak'|'strong'|'mismatch' > criteria. > > You might want to take a second look at the docs. You can supply column > number ranges to remove_columns(), so it does exactly what you want. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Wed Nov 28 21:57:14 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 29 Nov 2007 10:57:14 +1300 Subject: [Bioperl-l] Parsing Entrez Gene ASN.1 In-Reply-To: References: <200711281046.11146.tnl7@cornell.edu><474D9518.7010201@sendu.me.uk> Message-ID: Has anyone got a good example of parsing ASN.1 with Bio::SeqIO::entrezgene? I'm trying to get GO ids and KEGG terms out but it's quite deeply nested and my Perl isn't that good :-( Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From stefan.kirov at bms.com Wed Nov 28 22:16:18 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Nov 2007 17:16:18 -0500 (Eastern Standard Time)
Subject: [Bioperl-l] Parsing Entrez Gene ASN.1
In-Reply-To: 
References: <200711281046.11146.tnl7@cornell.edu> <474D9518.7010201@sendu.me.uk>
Message-ID: 

Here is an example for GO, will send the one for KEGG later:

    my $eio=new Bio::SeqIO(-file=>$file,-format=>'entrezgene',
                           -service_record=>'yes');#, -locuslink=>'convert');
    while (my $seq=$eio->next_seq) {
        my $gid=$seq->accession_number;
        my $ann=$seq->annotation; # annotation collection for this gene record
        foreach my $ot ($ann->get_Annotations('OntologyTerm')) {
            next if ($ot->term->authority eq 'STS marker'); #Do not need STS markers
            my $evid=$ot->comment;
            $evid=~s/evidence: //i;
            my @ref=$ot->term->get_references; #Really there should be just one?
            my $id=$ot->identifier;
            my $fid='GO:' . sprintf("%07u",$id);
            print join("\t",$gid,$ot->ontology->name,$ot->name,$evid,$fid,
                       @ref?$ref[0]->medline:''),"\n";
        }
    }

Please note there is a bug in the parser that makes it suck a lot of RAM. I
am fixing this and will probably commit by the week's end; you will have to
update at that point. If you work with few records this should not matter.

Stefan

On Thu, 29 Nov 2007, Smithies, Russell wrote:

> Has anyone got a good example of parsing ASN.1 with
> Bio::SeqIO::entrezgene?
> I'm trying to get GO ids and KEGG terms out but it's quite deeply nested
> and my Perl isn't that good :-(
>
> Russell
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material.
Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Nov 29 23:06:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 17:06:42 -0600 Subject: [Bioperl-l] PSIBLAST parsing added to SearchIO::blastxml Message-ID: <159ABF90-080B-4F98-BF63-7FCEE5D05F10@uiuc.edu> For anyone using PSI-BLAST: I have implemented experimental PSI-BLAST parsing in Bio::SearchIO::blastxml (though it appears to be pretty stable!). Since there isn't any easy way to distinguish between normal BLASTS and PSI-BLAST reports due to recent changes at NCBI to BLAST, you have to indicate how the report is to be parsed by passing in a '-blasttype' parameter: $searchio = Bio::SearchIO->new('-tempfile' => 1, '-format' => 'blastxml', '-file' => 'psiblast.xml', '-blasttype' => 'psiblast'); Otherwise it chunks the individual iterations out as separate BLAST reports and parses them as separate reports. Tests have also been added to SearchIO.t. I will update the HOWTO and blastxml docs soon. chris From cjfields at uiuc.edu Fri Nov 30 02:41:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 20:41:49 -0600 Subject: [Bioperl-l] Bio::Tools::Run::Primer3 In-Reply-To: References: Message-ID: <866C501B-EBFD-4E55-939E-AA97182C9EC4@uiuc.edu> It's probably safer to create a new instance each time but it really shouldn't be necessary for a wrapper module; this sounds like a bug to me. Could you file it in Bugzilla? 
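The workaround mentioned above (a new wrapper object for every sequence) might look like the following minimal sketch. It assumes Bio::Tools::Run::Primer3 from bioperl-run and a primer3 executable that the wrapper can locate. The helper name design_primers_per_sequence, the injectable $make_designer code ref and the output file names are illustrative choices for this sketch, not part of BioPerl.

```perl
use strict;
use warnings;

# Run one primer design per sequence, building a fresh designer object
# each time so that nothing left over from an earlier run (for example
# primers found for the previous sequence) can be reused by mistake.
#
# $seqs          array ref of sequence objects
# $targets       hash ref of Primer3 settings, e.g. { PRIMER_MIN_GC => 40 }
# $make_designer optional code ref: takes a sequence, returns an object
#                with a run() method (hypothetical hook, handy for testing)
sub design_primers_per_sequence {
    my ($seqs, $targets, $make_designer) = @_;

    my $n = 0;
    $make_designer ||= sub {
        my ($seq) = @_;
        # Loaded lazily, so the helper itself compiles without bioperl-run.
        require Bio::Tools::Run::Primer3;
        my $p3 = Bio::Tools::Run::Primer3->new(
            -seq     => $seq,
            # Assumption: one output file per sequence, names are arbitrary.
            -outfile => sprintf('primer3_%03d.out', ++$n),
        );
        $p3->add_targets(%$targets) if $targets && %$targets;
        return $p3;
    };

    my @results;
    for my $seq (@$seqs) {
        my $designer = $make_designer->($seq);    # new object every time
        push @results, scalar $designer->run;
    }
    return \@results;
}
```

Called as design_primers_per_sequence(\@some_bio_seqs, { PRIMER_MIN_GC => 40 }), this should give one result object per sequence; checking number_of_results on each before reading primers from it avoids relying on data from a previous run. The value 40 is only an example.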
On Nov 27, 2007, at 7:06 PM, Caroline Johnston wrote: > Hello, > > I was playing around with Primer3, and I hit a problem. Not sure if > it's a > bug or if I was doing something I wasn't supposed to, but if it's the > latter, I thought it might save someone else half an hour of banging > their > head of a keyboard if I mentioned it: > > What I was doing was roughly: > > # create a primer3 obj > my $p3 = ...Primer3->new(); > > # loop through some sequences generating primers for > # each of them using the same primer3 obj > while (@some_bio_seqs){ > my $res = $p3->run; > ... > } > > This worked fine for a while, but broke when I tried to set > PRIMER_MIN_GC, > at which point it worked for a few sequences then I got a "can't place > primer on sequence" error. > > After a bit of faffing about, I think the problem occurs when no > primers > are found. In which case $p3 still has the primers from the previous > run, > which don't come from the current sequence, so can't be placed on > it. I > tried calling $p3->cleanup in the loop, but that didn't work either. > Creating a new $p3 every time works fine. > > Are you supposed to create a new Primer3 object for every sequence? > (Apologies if I missed the relevant bit of the docs). > > Cheers, > Cass xx > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From paulhengen at coh.org Thu Nov 29 01:20:42 2007 From: paulhengen at coh.org (Paul N. Hengen) Date: Wed, 28 Nov 2007 17:20:42 -0800 (PST) Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs Message-ID: <14017289.post@talk.nabble.com> Hi. I have a number of gene IDs from Entrez and I want to find the start and end locations in the human genome. 
This seemed simple enough, so I started working through some of the examples for using the EntrezGene module at www.bioperl.org Of course this did not work because the core installation does not include this module. So, I think I have two choices (1) install the module (how?), or (2) find an easier way to get the locations in the human genome. I want to use the locations to grab sequences out of the genome. Can anyone offer advice on this? Thanks. -Paul. -- Paul N. Hengen, Ph.D. Hematopoietic Stem Cell and Leukemia Research City of Hope National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 USA mailto:paulhengen at coh.org -- View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From Viktor.Martyanov at Dartmouth.EDU Thu Nov 29 20:20:19 2007 From: Viktor.Martyanov at Dartmouth.EDU (Viktor Martyanov) Date: 29 Nov 2007 15:20:19 -0500 Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases Message-ID: <193573097@newdonner.Dartmouth.EDU> A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 445 bytes Desc: not available URL: From alison.waller at utoronto.ca Thu Nov 29 16:20:59 2007 From: alison.waller at utoronto.ca (alison waller) Date: Thu, 29 Nov 2007 11:20:59 -0500 Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS) Message-ID: <002501c832a3$d3e09cf0$d81efea9@AWALL> Hi all, I would like to install the CVS version of bioperl as I know of some code changes that will be useful to me. However, I am having problems installing it. I am trying to install bioperl in my home directly on a linux cluster. I used > cd bioperl-live * perl Build.PL -install /home/awaller However after the build command I got a lot of errors. Do I have to also have perl installed in my home directory?? There is perl installed on the cluster in /usr/bin. 
Do I need to point to this or does Build.PL automatically look there? I noticed a few errors about not having permission and a few about not being able to connect. I've copied a portion of the messages after my Build.pl command. Any help would be appreciated, alison Issuing "/usr/bin/ftp -n" ftp: mirror.isurf.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz. Please check, if the URLs I found in your configuration file (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/, ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are valid. The urllist can be edited. E.g. with 'o conf urllist push ftp://myurl/' Could not fetch modules/02packages.details.txt.gz Trying to get away with old file: 3604718 584 -rw-r--r-- 1 0 0 592967 Nov 12 22:53 /root/.cpan/sources/modules/02packages.details.txt.gz Going to read /root/.cpan/sources/modules/02packages.details.txt.gz Database was generated on Sat, 10 Nov 2007 22:36:34 GMT There's a new CPAN.pm version (v1.9204) available! [Current version is v1.7601] You might want to try install Bundle::CPAN reload cpan without quitting the current session. It should be a seamless upgrade while we are running... Warning: You are not allowed to write into directory "/root/.cpan/sources/modules". I'll continue, but if you encounter problems, they may be due to insufficient permissions. 
Fetching with LWP: ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[Cannot write to '/root/.cpan/sources/modules/03modlist.data.gz-25787': Permission denied] Fetching with Net::FTP: ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2265 Couldn't fetch 03modlist.data.gz from ftp.nrc.ca Fetching with LWP: ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[FTP close response: 500 Network seems to have barfed - Let's all phone our ISP and go postal! Unknown command. ] Fetching with Net::FTP: ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz Cannot open Local file /root/.cpan/sources/modules/03modlist.data.gz: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2265 Couldn't fetch 03modlist.data.gz from cpan.sunsite.ualberta.ca Fetching with LWP: ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname 'cpan.mirror.cygnal.ca'] Fetching with Net::FTP: ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz Fetching with LWP: ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz LWP failed with code[500] message[LWP::Protocol::MyFTP: Bad hostname 'mirror.isurf.ca'] Fetching with Net::FTP: ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz Trying with "/usr/bin/lynx -source" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle 
file URLs. System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. 
System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/lynx -source" to get ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/lynx -source "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/ncftp" to get ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz Use ncftpget or ncftpput to handle file URLs. System call "cd /root/.cpan/sources/modules && /usr/bin/ncftp "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" " returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Trying with "/usr/bin/wget -O -" to get ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz sh: line 1: /root/.cpan/sources/modules/03modlist.data: Permission denied System call "/usr/bin/wget -O - "ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz" > /root/.cpan/sources/modules/03modlist.data" returned status 1 (wstat 256), left /root/.cpan/sources/modules/03modlist.data.gz with size 141973 Issuing "/usr/bin/ftp -n" Local directory now /root/.cpan/sources/modules local: 03modlist.data.gz: Permission denied Bad luck... Still failed! 
Can't access URL ftp://ftp.nrc.ca/pub/CPAN/modules/03modlist.data.gz. Issuing "/usr/bin/ftp -n" Local directory now /root/.cpan/sources/modules local: 03modlist.data.gz: Permission denied Bad luck... Still failed! Can't access URL ftp://cpan.sunsite.ualberta.ca/pub/CPAN/modules/03modlist.data.gz. Issuing "/usr/bin/ftp -n" ftp: cpan.mirror.cygnal.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://cpan.mirror.cygnal.ca/pub/CPAN/modules/03modlist.data.gz. Issuing "/usr/bin/ftp -n" ftp: mirror.isurf.ca: Unknown host Not connected. Local directory now /root/.cpan/sources/modules Not connected. Not connected. Not connected. Not connected. Not connected. Not connected. Bad luck... Still failed! Can't access URL ftp://mirror.isurf.ca/pub/CPAN/modules/03modlist.data.gz. Please check, if the URLs I found in your configuration file (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/, ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are valid. The urllist can be edited. E.g. with 'o conf urllist push ftp://myurl/' Could not fetch modules/03modlist.data.gz Trying to get away with old file: 3604719 144 -rw-r--r-- 1 0 0 141973 Nov 12 22:53 /root/.cpan/sources/modules/03modlist.data.gz Going to read /root/.cpan/sources/modules/03modlist.data.gz Going to write /root/.cpan/Metadata can't create /root/.cpan/Metadata: Permission denied at /usr/share/perl/5.8/CPAN.pm line 3432 Running install for module Test::Harness Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at /usr/share/perl/5.8/CPAN.pm line 2342 ****************************************** Alison S. Waller M.A.Sc. Doctoral Candidate awaller at chem-eng.utoronto.ca 416-978-4222 (lab) Department of Chemical Engineering Wallberg Building 200 College st. 
Toronto, ON M5S 3E5 From cjfields at uiuc.edu Fri Nov 30 04:53:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 22:53:09 -0600 Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS) In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL> References: <002501c832a3$d3e09cf0$d81efea9@AWALL> Message-ID: Alison, There are directions on how to do this here: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA (TinyURL link) http://tinyurl.com/3263dd Note the additional configuration for CPAN in that section; you'll need to set up CPAN so it installs everything locally. chris On Nov 29, 2007, at 10:20 AM, alison waller wrote: > Hi all, > > > > I would like to install the CVS version of bioperl as I know of > some code > changes that will be useful to me. However, I am having problems > installing > it. > > I am trying to install bioperl in my home directly on a linux cluster. > > > > I used > > > >> cd bioperl-live > > * perl Build.PL -install /home/awaller > > > > However after the build command I got a lot of errors. Do I have to > also > have perl installed in my home directory?? There is perl installed > on the > cluster in /usr/bin. Do I need to point to this or does Build.PL > automatically look there? I noticed a few errors about not having > permission and a few about not being able to connect. I've copied a > portion > of the messages after my Build.pl command. > > > > Any help would be appreciated, > > > > alison > > > > > > Issuing "/usr/bin/ftp -n" > > ftp: mirror.isurf.ca: Unknown host > > Not connected. > > Local directory now /root/.cpan/sources/modules > > Not connected. > > Not connected. > > Not connected. > > Not connected. > > Not connected. > > Not connected. > > Bad luck... Still failed! > > Can't access URL > ftp://mirror.isurf.ca/pub/CPAN/modules/02packages.details.txt.gz. 
> [...]
> > > > Please check, if the URLs I found in your configuration file > > (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/, > > ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/ > CPAN) are > > valid. The urllist can be edited. E.g. with 'o conf urllist push > > ftp://myurl/' > > > > Could not fetch modules/03modlist.data.gz > > Trying to get away with old file: > > 3604719 144 -rw-r--r-- 1 0 0 141973 Nov 12 22:53 > /root/.cpan/sources/modules/03modlist.data.gz > > Going to read /root/.cpan/sources/modules/03modlist.data.gz > > Going to write /root/.cpan/Metadata > > can't create /root/.cpan/Metadata: Permission denied at > /usr/share/perl/5.8/CPAN.pm line 3432 > > Running install for module Test::Harness > > Running make for A/AN/ANDYA/Test-Harness-3.00.tar.gz > > mkdir /root/.cpan/sources/authors/id/A/AN: Permission denied at > /usr/share/perl/5.8/CPAN.pm line 2342 > > ****************************************** > Alison S. Waller M.A.Sc. > Doctoral Candidate > awaller at chem-eng.utoronto.ca > 416-978-4222 (lab) > Department of Chemical Engineering > Wallberg Building > 200 College st. > Toronto, ON > M5S 3E5 > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Fri Nov 30 04:57:36 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Nov 2007 22:57:36 -0600 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: <14017289.post@talk.nabble.com> References: <14017289.post@talk.nabble.com> Message-ID: Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- core (I think they were added prior to the 1.5.1 release, but I'm not positive). 
If possible you should try installing bioperl 1.5.2 or the latest code from CVS. For directions on installing Bioperl for most OS's go here: http://www.bioperl.org/wiki/Installing_BioPerl From CVS: http://www.bioperl.org/wiki/Using_CVS chris On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: > > Hi. > > I have a number of gene IDs from Entrez and I want to find the > start and end locations in the human genome. This seemed simple > enough, so I started working through some of the examples for > using the EntrezGene module at www.bioperl.org Of course this > did not work because the core installation does not include this > module. So, I think I have two choices (1) install the module (how?), > or (2) find an easier way to get the locations in the human genome. > I want to use the locations to grab sequences out of the genome. > Can anyone offer advice on this? Thanks. > > -Paul. > > -- > Paul N. Hengen, Ph.D. > Hematopoietic Stem Cell and Leukemia Research > City of Hope National Medical Center > 1500 E. Duarte Road, Duarte, CA 91010 USA > mailto:paulhengen at coh.org > > -- > View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Nov 30 08:45:57 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 30 Nov 2007 08:45:57 +0000 Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball from CVS) In-Reply-To: <002501c832a3$d3e09cf0$d81efea9@AWALL> References: <002501c832a3$d3e09cf0$d81efea9@AWALL> Message-ID: <474FCDC5.5020100@sendu.me.uk> alison waller wrote: > I would like to install the CVS version of bioperl as I know of some code > changes that will be useful to me. However, I am having problems installing > it. > > I am trying to install bioperl in my home directly on a linux cluster. [...] > Please check, if the URLs I found in your configuration file > (ftp://ftp.nrc.ca/pub/CPAN/, ftp://cpan.sunsite.ualberta.ca/pub/CPAN/, > ftp://cpan.mirror.cygnal.ca/pub/CPAN/, ftp://mirror.isurf.ca/pub/CPAN) are > valid. The urllist can be edited. E.g. with 'o conf urllist push > ftp://myurl/' Either these URLs are invalid as suggested (try setting the urllist to nothing), or your linux cluster doesn't have internet access. You can't do a 'proper' install of BioPerl and its dependencies without internet access. However, for most purposes simply downloading the BioPerl modules (i.e. from a different machine with internet access) and pointing your PERL5LIB to their location is sufficient. You can download CVS modules from the BioPerl website individually, or as a tarball of everything. From MEC at stowers-institute.org Fri Nov 30 14:12:09 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 30 Nov 2007 08:12:09 -0600 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: <14017289.post@talk.nabble.com> References: <14017289.post@talk.nabble.com> Message-ID: How many, how often? Use ensembl biomart! First time interactively.
Then if you want to pipeline it, take the Perl code it generates for you and run it - of course you'll have to install the Ensembl Perl API.... Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Paul N. Hengen > Sent: Wednesday, November 28, 2007 7:21 PM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs > > > Hi. > > I have a number of gene IDs from Entrez and I want to find > the start and end locations in the human genome. This seemed > simple enough, so I started working through some of the > examples for using the EntrezGene module at www.bioperl.org > Of course this did not work because the core installation > does not include this module. So, I think I have two choices > (1) install the module (how?), or (2) find an easier way to > get the locations in the human genome. > I want to use the locations to grab sequences out of the genome. > Can anyone offer advice on this? Thanks. > > -Paul. > > -- > Paul N. Hengen, Ph.D. > Hematopoietic Stem Cell and Leukemia Research City of Hope > National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 > USA mailto:paulhengen at coh.org > > -- > View this message in context: > http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bosborne11 at verizon.net Fri Nov 30 14:38:58 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 30 Nov 2007 09:38:58 -0500 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: <14017289.post@talk.nabble.com> Message-ID: Paul, Have you taken a look at this page? http://www.bioperl.org/wiki/Getting_Genomic_Sequences There's code there that looks similar to what you're proposing. Brian O. On 11/28/07 8:20 PM, "Paul N. Hengen" wrote: > > Hi. > > I have a number of gene IDs from Entrez and I want to find the > start and end locations in the human genome. This seemed simple > enough, so I started working through some of the examples for > using the EntrezGene module at www.bioperl.org Of course this > did not work because the core installation does not include this > module. So, I think I have two choices (1) install the module (how?), > or (2) find an easier way to get the locations in the human genome. > I want to use the locations to grab sequences out of the genome. > Can anyone offer advice on this? Thanks. > > -Paul. > > -- > Paul N. Hengen, Ph.D. > Hematopoietic Stem Cell and Leukemia Research > City of Hope National Medical Center > 1500 E. Duarte Road, Duarte, CA 91010 USA > mailto:paulhengen at coh.org From cjfields at uiuc.edu Fri Nov 30 15:47:32 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 30 Nov 2007 09:47:32 -0600 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: <47502C75.60809@bms.com> References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com> Message-ID: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu> My bad. I always forget about Bio::ASN1::Entrezgene. 
We should ask Mingyi Liu if he would like to include this parser with BioPerl (since it requires it, makes sense to me, and it avoids the circular dependency that has plagued these modules). chris On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote: > Chris Fields wrote: > Chris, > Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the > low-level parser and is not part of bioperl. There is a circular > dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).... > Paul, you can get it from CPAN and this should make > Bio::SeqIO::entrezgene functional for you. > Stefan > > >> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- >> core (I think they were added prior to the 1.5.1 release, but I'm not >> positive). If possible you should try installing bioperl 1.5.2 or >> the >> latest code from CVS. >> >> For directions on installing Bioperl for most OS's go here: >> >> http://www.bioperl.org/wiki/Installing_BioPerl >> >> From CVS: >> >> http://www.bioperl.org/wiki/Using_CVS >> >> chris >> >> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: >> >> >>> Hi. >>> >>> I have a number of gene IDs from Entrez and I want to find the >>> start and end locations in the human genome. This seemed simple >>> enough, so I started working through some of the examples for >>> using the EntrezGene module at www.bioperl.org Of course this >>> did not work because the core installation does not include this >>> module. So, I think I have two choices (1) install the module >>> (how?), >>> or (2) find an easier way to get the locations in the human genome. >>> I want to use the locations to grab sequences out of the genome. >>> Can anyone offer advice on this? Thanks. >>> >>> -Paul. >>> >>> -- >>> Paul N. Hengen, Ph.D. >>> Hematopoietic Stem Cell and Leukemia Research >>> City of Hope National Medical Center >>> 1500 E. 
Duarte Road, Duarte, CA 91010 USA >>> mailto:paulhengen at coh.org >>> >>> -- >>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From stefan.kirov at bms.com Fri Nov 30 16:12:22 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 30 Nov 2007 11:12:22 -0500 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu> References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com> <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu> Message-ID: <47503666.8090004@bms.com> Chris Fields wrote: > My bad. I always forget about Bio::ASN1::Entrezgene. We should ask > Mingyi Liu if he would like to include this parser with BioPerl (since > it requires it, makes sense to me, and it avoids the circular > dependency that has plagued these modules). > Yes, I think this would be a good step. Stefan > chris > > On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote: > > >> Chris Fields wrote: >> Chris, >> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the >> low-level parser and is not part of bioperl. There is a circular >> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).... 
>> Paul, you can get it from CPAN and this should make >> Bio::SeqIO::entrezgene functional for you. >> Stefan >> >> >> >>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- >>> core (I think they were added prior to the 1.5.1 release, but I'm not >>> positive). If possible you should try installing bioperl 1.5.2 or >>> the >>> latest code from CVS. >>> >>> For directions on installing Bioperl for most OS's go here: >>> >>> http://www.bioperl.org/wiki/Installing_BioPerl >>> >>> From CVS: >>> >>> http://www.bioperl.org/wiki/Using_CVS >>> >>> chris >>> >>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: >>> >>> >>> >>>> Hi. >>>> >>>> I have a number of gene IDs from Entrez and I want to find the >>>> start and end locations in the human genome. This seemed simple >>>> enough, so I started working through some of the examples for >>>> using the EntrezGene module at www.bioperl.org Of course this >>>> did not work because the core installation does not include this >>>> module. So, I think I have two choices (1) install the module >>>> (how?), >>>> or (2) find an easier way to get the locations in the human genome. >>>> I want to use the locations to grab sequences out of the genome. >>>> Can anyone offer advice on this? Thanks. >>>> >>>> -Paul. >>>> >>>> -- >>>> Paul N. Hengen, Ph.D. >>>> Hematopoietic Stem Cell and Leukemia Research >>>> City of Hope National Medical Center >>>> 1500 E. Duarte Road, Duarte, CA 91010 USA >>>> mailto:paulhengen at coh.org >>>> >>>> -- >>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 >>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From stefan.kirov at bms.com Fri Nov 30 15:29:57 2007 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 30 Nov 2007 10:29:57 -0500 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: References: <14017289.post@talk.nabble.com> Message-ID: <47502C75.60809@bms.com> Chris Fields wrote: Chris, Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the low-level parser and is not part of bioperl. There is a circular dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).... Paul, you can get it from CPAN and this should make Bio::SeqIO::entrezgene functional for you. Stefan > Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- > core (I think they were added prior to the 1.5.1 release, but I'm not > positive). If possible you should try installing bioperl 1.5.2 or the > latest code from CVS. > > For directions on installing Bioperl for most OS's go here: > > http://www.bioperl.org/wiki/Installing_BioPerl > > From CVS: > > http://www.bioperl.org/wiki/Using_CVS > > chris > > On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: > > >> Hi. >> >> I have a number of gene IDs from Entrez and I want to find the >> start and end locations in the human genome. 
This seemed simple >> enough, so I started working through some of the examples for >> using the EntrezGene module at www.bioperl.org Of course this >> did not work because the core installation does not include this >> module. So, I think I have two choices (1) install the module (how?), >> or (2) find an easier way to get the locations in the human genome. >> I want to use the locations to grab sequences out of the genome. >> Can anyone offer advice on this? Thanks. >> >> -Paul. >> >> -- >> Paul N. Hengen, Ph.D. >> Hematopoietic Stem Cell and Leukemia Research >> City of Hope National Medical Center >> 1500 E. Duarte Road, Duarte, CA 91010 USA >> mailto:paulhengen at coh.org >> >> -- >> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From arareko at campus.iztacala.unam.mx Fri Nov 30 17:01:29 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Fri, 30 Nov 2007 11:01:29 -0600 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: <47503666.8090004@bms.com> References: <14017289.post@talk.nabble.com> <47502C75.60809@bms.com> <9D7ABDF6-489A-4C52-AB63-CE98915BC44F@uiuc.edu> <47503666.8090004@bms.com> Message-ID: <475041E9.8050909@campus.iztacala.unam.mx> I'm Cc'ing Mingyi Liu in this so he can know about your proposal (in the past, he mentioned he doesn't track the list closely). 
Mauricio. Stefan Kirov wrote: > Chris Fields wrote: >> My bad. I always forget about Bio::ASN1::Entrezgene. We should ask >> Mingyi Liu if he would like to include this parser with BioPerl (since >> it requires it, makes sense to me, and it avoids the circular >> dependency that has plagued these modules). >> > Yes, I think this would be a good step. > Stefan >> chris >> >> On Nov 30, 2007, at 9:29 AM, Stefan Kirov wrote: >> >> >>> Chris Fields wrote: >>> Chris, >>> Bio::SeqIO::entrezgene requires Bio::ASN1::Entrezgene, which is the >>> low-level parser and is not part of bioperl. There is a circular >>> dependency- Bio::ASN1::Entrezgene depends on Bio::SeqIO (I think).... >>> Paul, you can get it from CPAN and this should make >>> Bio::SeqIO::entrezgene functional for you. >>> Stefan >>> >>> >>> >>>> Bio::DB::EntrezGene and Bio::SeqIO::entrezgene are part of bioperl- >>>> core (I think they were added prior to the 1.5.1 release, but I'm not >>>> positive). If possible you should try installing bioperl 1.5.2 or >>>> the >>>> latest code from CVS. >>>> >>>> For directions on installing Bioperl for most OS's go here: >>>> >>>> http://www.bioperl.org/wiki/Installing_BioPerl >>>> >>>> From CVS: >>>> >>>> http://www.bioperl.org/wiki/Using_CVS >>>> >>>> chris >>>> >>>> On Nov 28, 2007, at 7:20 PM, Paul N. Hengen wrote: >>>> >>>> >>>> >>>>> Hi. >>>>> >>>>> I have a number of gene IDs from Entrez and I want to find the >>>>> start and end locations in the human genome. This seemed simple >>>>> enough, so I started working through some of the examples for >>>>> using the EntrezGene module at www.bioperl.org Of course this >>>>> did not work because the core installation does not include this >>>>> module. So, I think I have two choices (1) install the module >>>>> (how?), >>>>> or (2) find an easier way to get the locations in the human genome. >>>>> I want to use the locations to grab sequences out of the genome. >>>>> Can anyone offer advice on this? Thanks. 
>>>>> >>>>> -Paul. >>>>> >>>>> -- >>>>> Paul N. Hengen, Ph.D. >>>>> Hematopoietic Stem Cell and Leukemia Research >>>>> City of Hope National Medical Center >>>>> 1500 E. Duarte Road, Duarte, CA 91010 USA >>>>> mailto:paulhengen at coh.org >>>>> >>>>> -- >>>>> View this message in context: http://www.nabble.com/Collecting-genomic-DNA-sequences-using-Entrez-IDs-tf4894403.html#a14017289 >>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From jason at bioperl.org Fri Nov 30 20:21:13 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 30 Nov 2007 12:21:13 -0800 Subject: [Bioperl-l] Trying to find multiple homologs in multiple databases In-Reply-To: <193573097@newdonner.Dartmouth.EDU> References: <193573097@newdonner.Dartmouth.EDU> Message-ID: <631A0D08-4135-4A26-962A-4D1DB31F7F05@bioperl.org> Viktor - Bio::SearchIO helps you parse BLAST reports, but don't underestimate the power of going as low-tech as possible and outputting scores with the -m 8 option in NCBI-BLAST (or -mformat 3), which gives you a tabular format that is parseable with the 'split' function in Perl. See the wiki http://bioperl.org/wiki for HOWTOs and examples of using the parsers. You might also consider already-written tools like OrthoMCL, InParanoid, and others that help you define relationships like orthologs and paralogs among species. There also exist a few published web resources that have pre-computed homologs for you; you might take a look around first unless the point of the project is to learn how to run these kinds of searches. For general Perl help consider Perlmonks.org and some of the introductory books that are available. -jason -- Jason Stajich jason at bioperl.org On Nov 29, 2007, at 12:20 PM, Viktor Martyanov wrote: > Hello, > > My name is Viktor Martyanov and I am a Ph.D. student in biology at > Dartmouth.
> > I need to be able to use a set of genes or FASTA sequences from S. > cerevisiae and retrieve a set of corresponding homologs from other > fungal species via BLASTP searches. > > I would like to find out if there are Perl scripts that approach > this problem. By the way, is there a Perl community or forum where > I could post this question? > > Thanks very much. _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.moore at genetics.utah.edu Fri Nov 30 22:03:23 2007 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 30 Nov 2007 15:03:23 -0700 Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez IDs In-Reply-To: References: <14017289.post@talk.nabble.com> Message-ID: Paul, One other alternative is to use the UCSC table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select your organism, upload your ID list. Select your output options. You can download the coordinates or the FASTA directly. You have options for including or excluding various parts of the gene, and upstream/downstream sequences. This is similar to the solution that Malcolm suggested, except that the Ensembl option can be run repeatedly as Perl code, as he pointed out. UCSC allows you to do remote connections to their MySQL server, so you could set up a repeated task and more complex queries that way with the UCSC method. Barry On Nov 30, 2007, at 7:12 AM, Cook, Malcolm wrote: > How many, how often? > > Use ensembl biomart! > > First time interactively. > > Then if you want to pipeline it, take the perl code it generates for you > and > run it - of course you'll have to install the Ensembl Perl API....
> > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Paul N. Hengen >> Sent: Wednesday, November 28, 2007 7:21 PM >> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Collecting genomic DNA sequences using Entrez >> IDs >> >> >> Hi. >> >> I have a number of gene IDs from Entrez and I want to find >> the start and end locations in the human genome. This seemed >> simple enough, so I started working through some of the >> examples for using the EntrezGene module at www.bioperl.org >> Of course this did not work because the core installation >> does not include this module. So, I think I have two choices >> (1) install the module (how?), or (2) find an easier way to >> get the locations in the human genome. >> I want to use the locations to grab sequences out of the genome. >> Can anyone offer advice on this? Thanks. >> >> -Paul. >> >> -- >> Paul N. Hengen, Ph.D. >> Hematopoietic Stem Cell and Leukemia Research City of Hope >> National Medical Center 1500 E. Duarte Road, Duarte, CA 91010 >> USA mailto:paulhengen at coh.org >> >> -- >> View this message in context: >> http://www.nabble.com/Collecting-genomic-DNA-sequences-using-E >> ntrez-IDs-tf4894403.html#a14017289 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l
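A note on the last step Paul describes in this thread, using start and end locations to grab sequence out of the genome: once coordinates are in hand (from BioMart, the UCSC table browser, or Entrez Gene), the extraction itself needs nothing beyond core Perl. The sketch below is only an illustration under stated assumptions: the function name extract_region is hypothetical and not part of BioPerl, the chromosome is assumed to already be in memory as a plain string, and coordinates are assumed to be 1-based and inclusive. UCSC database tables store start positions 0-based, so a start taken from there would need 1 added before calling this.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the subsequence
# for 1-based, inclusive coordinates on a chromosome held as a string.
# $strand is +1 or -1; minus-strand results are reverse-complemented.
sub extract_region {
    my ($chrom_seq, $start, $end, $strand) = @_;

    # Refuse coordinates that fall outside the sequence.
    die "bad coordinates $start..$end\n"
        if $start < 1 || $end < $start || $end > length($chrom_seq);

    # substr is 0-based, so shift the 1-based start down by one.
    my $region = substr($chrom_seq, $start - 1, $end - $start + 1);

    if (defined $strand && $strand < 0) {
        $region = reverse $region;           # scalar context: reverse the string
        $region =~ tr/ACGTacgt/TGCAtgca/;    # complement each base
    }
    return $region;
}

# Toy demonstration on a made-up 8-base "chromosome".
print extract_region("AACCGGTT", 3, 6, 1), "\n";    # plus strand
print extract_region("ATGCCC", 1, 3, -1), "\n";     # minus strand
```

For whole chromosomes it is usually better not to load the sequence into memory; an indexed FASTA reader would be the natural replacement for the in-memory string here, with the same coordinate handling.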