From alexeymorozov1991 at gmail.com Thu Aug 2 23:54:02 2012
From: alexeymorozov1991 at gmail.com (Alexey Morozov)
Date: Fri, 3 Aug 2012 12:54:02 +0900
Subject: [Bioperl-l] Random trees generation
Message-ID:

Is it true that to generate random trees with integer branch lengths I have to write my own generator? It seems that Tree::RandomFactory can only produce trees with very small real-valued branch lengths (and even then not on all branches). Is there no other good module for that around?

Alexey.

From jason.stajich at gmail.com Fri Aug 3 11:52:48 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 3 Aug 2012 08:52:48 -0700
Subject: [Bioperl-l] Random trees generation
In-Reply-To:
References:
Message-ID:

The current BioPerl random tree factory is for use with the coalescent, which I needed for my research -- it may or may not be suitable for your purposes. The module documentation echoes a call for more contributions to the implementations.

Rutger's Bio::Phylo can generate random trees; you can try it out too.
http://search.cpan.org/~rvosa/Bio-Phylo-0.50/lib/Bio/Phylo/Generator.pm

It really depends on what model you are trying to use. There are many tree simulators out there that may suit your needs better.
http://evolution.genetics.washington.edu/phylip/software.html#Simulation

Jason

On Aug 2, 2012, at 8:54 PM, Alexey Morozov wrote:

> Is it true that for generating random trees with integer branch lengths I
> have to write my own generator? Seems like Tree::RandomFactory is only able
> to produce one with very small real values (and even that not at all
> branches). Is there no other good module for that around?
>
> Alexey.
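
If you do end up writing your own generator, a minimal pure-Perl sketch could look like the one below. It is an illustration only, not BioPerl or Bio::Phylo code: the sub name random_newick, the tip labels t1..tN and the uniform 1..$max_len branch lengths are all assumptions, and joining random pairs of subtrees is not tied to any particular evolutionary model.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a random rooted binary tree as a Newick string by repeatedly
# joining two randomly chosen subtrees until only one remains.
# Each branch gets a uniform random integer length in 1..$max_len.
# (random_newick and the t1..tN labels are hypothetical names.)
sub random_newick {
    my ($n_tips, $max_len) = @_;
    die "need at least 2 tips\n"       if $n_tips  < 2;
    die "max length must be >= 1\n"    if $max_len < 1;

    my @subtrees = map { "t$_" } 1 .. $n_tips;
    while (@subtrees > 1) {
        # Remove two random subtrees and join them under a new node.
        my $left  = splice @subtrees, int(rand @subtrees), 1;
        my $right = splice @subtrees, int(rand @subtrees), 1;
        my $len_l = 1 + int rand $max_len;
        my $len_r = 1 + int rand $max_len;
        push @subtrees, "($left:$len_l,$right:$len_r)";
    }
    return "$subtrees[0];";
}

print random_newick(8, 20), "\n";
```

The string can be written to a file and read back with Bio::TreeIO (-format => 'newick') if Bio::Tree objects are needed afterwards.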
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org

From soyestepi at gmail.com Tue Aug 7 15:28:58 2012
From: soyestepi at gmail.com (Estefania)
Date: Tue, 7 Aug 2012 16:28:58 -0300
Subject: [Bioperl-l] Parsing output INFERNAL 1.0.2
Message-ID:

Dear all,

I have some problems parsing INFERNAL 1.0.2 output. Using this script (previously cited here), nothing is printed and I get no error messages.

#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::SearchIO;

my $infile = $ARGV[0]; # infernal report
my $parser = Bio::SearchIO->new(-format => 'infernal',
                                -file   => $infile);
while( my $result = $parser->next_result ) {
    print $result->query_name . "\n";
}

If I try to print other elements, the only ones I can print are $parser->algorithm() and $parser->version(). For $result = $parser->next_result, only $size = $result->database_letters() and $dbname = $result->database_name() work (and the latter displays the wrong name).

Is this a problem with the version of Infernal? How can I parse this output? I also have tabulated output.

Thanks in advance,
estepi

From maquino at knome.com Tue Aug 7 20:16:56 2012
From: maquino at knome.com (Mark Aquino)
Date: Wed, 8 Aug 2012 00:16:56 +0000
Subject: [Bioperl-l] Parsing output INFERNAL 1.0.2
In-Reply-To:
References:
Message-ID: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com>

Try changing the use to Bio::SeqIO::Infernal and see if that works.

Sent from my iPhone

On Aug 7, 2012, at 3:30 PM, "Estefania" wrote:

> Dear all, I have some problems parsing INFERNAL 1.0.2 output.
> Using this script (previously cited here), nothing is printed and I have no
> error messages.
> #!/usr/bin/perl -w
> use strict;use Data::Dumper;
> use Bio::SearchIO;
>
> my $infile = $ARGV[0]; # infernal report
> my $parser = Bio::SearchIO->new(-format => 'infernal',
>                                 -file => $infile);
> while( my $result = $parser->next_result ) {
>     print $result->query_name . "\n";
> }
>
> If I try to print other elements, the only ones I can print
> are:$parser->algorithm(), $parser->version(),
> and for: $result = $parser->next_result, it works just for $size =
> $result->database_letters() and $dbname = $result->database_name() (but
> displays wrong name)
>
> Is this a problem of the version of Infernal? How can I parse this output?
> I also have tabulated output.
> Thanks in advance
> estepi
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From shalabh.sharma7 at gmail.com Tue Aug 14 10:28:27 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 14 Aug 2012 10:28:27 -0400
Subject: [Bioperl-l] Protein GI to nucleotide GI
Message-ID:

Hi all,

I have thousands of protein GIs/accession numbers. Is there any way I can get their corresponding nucleotide GIs?

Thanks,
Shalabh

From jason.stajich at gmail.com Tue Aug 14 14:48:19 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 14 Aug 2012 11:48:19 -0700
Subject: [Bioperl-l] Protein GI to nucleotide GI
In-Reply-To:
References:
Message-ID: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com>

Did you read the FAQ? This question is answered there.
http://bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F

On Aug 14, 2012, at 7:28 AM, shalabh sharma wrote:

> HI All,
> I have thousands of protein GI/accession no. , is there any way i
> can get their corresponding nucleotide GIs.
>
> Thanks
> Shalabh
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From cjfields at illinois.edu Wed Aug 15 11:50:06 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 15 Aug 2012 15:50:06 +0000
Subject: [Bioperl-l] Parsing output INFERNAL 1.0.2
In-Reply-To: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com>
References: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B6C175@CHIMBX5.ad.uillinois.edu>

Mark, no, the parser is Bio::SearchIO-based. My guess is that this is a legitimate bug: Infernal 1.0.2 is the latest release, and it is very possible there was a format change that is breaking things.

Estepi, can you send me an example file to test? I know Infernal was recently updated and is much faster; I want to make sure BioPerl parses it correctly.

chris

On Aug 7, 2012, at 7:16 PM, Mark Aquino wrote:

> Try changing the use to Bio::SeqIO::Infernal and see if that works.
>
> Sent from my iPhone
>
> On Aug 7, 2012, at 3:30 PM, "Estefania" wrote:
>
>> Dear all, I have some problems parsing INFERNAL 1.0.2 output.
>> Using this script (previously cited here), nothing is printed and I have no
>> error messages.
>> #!/usr/bin/perl -w
>> use strict;use Data::Dumper;
>> use Bio::SearchIO;
>>
>> my $infile = $ARGV[0]; # infernal report
>> my $parser = Bio::SearchIO->new(-format => 'infernal',
>>                                 -file => $infile);
>> while( my $result = $parser->next_result ) {
>>     print $result->query_name . "\n";
>> }
>>
>> If I try to print other elements, the only ones I can print
>> are:$parser->algorithm(), $parser->version(),
>> and for: $result = $parser->next_result, it works just for $size =
>> $result->database_letters() and $dbname = $result->database_name() (but
>> displays wrong name)
>>
>> Is this a problem of the version of Infernal?
How can I parse this output?
>> I also have tabulated output.
>> Thanks in advance
>> estepi
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From daisieh at gmail.com Thu Aug 16 14:38:25 2012
From: daisieh at gmail.com (Daisie Huang)
Date: Thu, 16 Aug 2012 11:38:25 -0700 (PDT)
Subject: [Bioperl-l] PAML problem
In-Reply-To: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com>
References: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com>
Message-ID: <6b34fc2f-1163-47b6-b5ad-a94c5092a2a4@googlegroups.com>

I'm not sure which PAML component caused this particular outcome, but the bugs and fixes I pushed to bioperl-live might fix this. When will those get pulled into master?

If those particular fixes don't help, I'd be happy to take a peek at the originator's code and see if it's a quick re-parsing fix.

Daisie

On Tuesday, June 26, 2012 6:37:55 PM UTC-7, Jason Stajich wrote:
>
> Peng -
>
> This module needs a person whose sole job is to keep tracking bugs and
> updating it with new versions of the program. So far it has burned out
> several developers working on it, since it is not stable.
>
> I am not sure what the answer is to the problem, but often it depends on
> the extra parameters used as this changes the order of the output making it
> hard to parse.
>
> So I don't have a solution for you except that you'll have to post the bug
> and the problem output mlc file to redmine and hope that we can entice some
> developers to bang their head against this some more.
>
> Jason
> On Jun 26, 2012, at 6:28 PM, Du, Peng wrote:
>
> > Hi everyone,
> >
> > I am using bioperl to parse paml output, and I saw this
> >
> > ------------- EXCEPTION: Bio::Root::NotImplemented -------------
> > MSG: Unknown format of PAML output did not see seqtype
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368
> > STACK: Bio::Tools::Phylo::PAML::_parse_summary
> > /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:461
> > STACK: Bio::Tools::Phylo::PAML::next_result
> > /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:270
> > STACK: main::cal_dn_ds dn_ds.pl:131
> > STACK: dn_ds.pl:44
> > ----------------------------------------------------------------
> >
> > I googled and found that, it was caused by PAML version
> > incompatibility. I tried 3.13, 3.14, 4.1, 4.2, 4.5 and none of them
> > worked. Could someone tell me which version is fine?
> >
> > My bioperl version is 1.006001. Thank you very much.
> >
> > --
> >
> > Peng Du
> > Graduate School of Information Science and Technology, Hokkaido University
> > Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814
> > Email: d... at ibio.jp Tel: +81 80 3268 9713
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Biop... at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason.... at gmail.com
> ja... at bioperl.org
>
>
> _______________________________________________
> Bioperl-l mailing list
> Biop...
at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From hpm at ebi.ac.uk Fri Aug 17 06:39:43 2012
From: hpm at ebi.ac.uk (Hamish McWilliam)
Date: Fri, 17 Aug 2012 11:39:43 +0100
Subject: [Bioperl-l] Programmatic Access To Biological Databases (Perl)
Message-ID: <502E1F6F.7030205@ebi.ac.uk>

*Date:* 1st-4th October 2012
*Venue:* EMBL-EBI, Hinxton, Nr Cambridge, CB10 1SD, UK
*Registration Deadline:* 31st August 2012

This Perl based course in programmatic access to biological databases is ideal for bioinformaticians and biological researchers looking to develop data analysis pipelines, access data in an automated manner or to integrate web services into their own applications.

What will it cover?

- Overview of public domain biological databases at the EMBL-EBI.
- Principles of Web Services, how they work and how to find them.
- Integrating data from multiple sources.
- Programmatic access to a variety of bioinformatic analysis tools.

For a detailed programme and information about registration please see
http://www.ebi.ac.uk/training/handson/course_121112_webservices.html

All the best,

Hamish

--
============================================================
Mr Hamish McWilliam
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD, UK
URL: http://www.ebi.ac.uk/
============================================================

From saladi1 at illinois.edu Fri Aug 17 22:03:51 2012
From: saladi1 at illinois.edu (Shyam Saladi)
Date: Fri, 17 Aug 2012 19:03:51 -0700
Subject: [Bioperl-l] Protein GI to nucleotide GI
In-Reply-To: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com>
References: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com>
Message-ID:

Another way is through NCBI's E-utilities --
http://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Finding_Related_Data_Through_En

On Tue, Aug 14, 2012 at 11:48 AM, Jason Stajich wrote:

> Did you read the FAQ, this question is answered in there.
>
> http://bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F
>
> On Aug 14, 2012, at 7:28 AM, shalabh sharma wrote:
>
> > HI All,
> > I have thousands of protein GI/accession no. , is there any way i
> > can get their corresponding nucleotide GIs.
> >
> > Thanks
> > Shalabh
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From florent.angly at gmail.com Tue Aug 21 11:32:17 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 21 Aug 2012 17:32:17 +0200
Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation
In-Reply-To:
References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com> <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> <4F55945A.7070901@gmail.com>
Message-ID: <5033AA01.8010801@gmail.com>

Hi all,

I have tested the code some more, made a couple of changes and put the branch in sync with master. This code looks ready to me. I am prepared to either merge the branch into master or make a separate distro.

Best,

Florent

On 06/03/12 02:08, Fields, Christopher J wrote:
> I'll check it out. Want me to post test results here (I have access to a few systems to test on).
>
> chris
>
> On Mar 5, 2012, at 10:36 PM, Florent Angly wrote:
>
>> To all interested,
>> the AmpliconSearch module is in a decent state.
If you want to test it or improve it, head to https://github.com/bioperl/bioperl-live/blob/amplicons/Bio/Tools/AmpliconSearch.pm
>> Regards,
>> Florent
>>
>>
>> On 01/03/12 12:42, Fields, Christopher J wrote:
>>> Florent,
>>>
>>> Just want to add, my previous response isn't meant as an admonishment, hope it didn't come across that way, but sometimes email makes it hard to discern the difference. I simply meant to demonstrate my opinion that I find releasing one's code is much simpler (e.g. you can decide the rules and dictate when the code is ready for release), and if we can make getting good code into users' hands easier, more flexible, and more consistent I think that is always a better path.
>>>
>>> chris
>>>
>>> On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote:
>>>
>>>> There are a number of very good reasons to separate out common code and create new repos for new code. The problem about adding new code into core is it ties your code development to bioperl-live's release cycle and versioning. Also, what I (and others) would not like to see is any additional dependencies introduced, but a separate release allows you to (1) both add a dependency w/o affecting core, and (2) make it required, so no fiddling with checking for the module prior to running tests on it.
>>>>
>>>> As an example, I can easily see something like Bio::SearchIO::blastxml living on its own since it has a set of outside dependencies.
>>>>
>>>> BTW, separation of modules into separate distributions (even single modules) based on functionality above and beyond that defined in a core is very common in the perl world. Beyond the obvious example of anything non-core in perl (all installable via CPAN), Moose, Dist::Zilla, Catalyst, Dancer, etc all have separately installable dists that layer additional functionality and have a separate maintenance path.
>>>>
>>>> chris
>>>>
>>>> On Mar 1, 2012, at 6:12 PM, Florent Angly wrote:
>>>>
>>>>> Thanks for everybody's feedback.
>>>>>
>>>>> I am looking at existing modules to hold template sequence, amplicon sequence and primer information. There is the Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the PrimedSeq object places Primer objects on the target sequence. I have been looking at refreshing these modules (they are quite old), add some sanity to them and make sure they are suitable for a generic implementation of PCR (or amplicon search, which I find a more suitable name since it is a far cry from simulating PCR cycles, etc).
>>>>>
>>>>> I will make a remote branch today to make it easier for interested parties to experiment and contribute.
>>>>>
>>>>> As you can see Chris, the amplicon search feature would use two existing bioperl-live modules and only add one, tentatively in the Bio::Tools::AmpliconSearch namespace. I am not convinced that this warrants a separate distro.
>>>>>
>>>>> Florent
>>>>>
>>>>> On 01/03/12 01:23, Fields, Christopher J wrote:
>>>>>> Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag:
>>>>>>
>>>>>> https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a
>>>>>>
>>>>>> and it's not there.
>>>>>>
>>>>>> I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo.
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote:
>>>>>>
>>>>>>> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive:
>>>>>>> http://www.salmonella.org/bioperl/primer3_v0.3.tgz
>>>>>>>
>>>>>>> (There's supposedly a more recent version here:
>>>>>>> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz
>>>>>>> but that file seems to be truncated).
>>>>>>>
>>>>>>> I have no idea how much would be salvageable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Roy.
>>>>>>>
>>>>>>>
>>>>>>> On 27/02/2012 21:18, Fields, Christopher J wrote:
>>>>>>>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly
>>>>>>>>> it was added to Bioperl 0.3 and is also mentioned in the
>>>>>>>>> Bio::PrimedSeq module. However, I cannot find it in the current
>>>>>>>>> Bioperl codebase. Any idea where it went?
>>>>>>>> No idea; I can't find it anywhere in the code base either, and the
>>>>>>>> github repo contains history going back to the original CVS repo.
>>>>>>>> You can try contacting the author, possibly.
>>>>>>>>
>>>>>>>>> The reason I am asking is because I have some code to do in silico PCR
>>>>>>>>> using regular expressions. I wanted to modularize my code more and
>>>>>>>>> make it into a module for Bioperl. Of course, if there is something
>>>>>>>>> similar in Bioperl already, I need to have a look at it. If there
>>>>>>>>> is nothing similar, what namespace do you suggest to use?
>>>>>>>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch?
>>>>>>>>> Bio::Tools::InSilicoPCR?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Florent
>>>>>>>> Maybe the last (InSilicoPCR).
>>>>>>>>
>>>>>>>> chris
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________ Bioperl-l mailing
>>>>>>>> list Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From BottomsC at missouri.edu Wed Aug 22 18:12:49 2012
From: BottomsC at missouri.edu (Bottoms, Christopher A)
Date: Wed, 22 Aug 2012 22:12:49 +0000
Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis
Message-ID:

Dear BioPerl community,

I developed this application for a research lab here at the University of Missouri. I was wondering whether this sounds okay and whether it is okay to use the "Bio" namespace.

Thank you for all you do.

Sincerely,
Christopher Bottoms

--------------------------------------
SYNOPSIS
    perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run

DESCRIPTION
    This module pipelines steps in the analysis of SELEX (Systematic
    Evolution of Ligands through EXponential enrichment) data.

    This main module creates scripts to do the following:

    (1) Cluster similar sequences based on edit distance.
    (2) Align sequences within each cluster (using mafft).
    (3) Calculate the secondary structure of the aligned sequences
        (using RNAalifold, from the Vienna RNA package).
    (4) Build covariance models using cmbuild from Infernal.

    The module Bio::App::SELEX::CovarianceSearch can also be used to create
    scripts for doing iterative refinements of covariance models.

EXAMPLE USE
    perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run

    (The file 'simple.seqs' should only contain sequences, one per line.)

    This will cluster the sequences found in 'simple.seqs' and create a
    FASTA file for each cluster. The FASTA files will be grouped into batches (i.e.
    one per cpu requested) that will be placed in a separate directory for
    each batch, and processed within that directory. At the end of
    processing, for each cluster there will be a covariance model and
    postscript illustration files. The batch script used to process each
    batch will be located in the respective batch directory.

    To produce the scripts without running them, simply exclude the --run
    flag from the command line.

CONFIGURATION AND ENVIRONMENT
    As written, this code makes heavy use of UNIX utilities and is therefore
    only supported on UNIX-like environments (e.g. Linux, UNIX, Mac OS X).

    Install Infernal, MAFFT, and the RNA Vienna package ahead of time and add
    the directories containing their executables to your PATH, so that the
    first time you run RNAmotifAnalysis.pm a configuration file (cluster.cfg)
    will be generated for you with all of the correct parameters. Otherwise,
    you'll need to update your cluster.cfg file manually.

    After installing mafft, Infernal, and Vienna RNA packages, add the
    directories in which their executables reside to your PATH.

    For example, assuming that the mafft executable is located in the directory
    '/usr/local/myapps/bin/', you would want to add it to your PATH. To make
    sure this is done every time you open a terminal window, add this to your
    .bashrc file, thus:

        echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc

    Then, to make it effective immediately, you can source your .bashrc file:

        source ~/.bashrc

INSTALLATION
    These installation instructions assume being able to open and use a
    terminal window on Linux.

    (0) Some systems need several dependencies installed ahead of time.
        You may be able to skip this step.
        However, if subsequent steps don't work, then be sure that some
        basic libraries are installed, as shown below (or ask a system
        administrator to take care of it):

        For RedHat or CentOS 5.x systems (tested on CentOS 5.5)

            Open a terminal and then type the following command, answering
            all questions in the affirmative:

                sudo yum install gcc

        For RedHat or CentOS 6.x systems (tested on CentOS 6.3)

            Open a terminal and then type the following commands, answering
            all questions in the affirmative:

                sudo yum install gcc
                sudo yum install perl-devel

        For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu 12.04 LTS)

            Open a terminal and then type the following commands, answering
            all questions in the affirmative:

                sudo apt-get install gcc
                sudo apt-get install make

    (1) Install the non-Perl dependencies:
        (Versions shown are those that we've tested. Please contact us if
        newer versions do not work.)

        Infernal 1.0.2 (http://infernal.janelia.org/)
        MAFFT 6.849b (http://mafft.cbrc.jp/alignment/software/)
        RNA Vienna package 1.8.4 (http://www.tbi.univie.ac.at/~ivo/RNA/)

    (2) Either (a) download and run our installer or (b) use a CPAN client
        to install Bio::App::SELEX::RNAmotifAnalysis. Note that our installer
        creates the directory 'perl5' inside your home directory. This
        directory is for holding Perl modules, including this module and any
        Perl module dependencies not already included on your system. The
        installer also appends commands to your .bashrc file to make it easy
        for the Perl runtime to find these new modules (i.e. it includes your
        local 'perl5/lib/perl5' directory in the PERL5LIB environment
        variable).

        (a) Use the installer
            i. Download installer (and name it "installer")

                curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/184

            ii. Make it executable

                chmod u+x installer

            iii. Run it. In a few cases (e.g. CentOS 5.5) we've had to run the
                 installer as many as three times to get all of the Perl
                 modules installed.
                 Please contact us if this doesn't work
                 after three attempts.

                ./installer

        (b) If you prefer using a CPAN client, then we recommend that you install
            Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to system
            perl, to avoid overwriting core Perl modules. If this doesn't make
            sense to you, then please be sure to use the installer as
            described in (a) above.

INCOMPATIBILITIES
    None known

BUGS AND LIMITATIONS
    There are no known bugs in this module.

    Please report problems to molecules cpan org

    Patches are welcome.

RELATED PUBLICATIONS
    Ditzler et al. Manuscript currently in review.

From l.m.timmermans at students.uu.nl Fri Aug 24 05:59:16 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Fri, 24 Aug 2012 11:59:16 +0200
Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis
In-Reply-To:
References:
Message-ID:

On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A wrote:
> I developed this application for a research lab here at the University of Missouri. I was wondering if this sounded okay and if it were okay to use the "Bio" namespace.

That's perfectly fine.

> --------------------------------------
> SYNOPSIS
>     perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run

That is a bit wrong. .pm files are modules, not scripts. You're better off adding a small script that uses your module.

> EXAMPLE USE
>     perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run
>
>     (The file 'simple.seqs' should only contain sequences, one per line.)

Why are you not using a proper sequence format? Bio::SeqIO will allow you to accept any common format.

>     Install Infernal, MAFFT, and the RNA Vienna package ahead of time and add
>     the directories containing their executables to your PATH, so that the
>     first time you run RNAmotifAnalysis.pm a configuration file (cluster.cfg)
>     will be generated for you with all of the correct parameters.
Otherwise,
>     you'll need to update your cluster.cfg file manually.
>
>     After installing mafft, Infernal, and Vienna RNA packages, add the
>     directories in which their executables reside in your PATH.
>
>     For example, assuming that the mafft executable is located in the directory
>     '/usr/local/myapps/bin/', you would want to add it to your PATH. To make
>     sure this is done every time you open a terminal window, add this to your
>     .bashrc file, thus:
>
>         echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc
>
>     Then, to make it effective immediately, you can source your .bashrc file:
>
>         source ~/.bashrc

If possible (perhaps it's not), you may want to create a so-called Alien package that installs those requirements itself. Not sure if that's possible, and probably not that urgent either.

> INSTALLATION
>     For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu 12-04 LTS)
>
>         Open a terminal and then type the following commands, answering
>         all questions in the afirmative:
>
>             sudo apt-get install gcc
>             sudo apt-get install make

The package you're looking for is called build-essential.

>     (2) Either (a) download and run our installer or (b) use a CPAN client
>         to install Bio::App::SELEX::RNAmotifAnalysis. Note that our installer
>         creates the directory 'perl5' inside your home directory. This
>         directory is for holding Perl modules, including this module and any
>         Perl module dependencies not already included on your system. The
>         installer also appends commands to your .bashrc file to make it easy
>         for the Perl runtime to find these new modules (i.e. it includes your
>         local 'perl5/lib/perl5' directory in the PERL5LIB environment
>         variable).
>
>         (a) Use the installer
>             i. Download installer (and name it "installer")
>
>                 curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/184

That download doesn't work for me.

>             ii. Make it executable
>
>                 chmod u+x installer
>
>             iii. Run it. In a few cases (e.g.
CentOS 5.5) we've had to run the > installer as many as three times to get all of the Perl > modules installed. Please contact us if this doesn't work > after three attempts. > > ./installer If it has that many issues, it's probably wrong. I'd strongly recommend going the CPAN way. > (b) If you prefer using a CPAN client, then we recommend that you install > Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to system > perl, to avoid overwriting core Perl modules. If this doesn't make > sense to you, then please be sure to use the installer as > described in (a) above. Installing locally is usually a good idea; I recommend local::lib in particular. This "overwriting core Perl modules" suggests to me you're doing something wrong anyway though. Leon From alexeymorozov1991 at gmail.com Fri Aug 24 10:21:37 2012 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Fri, 24 Aug 2012 22:21:37 +0800 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: Message-ID: 2012/8/24 Leon Timmermans > On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A > wrote: > > I developed this application for a research lab here at the University > of Missouri. I was wondering if this sounded okay and if it were okay to > use the "Bio" namespace. > > That's perfectly fine. > > > -------------------------------------- > > SYNOPSIS > > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile > simple.seqs --cpus 4 --run > > That is a bit wrong. .pm files are modules, not scripts. You're better > off adding a small script that uses your module. > > > EXAMPLE USE > > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile > simple.seqs --cpus 4 --run > > > > (The file 'simple.seqs' should only contain sequences, one per > line.) > > Why are you not using a proper sequence format? Bio::SeqIO will allow > you to accept any common format.
> > > Install Infernal, MAFFT, and the RNA Vienna package ahead of > time and add > > the directories containing their executables to your PATH, so > that the > > first time you run RNAmotifAnalysis.pm a configuration file > (cluster.cfg) > > will be generated for you with all of the correct parameters. > Otherwise, > > you'll need to update your cluster.cfg file manually. > > > > After installing mafft, Infernal, and Vienna RNA packages, add > the > > directories in which their executables reside to your PATH. > > > > For example, assuming that the mafft executable is located in > the directory > > '/usr/local/myapps/bin/', you would want to add it to your PATH. > To make > > sure this is done every time you open a terminal window, add > this to your > > .bashrc file, thus: > > > > echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc > > > > Then, to make it effective immediately, you can source your > .bashrc file: > > > > source ~/.bashrc > > If possible (perhaps it's not), you may want to create a so-called > Alien package that installs those requirements itself. Not sure if > that's possible, and probably not that urgent either. > > > INSTALLATION > > For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu > 12.04 LTS) > > > > Open a terminal and then type the following commands, > answering > > all questions in the affirmative: > > > > sudo apt-get install gcc > > sudo apt-get install make > > The package you're looking for is called build-essential > > > (2) Either (a) download and run our installer or (b) use a CPAN > client > > to install Bio::App::SELEX::RNAmotifAnalysis. Note that our > installer > > creates the directory 'perl5' inside your home directory. > This > > directory is for holding Perl modules, including this module > and any > > Perl module dependencies not already included on your > system. The > > installer also appends commands to your .bashrc file to make > it easy > > for the Perl runtime to find these new modules (i.e.
it > includes your > > local 'perl5/lib/perl5' directory in the PERL5LIB environment > > variable). > > > > (a) Use the installer > > i. Download installer (and name it "installer") > > > > curl -o installer -L > http://ircf.rnet.missouri.edu:8000/share.attachment/184 > > That download doesn't work for me. > > > ii. Make it executable > > > > chmod u+x installer > > > > iii. Run it. In a few cases (e.g. CentOS 5.5) we've had > to run the > > installer as many as three times to get all of the > Perl > > modules installed. Please contact us if this > doesn't work > > after three attempts. > > > > ./installer > > If it has that many issues, it's probably wrong. I'd strongly > recommend going the CPAN way. > > > (b) If you prefer using a CPAN client, then we recommend > that you install > > Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to > system > > perl, to avoid overwriting core Perl modules. If this > doesn't make > > sense to you, then please be sure to use the installer as > > described in (a) above. > > Installing locally is usually a good idea; I recommend local::lib in > particular. This "overwriting core Perl modules" suggests to me you're > doing something wrong anyway though. > > Leon > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > So one is free to do whatever he pleases and call it Bio::MyAwesomeStuff, right? What is required to get into the official BioPerl distribution? I think some of my code might eventually prove useful.
Alexey Morozov LIN SB RAS Irkutsk, Russia From cjfields at illinois.edu Fri Aug 24 13:39:32 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 24 Aug 2012 17:39:32 +0000 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B85703@CHIMBX5.ad.uillinois.edu> On Aug 24, 2012, at 9:21 AM, Alexey Morozov wrote: > 2012/8/24 Leon Timmermans > >> On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A >> wrote: >>> I developed this application for a research lab here at the University >> of Missouri. I was wondering if this sounded okay and if it were okay to >> use the "Bio" namespace. >> >> ... >> Installing locally is usually a good idea; I recommend local::lib in >> particular. This "overwriting core Perl modules" suggests to me you're >> doing something wrong anyway though. >> >> Leon >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > So one is free to do whatever he pleases and call it Bio::MyAwesomeStuff, > right? What is required to get into the official BioPerl distribution? I think > some of my code might eventually prove useful. > > Alexey Morozov > LIN SB RAS > Irkutsk, Russia Alexey, Any ideas on the name? We (BioPerl) don't technically own the primary Bio:: namespace, but we do have substantial real estate there :) Confusing namespaces are my only concern. Chris, Personally, using Bio::App doesn't seem right, mainly for the same reasons that Leon mentioned already, but if the modules are the basis for an application then I think the namespace makes sense (see the App::* namespace, for instance App::cpanminus/cpanm, App::perlbrew/perlbrew, etc). Everyone, It is good practice to ask for opinions here on module names and where they should go, though. Doing so here is completely acceptable. (well, Bio* specific ones...)
My thoughts: There are a number of CPAN Bio::* modules that don't use BioPerl, and I wouldn't want to discourage anyone from submitting code to something like Bio::Foo as long as the dependencies are noted. I really want to remove the artificial barrier to CPAN submission for any Bio*-related Perl code, where the BioPerl devs must bless a set of modules prior to submission; it slows down development on your code as well as BioPerl in general. I do highly suggest naming your modules in a way so they wouldn't be confused with BioPerl if possible, though, e.g. don't name something in a more specific namespace that BioPerl already occupies, such as Bio::Seq::MySeqFile, but feel free to ask if there are questions on this. Re: what to do with modules: please submit the modules/distributions independently to CPAN. I *DON'T* suggest asking us to include code within the main BioPerl distribution, unless it is something integral to the entire BioPerl distribution (e.g. core-like). The reasons are two-fold. First, CPAN is an integral part of Perl, and interactions and submission of code to it should be part of the learning curve (just as creating eggs for python or gems for ruby are parts of their respective communities). It's very easy to add BioPerl as a dependency and submit a module on one's own: https://metacpan.org/release/Bio-EUtilities https://metacpan.org/release/Bio-Tools-Primer3Redux There are lots of tutorials for doing so, and if you have multiple modules or plan on maintaining support I highly suggest looking into some of the modern approaches to distribution and release management, Dist::Zilla being the primary one. BTW, the nice side benefits of submitting to CPAN: you get basic issue tracking and cross-platform testing for free (RT, CPAN Reporters), and it's easy enough to support. Second, we have been bitten many times in the past with code that was added to the core distribution (BioPerl). 
These are generally cases where code was supposed to be supported by the submitting authors, but for one reason or another they disappear, and the rest of the Bioperl developers may be left 'holding the bag' so to speak. We can't easily maintain code we don't write, particularly with various coding styles, practices, etc (bioperl-live/run have around ~1000 modules). Submission to CPAN places the maintenance responsibility back where it should be, on the submitting author. Frankly, beyond any namespace issues, wouldn't you want the ability/freedom to do with your code what you want? chris From jayoung at fhcrc.org Fri Aug 24 20:56:04 2012 From: jayoung at fhcrc.org (Janet Young) Date: Fri, 24 Aug 2012 17:56:04 -0700 Subject: [Bioperl-l] cigar_line Message-ID: <3B92347B-8105-4614-AA87-0B0DC4BF101E@fhcrc.org> Hi there, I'm playing around with alignment formats, and saw the function cigar_line for SimpleAlign objects. I have a couple of questions/suggestions: 1. It looks like the cigar string is being generated with respect to the consensus sequence. That's fine, but it would also be really useful to be able to generate it with respect to the reference (first) sequence. Would that be easy to implement? Could you consider that as a feature request? 2. Is there any commonly accepted definition of CIGAR format? and/or has it changed in recent years? The definition I've seen is from the SAM format (http://samtools.sourceforge.net/SAM1.pd) and these cigar strings don't look like they're in that format. The SAM definition carries a lot of useful information that this cigar string doesn't. 3. the 100% threshold used for generating the consensus from which cigar strings are made is very stringent (and counter-intuitive to the biologist: when I hear "consensus" I don't think 100% conserved). Also different to the default for consensus_string. Any thoughts on changing that threshold, or maybe just making the documentation a little clearer on that? 4. 
deletions with respect to consensus sequence don't seem to be reported in the cigar string (see seq4 in my toy example script below). Is this a bug?

thanks for listening!

Janet

-------------------------------------------------------------------
Dr. Janet Young
Tapscott and Malik labs
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168, P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung ...at... fhcrc.org
-------------------------------------------------------------------

#!/usr/bin/perl

use warnings;
use strict;
use Bio::AlignIO;

my $alignString = ">seq1\nAGTGAGGTGATCGGTAGCTGATGCTAGTT\n"
                . ">seq2\nAGTGAGGTGATCGGTAGCTGATGCTAGTT\n"
                . ">seq3\nAGTGAGGTGATCGGTAGCTGATGCTAGTT\n"
                . ">seq4\nAG-GAGGAGATCGGTAGCTGTTGCTAGTT";

my $stringfh;
open($stringfh, "<", \$alignString);

my $in = Bio::AlignIO->new(-fh => $stringfh, -format => "fasta");

while (my $aln = $in->next_aln()) {
    my $consString3 = $aln->consensus_string(100);
    print "\nconsensus100 $consString3\n";
    my %cigars = $aln->cigar_line();
    foreach my $seqname (sort keys %cigars) {
        my $shortseqname = (split /\//, $seqname)[0];
        my $seq = $aln->get_seq_by_id($shortseqname)->seq();
        print "seqname $seqname seq $seq cigar1 $cigars{$seqname}\n";
    }
}

##### script output:
# consensus100 AG?GAGG?GATCGGTAGCTG?TGCTAGTT
# seqname seq1/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29
# seqname seq2/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29
# seqname seq3/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29
# seqname seq4/1-28 seq AG-GAGGAGATCGGTAGCTGTTGCTAGTT cigar1 1,6:8,19:21,28

From daisieh at zoology.ubc.ca Mon Aug 27 16:05:53 2012
From: daisieh at zoology.ubc.ca (Daisie Huang)
Date: Mon, 27 Aug 2012 13:05:53 -0700
Subject: [Bioperl-l] RFC: Refactoring the Hyphy module
Message-ID: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca>

I've been playing around in the guts of the Hyphy module in bioperl-run, and it strikes me
that a lot of the code in the modules is redundant and could be refactored to streamline things and fix some crashes I've been seeing in the test code. Generally, my coding philosophy is to mess with things as little as possible because of the possibility of unintentional side effects, but sometimes a refactor will be beneficial going forward, especially if it doesn't affect the API. What is the group policy on such things? Should I go ahead and attempt it on a branch, make the pull request, and see if anyone has a problem with the code? Thanks, Daisie ----------------------------------------- Daisie Huang, PhD Rm 318, Beaty Biodiversity Centre Department of Botany University of British Columbia http://cronklab.wikidot.com/daisie-huang From cjfields at illinois.edu Mon Aug 27 16:34:18 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 27 Aug 2012 20:34:18 +0000 Subject: [Bioperl-l] RFC: Refactoring the Hyphy module In-Reply-To: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca> References: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B895DF@CHIMBX5.ad.uillinois.edu> Yes, fixing things on a github fork and submitting a pull request is generally the best approach to this. If you have more substantial improvements over time we can add you as a developer on Github. chris On Aug 27, 2012, at 3:05 PM, Daisie Huang wrote: > I've been playing around in the guts of the Hyphy module in bioperl-run, and it strikes me that a lot of the code in the modules is redundant and could be refactored to streamline things and fix some crashes I've been seeing in the test code. Generally, my coding philosophy is to mess with things as little as possible because of the possibility of unintentional side effects, but sometimes a refactor will be beneficial going forward, especially if it doesn't affect the API. What is the group policy on such things? 
Should I go ahead and attempt it on a branch, make the pull request, and see if anyone has a problem with the code? > > Thanks, > Daisie > ----------------------------------------- > Daisie Huang, PhD > Rm 318, Beaty Biodiversity Centre > Department of Botany > University of British Columbia > http://cronklab.wikidot.com/daisie-huang > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jimhu at tamu.edu Tue Aug 28 14:09:48 2012 From: jimhu at tamu.edu (Jim Hu) Date: Tue, 28 Aug 2012 13:09:48 -0500 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning Message-ID: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> I noticed that NCBI recently changed the path to genomes in their web interface. I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc() --------------------- WARNING --------------------- MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 --------------------------------------------------- It still seems to work, though. Jim ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From cjfields at illinois.edu Tue Aug 28 16:02:11 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 28 Aug 2012 20:02:11 +0000 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning In-Reply-To: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> Is the BioProject DBSOURCE retained if you write the output back using Bio::SeqIO? chris On Aug 28, 2012, at 1:09 PM, Jim Hu wrote: > I noticed that NCBI recently changed the path to genomes in their web interface. 
I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc() > > --------------------- WARNING --------------------- > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 > > --------------------------------------------------- > > It still seems to work, though. > > Jim > ===================================== > Jim Hu > Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Tue Aug 28 17:19:42 2012 From: scott at scottcain.net (Scott Cain) Date: Tue, 28 Aug 2012 17:19:42 -0400 Subject: [Bioperl-l] Bug reporting help Message-ID: Hi, Can somebody with Redmine experience help me out? I have an account associated with the address scott+bioperl at scottcain.net. When I try to reset my password by following the link that is emailed to me, no matter what I enter, I'm told the login is invalid. Any idea what I can do? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Aug 28 17:56:28 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 28 Aug 2012 21:56:28 +0000 Subject: [Bioperl-l] Bug reporting help In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8B58F@CHIMBX5.ad.uillinois.edu> That's odd; just tried this with my account and had no problem. I can try changing it via the admin page and will send it to you. chris On Aug 28, 2012, at 4:19 PM, Scott Cain wrote: > Hi, > > Can somebody with Redmine experience help me out? I have an account > associated with the address scott+bioperl at scottcain.net. 
When I try > to reset my password by following the link that is emailed to me, no > matter what I enter, I'm told the login is invalid. Any idea what I > can do? > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mahakadry at aucegypt.edu Tue Aug 28 20:48:58 2012 From: mahakadry at aucegypt.edu (maha ahmed) Date: Wed, 29 Aug 2012 02:48:58 +0200 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> Message-ID: Hi guys, I was searching for a BioPerl script I can use to retrieve FASTA sequences from the eggNOG database, but I only found ones that get sequences from GenBank, Swiss-Prot or Ensembl. Has anyone come across such a script, or does anyone know of a BioPerl module that can be used to retrieve sequences from an online database? If not, does anyone know a Unix command that can be used to retrieve a sequence from an online database like eggNOG? On Tue, Aug 28, 2012 at 10:02 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Is the BioProject DBSOURCE retained if you write the output back using > Bio::SeqIO? > > chris > > On Aug 28, 2012, at 1:09 PM, Jim Hu wrote: > > > I noticed that NCBI recently changed the path to genomes in their web > interface.
I'm wondering if that's related to my getting this kind of > message when I use Bio::DB::GenBank->get_Seq_by_acc() > > > > --------------------- WARNING --------------------- > > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 > > > > --------------------------------------------------- > > > > It still seems to work, though. > > > > Jim > > ===================================== > > Jim Hu > > Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. > > College Station, TX 77843-2128 > > 979-862-4054 > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Tue Aug 28 21:20:56 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 29 Aug 2012 13:20:56 +1200 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning In-Reply-To: References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF60AEC8@exchsth.agresearch.co.nz> What sequence identifiers are you using and what exactly are you trying to get? Data is available via URL so a simple Perl script will retrieve that: Eg. http://eggnog.embl.de/version_3.0/cgi/groupview.py/?group=NOG285349&format=unaligned http://eggnog.embl.de/version_3.0/cgi/groupview.py/?group=NOG285349&format=newick I haven't tried it but will Bio::DB::EMBL work? --Russell -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of maha ahmed Sent: Wednesday, 29 August 2012 12:49 p.m. To: Fields, Christopher J Cc: Jim Hu; Subject: Re: [Bioperl-l] Bio::DB::GenBank new(?) 
warning Hi guys, I was searching for a bioperl script I can use to retrieve fasta sequences from the Eggnog database I only found ones to get sequences from Genbank, swissprot or ensembl Did anyone pass by any such script or knows a bioperl module that can be used to retrieve sequences from an online database If not then does anyone know a unix command that can be used to retrieve a sequence from an online database like eggnog On Tue, Aug 28, 2012 at 10:02 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Is the BioProject DBSOURCE retained if you write the output back using > Bio::SeqIO? > > chris > > On Aug 28, 2012, at 1:09 PM, Jim Hu wrote: > > > I noticed that NCBI recently changed the path to genomes in their > > web > interface. I'm wondering if that's related to my getting this kind of > message when I use Bio::DB::GenBank->get_Seq_by_acc() > > > > --------------------- WARNING --------------------- > > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 > > > > --------------------------------------------------- > > > > It still seems to work, though. > > > > Jim > > ===================================== > > Jim Hu > > Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. 
> > College Station, TX 77843-2128 > > 979-862-4054 > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From shalabh.sharma7 at gmail.com Thu Aug 30 14:07:11 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 30 Aug 2012 14:07:11 -0400 Subject: [Bioperl-l] reverse complement of fastq Message-ID: HI, I have a fastq file with few million reads. I need to find reverse complement of the reads. I used 'revcom' method but its not working for fastq. I will really appreciate if anyone can help me out. 
Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Thu Aug 30 14:54:14 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 30 Aug 2012 18:54:14 +0000 Subject: [Bioperl-l] reverse complement of fastq In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu> If you want something that gives you revcom *very quickly*, Bioperl is sadly not the way to go just yet. However, you can use something like seqtk, which is very fast: https://github.com/lh3/seqtk Something like this should work: $ seqtk seq -r orig.fq > rc.fq chris On Aug 30, 2012, at 1:07 PM, shalabh sharma wrote: > HI, > I have a fastq file with few million reads. I need to find reverse > complement of the reads. > I used 'revcom' method but its not working for fastq. > > I will really appreciate if anyone can help me out. > > Thanks > Shalabh > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Thu Aug 30 16:01:10 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 30 Aug 2012 16:01:10 -0400 Subject: [Bioperl-l] reverse complement of fastq In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu> Message-ID: Hey Chris, Thanks a lot it worked and it was really fast. 
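For anyone who would rather stay in plain Perl, here is a minimal sketch of what a reverse complement of FASTQ involves: the bases are reversed and complemented, and the quality string is reversed too, so each score stays with its base. This is not how seqtk or BioPerl implement it. The sub names, the read name and the quality values below are made up for illustration, and the sketch assumes strict 4-line records (no wrapped sequences) and only A/C/G/T/N bases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement one read. Only A, C, G, T and N (either case) are
# complemented; other IUPAC codes pass through unchanged (an assumption
# made to keep the sketch short).
sub revcom_fastq_record {
    my ($seq, $qual) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTNacgtn/TGCANtgcan/;
    my $rq = scalar reverse $qual;   # qualities are reversed, not complemented
    return ($rc, $rq);
}

# Stream 4-line FASTQ records from one filehandle to another.
sub revcom_fastq {
    my ($in, $out) = @_;
    while (defined(my $header = <$in>)) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        die "truncated FASTQ record at: $header" unless defined $qual;
        chomp($header, $seq, $plus, $qual);
        my ($rc, $rq) = revcom_fastq_record($seq, $qual);
        print {$out} "$header\n$rc\n$plus\n$rq\n";
    }
    return;
}

# Demo on an in-memory record (hypothetical read name and qualities).
my $fastq = "\@read1\nAACGTN\n+\nIIIHG#\n";
open(my $in, '<', \$fastq) or die "cannot open in-memory input: $!";
revcom_fastq($in, \*STDOUT);
close $in;
```

To run it on a real file, open that file for reading instead of the in-memory string and pass a filehandle opened for writing as the second argument. For a few million reads a compiled tool such as seqtk will probably still be much faster.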
Thanks Shalabh On Thu, Aug 30, 2012 at 2:54 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > If you want something that gives you revcom *very quickly*, Bioperl is > sadly not the way to go just yet. However, you can use something like > seqtk, which is very fast: > > https://github.com/lh3/seqtk > > Something like this should work: > > $ seqtk seq -r orig.fq > rc.fq > > chris > > On Aug 30, 2012, at 1:07 PM, shalabh sharma > wrote: > > > HI, > > I have a fastq file with few million reads. I need to find reverse > > complement of the reads. > > I used 'revcom' method but its not working for fastq. > > > > I will really appreciate if anyone can help me out. > > > > Thanks > > Shalabh > > > > > > -- > > Shalabh Sharma > > Scientific Computing Professional Associate (Bioinformatics Specialist) > > Department of Marine Sciences > > University of Georgia > > Athens, GA 30602-3636 > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From maquino at knome.com Tue Aug 7 20:16:56 2012 From: maquino at knome.com (Mark Aquino) Date: Wed, 8 Aug 2012 00:16:56 +0000 Subject: [Bioperl-l] Parsing output INFERNAL 1.0.2 In-Reply-To: References: Message-ID: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com> Try changing the use to Bio::SeqIO::Infernal and see if that works. Sent from my iPhone On Aug 7, 2012, at 3:30 PM, "Estefania" wrote: > Dear all, I have some problems parsing INFERNAL 1.0.2 ouput. > Using this script (previously cited here), nothing is printed and I have no > errror messages. > #!/usr/bin/perl -w > use strict;use Data::Dumper; > use Bio::SearchIO; > > my $infile = $ARGV[0]; # infernal report > my $parser = Bio::SearchIO->new(-format => 'infernal', > -file => $infile); > while( my $result = $parser->next_result ) { > print $result->query_name . "\n"; > } > > If I try to print other elements, the only ones I can print > are:$parser->algorithm(), $parser->version(), > and for: $result = $parser->next_result, it works just for $size = > $result->database_letters() and $dbname = $result->database_name() (but > displays wrong name) > > Is this a problem of the version of Infernal? How can I parse this output? > I also have tabulated output.
> Thanks in advance
> estepi

From shalabh.sharma7 at gmail.com Tue Aug 14 10:28:27 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 14 Aug 2012 10:28:27 -0400
Subject: [Bioperl-l] Protein GI to nucleotide GI
Message-ID:

Hi all,

I have thousands of protein GIs/accession numbers. Is there any way I can get their corresponding nucleotide GIs?

Thanks
Shalabh

From jason.stajich at gmail.com Tue Aug 14 14:48:19 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 14 Aug 2012 11:48:19 -0700
Subject: [Bioperl-l] Protein GI to nucleotide GI
In-Reply-To:
References:
Message-ID: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com>

Did you read the FAQ? This question is answered there.
http://bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F

On Aug 14, 2012, at 7:28 AM, shalabh sharma wrote:

> Hi all,
> I have thousands of protein GIs/accession numbers. Is there any way I
> can get their corresponding nucleotide GIs?
>
> Thanks
> Shalabh

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From cjfields at illinois.edu Wed Aug 15 11:50:06 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 15 Aug 2012 15:50:06 +0000
Subject: [Bioperl-l] Parsing output INFERNAL 1.0.2
In-Reply-To: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com>
References: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B6C175@CHIMBX5.ad.uillinois.edu>

Mark, no, the parser is Bio::SearchIO-based.
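For reference, a Bio::SearchIO-based parse normally walks results, then hits, then HSPs. The sketch below is only a minimal illustration of that traversal, assuming a cmsearch report that the 'infernal' format handler can read. The helper name hsp_rows is made up for this example, and whether any rows come back for Infernal 1.0.2 output is exactly what is in question in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Walk result -> hit -> HSP and collect one tab-separated row per HSP.
# hsp_rows is a hypothetical helper name, not part of BioPerl.
sub hsp_rows {
    my ($parser) = @_;
    my @rows;
    while ( my $result = $parser->next_result ) {
        my $query = $result->query_name // '(unnamed query)';
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                push @rows, join "\t",
                    $query, $hit->name,
                    $hsp->start('hit'), $hsp->end('hit'), $hsp->score;
            }
        }
    }
    return @rows;
}

# Only run against a report when a file name is given on the command line.
if (@ARGV) {
    my $parser = Bio::SearchIO->new(-format => 'infernal',
                                    -file   => $ARGV[0]);
    my @rows = hsp_rows($parser);
    warn "No HSPs were parsed from $ARGV[0]\n" unless @rows;
    print "$_\n" for @rows;
}
```

If the script prints nothing but the warning for a report that clearly contains hits, that points at the parser rather than at the calling code.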
My guess is this is a legitimate bug, Infernal 1.0.2 is the latest release and it is very possible there was a format change that is breaking things. Estepi, can you send me an example file to test? I know Infernal was recently updated and is much faster, I want to make sure BioPerl parses it correctly. chris On Aug 7, 2012, at 7:16 PM, Mark Aquino wrote: > Try changing the use to Bio::SeqIO::Infernal and see if that works. > > Sent from my iPhone > > On Aug 7, 2012, at 3:30 PM, "Estefania" wrote: > >> Dear all, I have some problems parsing INFERNAL 1.0.2 ouput. >> Using this script (previously cited here), nothing is printed and I have no >> errror messages. >> #!/usr/bin/perl -w >> use strict;use Data::Dumper; >> use Bio::SearchIO; >> >> my $infile = $ARGV[0]; # infernal report >> my $parser = Bio::SearchIO->new(-format => 'infernal', >> -file => $infile); >> while( my $result = $parser->next_result ) { >> print $result->query_name . "\n"; >> } >> >> If I try to print other elements, the only ones I can print >> are:$parser->algorithm(), $parser->version(), >> and for: $result = $parser->next_result, it works just for $size = >> $result->database_letters() and $dbname = $result->database_name() (but >> displays wrong name) >> >> Is this a problem of the version of Infernal? How can I parse this output? >> I also have tabulated output. 
>> Thanks in advance >> estepi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From daisieh at gmail.com Thu Aug 16 14:38:25 2012 From: daisieh at gmail.com (Daisie Huang) Date: Thu, 16 Aug 2012 11:38:25 -0700 (PDT) Subject: [Bioperl-l] PAML problem In-Reply-To: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com> References: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com> Message-ID: <6b34fc2f-1163-47b6-b5ad-a94c5092a2a4@googlegroups.com> I'm not sure which PAML component caused this particular outcome, but the bugs and fixes I pushed to bioperl-live might fix this. When will those get pulled into the master? If those particular fixes don't help, I'd be happy to take a peek at the originator's code and see if it's a quick re-parsing fix. Daisie On Tuesday, June 26, 2012 6:37:55 PM UTC-7, Jason Stajich wrote: > > Peng - > > This module needs a person who's sole job is to keep tracking bugs and > updating it with new versions of the program. so far it has burned out > several developers on working on it since it not stable. > > I am not sure what the answer is to the problem, but often it depends on > the extra parameters used as this changes the order of the output making it > hard to parse. > > So I don't have a solution for you except that you'll have to post the bug > and the problem output mlc file to redmine and hope that we can entice some > developers to bang their head against this some more. 
> > Jason > On Jun 26, 2012, at 6:28 PM, Du, Peng wrote: > > > Hi everyone, > > > > I am using bioperl to parse paml output, and I saw this > > > > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > > MSG: Unknown format of PAML output did not see seqtype > > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368 > > STACK: Bio::Tools::Phylo::PAML::_parse_summary > > /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:461 > > STACK: Bio::Tools::Phylo::PAML::next_result > > /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:270 > > STACK: main::cal_dn_ds dn_ds.pl:131 > > STACK: dn_ds.pl:44 > > ---------------------------------------------------------------- > > > > I googled and found that, it was caused by PAML version > > incompatibility. I tried 3.13, 3.14, 4.1, 4.2, 4.5 and none of them > > worked. Could someone tell me which version is fine? > > > > My bioperl version is 1.006001. Thank you very much. > > > > -- > > > > Peng Du > > Graduate School of Information Science and Technology, Hokkaido > University > > Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814 > > Email: d... at ibio.jp Tel: +81 80 3268 9713 > > > > _______________________________________________ > > Bioperl-l mailing list > > Biop... at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.... at gmail.com > ja... at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Biop... 
at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hpm at ebi.ac.uk Fri Aug 17 06:39:43 2012 From: hpm at ebi.ac.uk (Hamish McWilliam) Date: Fri, 17 Aug 2012 11:39:43 +0100 Subject: [Bioperl-l] Programmatic Access To Biological Databases (Perl) Message-ID: <502E1F6F.7030205@ebi.ac.uk> *Date:* 1st-4th October 2012 *Venue:* EMBL-EBI, Hinxton, Nr Cambridge, CB10 1SD, UK *Registration Deadline:* 31st August 2012 This Perl based course in programmatic access to biological databases is ideal for bioinformaticians and biological researchers looking to develop data analysis pipelines, access data in an automated manner or to integrate web services into their own applications. What will it cover? - Overview of public domain biological databases at the EMBL-EBI. - Principles of Web Services, how they work and how to find them. - Integrating data from multiple sources. - Programmatic access to a variety of bioinformatic analysis tools. For a detailed programme and information about registration please see http://www.ebi.ac.uk/training/handson/course_121112_webservices.html All the best, Hamish -- ============================================================ Mr Hamish McWilliam European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton, Cambridge, CB10 1SD, UK URL: http://www.ebi.ac.uk/ ============================================================ From saladi1 at illinois.edu Fri Aug 17 22:03:51 2012 From: saladi1 at illinois.edu (Shyam Saladi) Date: Fri, 17 Aug 2012 19:03:51 -0700 Subject: [Bioperl-l] Protein GI to nucleotide GI In-Reply-To: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com> References: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com> Message-ID: Another way is through NCBI's E-utilities -- http://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Finding_Related_Data_Through_En On Tue, Aug 14, 2012 at 11:48 AM, Jason Stajich wrote: > Did you read the FAQ, this question is answered in there. 
>
> http://bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F
>
> On Aug 14, 2012, at 7:28 AM, shalabh sharma wrote:
>
> > Hi all,
> > I have thousands of protein GIs/accession numbers. Is there any way I
> > can get their corresponding nucleotide GIs?
> >
> > Thanks
> > Shalabh
>
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>

From florent.angly at gmail.com Tue Aug 21 11:32:17 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 21 Aug 2012 17:32:17 +0200
Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation
In-Reply-To:
References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com> <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> <4F55945A.7070901@gmail.com>
Message-ID: <5033AA01.8010801@gmail.com>

Hi all,

I have tested the code some more, made a couple of changes, and put the branch in sync with master. This code looks ready to me. I am prepared either to merge the branch into master or to make a separate distro.

Best,

Florent

On 06/03/12 02:08, Fields, Christopher J wrote:
> I'll check it out. Want me to post test results here? (I have access to a few systems to test on.)
>
> chris
>
> On Mar 5, 2012, at 10:36 PM, Florent Angly wrote:
>
>> To all interested,
>> the AmpliconSearch module is in a decent state.
If you want to test it or improve it, head to https://github.com/bioperl/bioperl-live/blob/amplicons/Bio/Tools/AmpliconSearch.pm >> Regards, >> Florent >> >> >> On 01/03/12 12:42, Fields, Christopher J wrote: >>> Florent, >>> >>> Just want to add, my previous response isn't meant as an admonishment, hope it didn't come across that way, but sometimes email makes it hard to discern the difference. I simply meant to demonstrate my opinion that I find releasing one's code is much simpler (e.g. you can decide the rules and dictate when the code is ready for release), and if we can make getting good code into user's hands easier, more flexible, and more consistent I think that is always a better path. >>> >>> chris >>> >>> On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote: >>> >>>> There are a number of very good reasons to separate out common code and create new repos for new code. The problem about adding new code into core is it ties your code development to bioperl-live's release cycle and versioning. Also, what I (and others) would not like to see is any additional dependencies introduced, but a separate release allows you to (1) both add a dependency w/o affecting core, and (2) make it required, so no fiddling with checking for the module prior to running tests on it. >>>> >>>> As an example, I can easily see something like Bio::SearchIO::blastxml living on it's own since it has a set of outside dependencies. >>>> >>>> BTW, separation of modules into separate distributions (even single modules) based on functionality above and beyond that defined in a core is very common in the perl world. Beyond the obvious example of anything non-core in perl (all installable via CPAN), Moose, Dist::Zilla, Catalyst, Dancer, etc all have separately installable dists that layer additional functionality and have a separate maintenance path. >>>> >>>> chris >>>> >>>> On Mar 1, 2012, at 6:12 PM, Florent Angly wrote: >>>> >>>>> Thanks for everybody's feedback. 
>>>>> >>>>> I am looking at existing modules to hold template sequence, amplicon sequence and primer information. There is the Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the PrimedSeq object places Primer objects on the target sequence. I have been looking at refreshing these modules (they are quite old), add some sanity to them and make sure they are suitable for a generic implementation of PCR (or amplicon search, which I find a more suitable name since it is a far cry from simulating PCR cycles, etc). >>>>> >>>>> I will make a remote branch today to make it easier for interested parties to experiment and contribute. >>>>> >>>>> As you can see Chris, the amplicon search feature would use two existing bioperl-live modules and only add one, tentatively in the Bio::Tools::AmpliconSearch namespace. I am not convinced that this warrants a separate distro. >>>>> >>>>> Florent >>>>> >>>>> On 01/03/12 01:23, Fields, Christopher J wrote: >>>>>> Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag: >>>>>> >>>>>> https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a >>>>>> >>>>>> and it's not there. >>>>>> >>>>>> I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo. >>>>>> >>>>>> chris >>>>>> >>>>>> On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote: >>>>>> >>>>>>> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive: >>>>>>> http://www.salmonella.org/bioperl/primer3_v0.3.tgz >>>>>>> >>>>>>> (There's supposedly a more recent version here: >>>>>>> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz >>>>>>> but that file seems to be truncated). 
>>>>>>> >>>>>>> I have no idea how much would be salvagable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper. >>>>>>> >>>>>>> Cheers, >>>>>>> Roy. >>>>>>> >>>>>>> >>>>>>> On 27/02/2012 21:18, Fields, Christopher J wrote: >>>>>>>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly >>>>>>>>> it was added to Bioperl 0.3 and is also mentionned in the >>>>>>>>> Bio::PrimedSeq module. However, I cannot find in the current >>>>>>>>> Bioperl codebase. Any idea where it went? >>>>>>>> No idea; I can't find it anywhere in the code base either, and the >>>>>>>> github repo contains history going back to the original CVS repo. >>>>>>>> You can try contacting the author, possibly. >>>>>>>> >>>>>>>>> The reason I am asking is because I have some code to do silico PCR >>>>>>>>> using regular expressions. I wanted to modularize my code more and >>>>>>>>> make it into a module for Bioperl. Of course, if there is something >>>>>>>>> similar in Bioperl already, I need to have a look at it. If there >>>>>>>>> is nothing similar, what namespace do you suggest to use? >>>>>>>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? >>>>>>>>> Bio::Tools::InSilicoPCR? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Florent >>>>>>>> Maybe the last (InSilicoPCR). 
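As a rough illustration of the regular-expression approach to in silico PCR mentioned in this thread, here is a minimal sketch. It is not the AmpliconSearch code: the function names (primer_to_regex, revcom, find_amplicons) and the 2000-nt default gap limit are assumptions made for the example, and it only searches the given strand, allowing IUPAC ambiguity codes in the primers but no mismatches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to regex character classes.
my %IUPAC = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',     U => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Turn a (possibly degenerate) primer into a regex fragment.
sub primer_to_regex {
    my ($primer) = @_;
    return join '', map {
        $IUPAC{$_} // die "Unknown nucleotide code '$_'\n"
    } split //, uc $primer;
}

# Reverse-complement a primer, including IUPAC ambiguity codes.
sub revcom {
    my ($seq) = @_;
    my $rc = scalar reverse uc $seq;
    $rc =~ tr/ACGTURYSWKMBDHVN/TGCAAYRSWMKVHDBN/;
    return $rc;
}

# Find amplicons on the given strand: forward primer, a bounded gap,
# then the reverse complement of the reverse primer. The non-greedy gap
# yields the shortest product from each forward site; overlapping
# products are not reported.
sub find_amplicons {
    my ($template, $fwd, $rev, $max_gap) = @_;
    $max_gap //= 2000;    # assumed default, chosen for illustration
    my $pattern = primer_to_regex($fwd)
                . "[ACGTN]{0,$max_gap}?"
                . primer_to_regex( revcom($rev) );
    my @amplicons;
    while ( $template =~ /$pattern/gi ) {
        push @amplicons, {
            start => $-[0] + 1,    # 1-based start of the forward primer site
            end   => $+[0],        # 1-based end of the reverse primer site
            seq   => substr( $template, $-[0], $+[0] - $-[0] ),
        };
    }
    return @amplicons;
}

# Small demonstration with a degenerate forward primer (R = A or G).
my $template = 'TTTTACGTACGGGGCCCCTTGACAAAAA';
for my $amp ( find_amplicons( $template, 'ACGTRC', 'TGTCAA' ) ) {
    print join( "\t", $amp->{start}, $amp->{end}, $amp->{seq} ), "\n";
}
```

Searching the opposite strand would amount to running the same function on the reverse complement of the template; mapping primers with mismatches, as Roy suggests, would need something more sophisticated than a plain regular expression.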
>>>>>>>>
>>>>>>>> chris

From BottomsC at missouri.edu Wed Aug 22 18:12:49 2012
From: BottomsC at missouri.edu (Bottoms, Christopher A)
Date: Wed, 22 Aug 2012 22:12:49 +0000
Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis
Message-ID:

Dear BioPerl community,

I developed this application for a research lab here at the University of Missouri. I was wondering if this sounded okay and if it were okay to use the "Bio" namespace.

Thank you for all you do.

Sincerely,
Christopher Bottoms

--------------------------------------
SYNOPSIS

    perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run

DESCRIPTION

This module pipelines steps in the analysis of SELEX (Systematic Evolution of Ligands through EXponential enrichment) data. This main module creates scripts to do the following:

(1) Cluster similar sequences based on edit distance.
(2) Align sequences within each cluster (using MAFFT).
(3) Calculate the secondary structure of the aligned sequences (using RNAalifold, from the Vienna RNA package).
(4) Build covariance models using cmbuild from Infernal.

The module Bio::App::SELEX::CovarianceSearch can also be used to create scripts for doing iterative refinements of covariance models.

EXAMPLE USE

    perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run

(The file 'simple.seqs' should only contain sequences, one per line.)

This will cluster the sequences found in 'simple.seqs' and create a FASTA file for each cluster. The FASTA files will be grouped into batches (i.e.
one per cpu requested) that will be placed in a separate directory for each batch, and processed within that directory. At the end of processing, for each cluster there will be a covariance model and PostScript illustration files. The batch script used to process each batch will be located in the respective batch directory.

To produce the scripts without running them, simply omit the --run flag from the command line.

CONFIGURATION AND ENVIRONMENT

As written, this code makes heavy use of UNIX utilities and is therefore only supported on UNIX-like environments (e.g. Linux, UNIX, Mac OS X).

Install Infernal, MAFFT, and the RNA Vienna package ahead of time and add the directories containing their executables to your PATH, so that the first time you run RNAmotifAnalysis.pm a configuration file (cluster.cfg) will be generated for you with all of the correct parameters. Otherwise, you'll need to update your cluster.cfg file manually.

After installing the mafft, Infernal, and Vienna RNA packages, add the directories in which their executables reside to your PATH.

For example, assuming that the mafft executable is located in the directory '/usr/local/myapps/bin/', you would want to add that directory to your PATH. To make sure this is done every time you open a terminal window, add this to your .bashrc file, thus:

    echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc

Then, to make it effective immediately, you can source your .bashrc file:

    source ~/.bashrc

INSTALLATION

These installation instructions assume that you are able to open and use a terminal window on Linux.

(0) Some systems need several dependencies installed ahead of time. You may be able to skip this step.
However, if subsequent steps don't work, then be sure that some basic libraries are installed, as shown below (or ask a system administrator to take care of it):

For RedHat or CentOS 5.x systems (tested on CentOS 5.5)

Open a terminal and then type the following command, answering all questions in the affirmative:

    sudo yum install gcc

For RedHat or CentOS 6.x systems (tested on CentOS 6.3)

Open a terminal and then type the following commands, answering all questions in the affirmative:

    sudo yum install gcc
    sudo yum install perl-devel

For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu 12.04 LTS)

Open a terminal and then type the following commands, answering all questions in the affirmative:

    sudo apt-get install gcc
    sudo apt-get install make

(1) Install the non-Perl dependencies:

(Versions shown are those that we've tested. Please contact us if newer versions do not work.)

    Infernal 1.0.2 (http://infernal.janelia.org/)
    MAFFT 6.849b (http://mafft.cbrc.jp/alignment/software/)
    RNA Vienna package 1.8.4 (http://www.tbi.univie.ac.at/~ivo/RNA/)

(2) Either (a) download and run our installer or (b) use a CPAN client to install Bio::App::SELEX::RNAmotifAnalysis. Note that our installer creates the directory 'perl5' inside your home directory. This directory is for holding Perl modules, including this module and any Perl module dependencies not already included on your system. The installer also appends commands to your .bashrc file to make it easy for the Perl runtime to find these new modules (i.e. it includes your local 'perl5/lib/perl5' directory in the PERL5LIB environment variable).

(a) Use the installer

i. Download the installer (and name it "installer")

    curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/184

ii. Make it executable

    chmod u+x installer

iii. Run it. In a few cases (e.g. CentOS 5.5) we've had to run the installer as many as three times to get all of the Perl modules installed.
Please contact us if this doesn't work after three attempts.

    ./installer

(b) If you prefer using a CPAN client, then we recommend that you install Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to system perl, to avoid overwriting core Perl modules. If this doesn't make sense to you, then please be sure to use the installer as described in (a) above.

INCOMPATIBILITIES

None known.

BUGS AND LIMITATIONS

There are no known bugs in this module. Please report problems to molecules at cpan.org. Patches are welcome.

RELATED PUBLICATIONS

Ditzler et al. Manuscript currently in review.

From l.m.timmermans at students.uu.nl Fri Aug 24 05:59:16 2012
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Fri, 24 Aug 2012 11:59:16 +0200
Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis
In-Reply-To:
References:
Message-ID:

On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A wrote:
> I developed this application for a research lab here at the University of Missouri. I was wondering if this sounded okay and if it were okay to use the "Bio" namespace.

That's perfectly fine.

> --------------------------------------
> SYNOPSIS
> perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run

That is a bit wrong. .pm files are modules, not scripts. You're better off adding a small script that uses your module.

> EXAMPLE USE
> perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run
>
> (The file 'simple.seqs' should only contain sequences, one per line.)

Why are you not using a proper sequence format? Bio::SeqIO will allow you to accept any common format.

> Install Infernal, MAFFT, and the RNA Vienna package ahead of time and add
> the directories containing their executables to your PATH, so that the
> first time you run RNAmotifAnalysis.pm a configuration file (cluster.cfg)
> will be generated for you with all of the correct parameters.
Otherwise,
> you'll need to update your cluster.cfg file manually.
>
> After installing mafft, Infernal, and Vienna RNA packages, add the
> directories in which their executables reside to your PATH.
>
> For example, assuming that the mafft executable is located in the directory
> '/usr/local/myapps/bin/', you would want to add it to your PATH. To make
> sure this is done every time you open a terminal window, add this to your
> .bashrc file, thus:
>
> echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc
>
> Then, to make it effective immediately, you can source your .bashrc file:
>
> source ~/.bashrc

If possible (perhaps it's not), you may want to create a so-called Alien package that installs those requirements itself. Not sure if that's possible, and it's probably not that urgent either.

> INSTALLATION
> For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu 12-04 LTS)
>
> Open a terminal and then type the following commands, answering
> all questions in the affirmative:
>
> sudo apt-get install gcc
> sudo apt-get install make

The package you're looking for is called build-essential.

> (2) Either (a) download and run our installer or (b) use a CPAN client
> to install Bio::App::SELEX::RNAmotifAnalysis. Note that our installer
> creates the directory 'perl5' inside your home directory. This
> directory is for holding Perl modules, including this module and any
> Perl module dependencies not already included on your system. The
> installer also appends commands to your .bashrc file to make it easy
> for the Perl runtime to find these new modules (i.e. it includes your
> local 'perl5/lib/perl5' directory in the PERL5LIB environment
> variable).
>
> (a) Use the installer
> i. Download installer (and name it "installer")
>
> curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/184

That download doesn't work for me.

> ii. Make it executable
>
> chmod u+x installer
>
> iii. Run it. In a few cases (e.g.
CentOS 5.5) we've had to run the
> installer as many as three times to get all of the Perl
> modules installed. Please contact us if this doesn't work
> after three attempts.
>
> ./installer

If it has that many issues, it's probably wrong. I'd strongly recommend going the CPAN way.

> (b) If you prefer using a CPAN client, then we recommend that you install
> Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to system
> perl, to avoid overwriting core Perl modules. If this doesn't make
> sense to you, then please be sure to use the installer as
> described in (a) above.

Installing locally is usually a good idea; I recommend local::lib in particular. This "overwriting core Perl modules" suggests to me you're doing something wrong anyway, though.

Leon

From alexeymorozov1991 at gmail.com Fri Aug 24 10:21:37 2012
From: alexeymorozov1991 at gmail.com (Alexey Morozov)
Date: Fri, 24 Aug 2012 22:21:37 +0800
Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis
In-Reply-To:
References:
Message-ID:

2012/8/24 Leon Timmermans

> On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A wrote:
> > I developed this application for a research lab here at the University
> > of Missouri. I was wondering if this sounded okay and if it were okay to
> > use the "Bio" namespace.
>
> That's perfectly fine.
>
> > --------------------------------------
> > SYNOPSIS
> > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run
>
> That is a bit wrong. .pm files are modules, not scripts. You're better
> off adding a small script that uses your module.
>
> > EXAMPLE USE
> > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run
> >
> > (The file 'simple.seqs' should only contain sequences, one per line.)
>
> Why are you not using a proper sequence format? Bio::SeqIO will allow
> you to accept any common format.
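A minimal sketch of what accepting "any common format" through Bio::SeqIO could look like, assuming the pipeline still wants bare sequences internally. The helper name seqs_one_per_line and the command-line handling are hypothetical and not part of Bio::App::SELEX.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Read sequences in any format Bio::SeqIO understands and return the bare
# sequence strings, i.e. the one-sequence-per-line layout of 'simple.seqs'.
# seqs_one_per_line is a hypothetical helper name used for illustration.
sub seqs_one_per_line {
    my ($in_fh, $format) = @_;
    my $in = Bio::SeqIO->new(-fh => $in_fh, -format => $format);
    my @lines;
    while ( my $seq = $in->next_seq ) {
        push @lines, $seq->seq;
    }
    return @lines;
}

# Usage (assumed): script.pl input_file [format]; the format defaults to FASTA.
if (@ARGV) {
    my ($file, $format) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    print "$_\n" for seqs_one_per_line($fh, $format // 'fasta');
    close $fh;
}
```

Bio::SeqIO also has a 'raw' format for one-sequence-per-line files, so existing 'simple.seqs' input could likely keep working through the same interface by passing 'raw' as the format.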
> > > Install Infernal, MAFFT, and the RNA Vienna package ahead of > time and add > > the directories containing their executables to your PATH, so > that the > > first time you run RNAmotifAnalysis.pm a configuration file > (cluster.cfg) > > will be generated for you with all of the correct parameters. > Otherwise, > > you'll need to update your cluster.cfg file manually. > > > > After installing mafft, Infernal, and Vienna RNA packages, add > the > > directories in which their executables reside in your PATH. > > > > For example, assuming that the mafft executable is located in > the directory > > '/usr/local/myapps/bin/', you would want to add it to your PATH. > To make > > sure this is done every time you open a terminal window, add > this to your > > .bashrc file, thus: > > > > echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc. > > > > Then, to make it effective immediately, you can source your > .bashrc file: > > > > source ~/.bashrc > > If possible (perhaps it's not), you may want to create a so called > Alien package that installs those requirements itself. Not sure if > that's possible, and probably not that urgent either. > > > INSTALLATION > > For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu > 12-04 LTS) > > > > Open a terminal and then type the following commands, > answering > > all questions in the afirmative: > > > > sudo apt-get install gcc > > sudo apt-get install make > > The package you're looking for is called build-essentials > > > (2) Either (a) download and run our installer or (b) use a CPAN > client > > to install Bio::App::SELEX::RNAmotifAnalysis. Note that our > installer > > creates the directory 'perl5' inside your home directory. > This > > directory is for holding Perl modules, including this module > and any > > Perl module dependencies not already included on your > system. The > > installer also appends commands to your .bashrc file to make > it easy > > for the Perl runtime to find these new modules (i.e. 
it includes your
> > local 'perl5/lib/perl5' directory in the PERL5LIB environment
> > variable).
> >
> > (a) Use the installer
> > i. Download installer (and name it "installer")
> >
> > curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/184
>
> That download doesn't work for me.
>
> > ii. Make it executable
> >
> > chmod u+x installer
> >
> > iii. Run it. In a few cases (e.g. CentOS 5.5) we've had to run the
> > installer as many as three times to get all of the Perl
> > modules installed. Please contact us if this doesn't work
> > after three attempts.
> >
> > ./installer
>
> If it has that many issues, it's probably wrong. I'd strongly
> recommend going the CPAN way.
>
> > (b) If you prefer using a CPAN client, then we recommend that you install
> > Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to system
> > perl, to avoid overwriting core Perl modules. If this doesn't make
> > sense to you, then please be sure to use the installer as
> > described in (a) above.
>
> Installing locally is usually a good idea; I recommend local::lib in
> particular. This "overwriting core Perl modules" suggests to me you're
> doing something wrong anyway, though.
>
> Leon
>

So one is free to do whatever he pleases and call it Bio::MyAwesomeStuff, right? What is required to get into the official BioPerl distribution? I think some of my code might eventually prove useful.
Alexey Morozov LIN SB RAS Irkutsk, Russia From cjfields at illinois.edu Fri Aug 24 13:39:32 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 24 Aug 2012 17:39:32 +0000 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B85703@CHIMBX5.ad.uillinois.edu> On Aug 24, 2012, at 9:21 AM, Alexey Morozov wrote: > 2012/8/24 Leon Timmermans > >> On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A >> wrote: >>> I developed this application for a research lab here at the University >> of Missouri. I was wondering if this sounded okay and if it were okay to >> use the "Bio" namespace. >> >> ... >> Installing locally is usually a good idea, I recommend local::lib in >> particular. This ?overwriting core Perl modules? suggests to me you're >> doing something wrong anyway though. >> >> Leon >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > So one is free to do whatever he pleases and call it Bio::MyAwesomeStuff, > right? What is required to get to official bioperl distribution? I think > some of my code might eventially prove useful. > > Alexey Morozov > LIN SB RAS > Irkutsk, Russia Alexey, Any ideas on the name? We (BioPerl) don't technically own the primary Bio:: namespace, but we do have substantial real estate there :) Confusing namespaces are my only concern. Chris, Personally, using Bio::App doesn't seem right, mainly for the same reasons that Leon mentioned already, but if the modules are the basis for an application then I think the namespace makes sense (see the App::* namespace, for instance App::cpanminus/cpanm, App::perlbrew/perlbrew, etc). Everyone, It is a good practice to ask opinions on module names here and where they should go, though. Doing so here is completely acceptable. (well, Bio* specific ones...) 
My thoughts: There are a number of CPAN Bio::* modules that don't use BioPerl, and I wouldn't want to discourage anyone from submitting code to something like Bio::Foo as long as the dependencies are noted. I really want to remove the artificial barrier to CPAN submission for any Bio*-related Perl code, where the BioPerl devs must bless a set of modules prior to submission; it slows down development on your code as well as BioPerl in general. I do highly suggest naming your modules in a way so they wouldn't be confused with BioPerl if possible, though, e.g. don't name something in a more specific namespace that BioPerl already occupies, such as Bio::Seq::MySeqFile, but feel free to ask if there are questions on this. Re: what to do with modules: please submit the modules/distributions independently to CPAN. I *DON'T* suggest asking us to include code within the main BioPerl distribution, unless it is something integral to the entire BioPerl distribution (e.g. core-like). The reasons are two-fold. First, CPAN is an integral part of Perl, and interactions and submission of code to it should be part of the learning curve (just as creating eggs for python or gems for ruby are parts of their respective communities). It's very easy to add BioPerl as a dependency and submit a module on one's own: https://metacpan.org/release/Bio-EUtilities https://metacpan.org/release/Bio-Tools-Primer3Redux There are lots of tutorials for doing so, and if you have multiple modules or plan on maintaining support I highly suggest looking into some of the modern approaches to distribution and release management, Dist::Zilla being the primary one. BTW, the nice side benefits of submitting to CPAN: you get basic issue tracking and cross-platform testing for free (RT, CPAN Reporters), and it's easy enough to support. Second, we have been bitten many times in the past with code that was added to the core distribution (BioPerl). 
These are generally cases where code was supposed to be supported by the submitting authors, but for one reason or another they disappear, and the rest of the Bioperl developers may be left 'holding the bag' so to speak. We can't easily maintain code we don't write, particularly with various coding styles, practices, etc (bioperl-live/run have around ~1000 modules). Submission to CPAN places the maintenance responsibility back where it should be, on the submitting author. Frankly, beyond any namespace issues, wouldn't you want the ability/freedom to do with your code what you want? chris From jayoung at fhcrc.org Fri Aug 24 20:56:04 2012 From: jayoung at fhcrc.org (Janet Young) Date: Fri, 24 Aug 2012 17:56:04 -0700 Subject: [Bioperl-l] cigar_line Message-ID: <3B92347B-8105-4614-AA87-0B0DC4BF101E@fhcrc.org> Hi there, I'm playing around with alignment formats, and saw the function cigar_line for SimpleAlign objects. I have a couple of questions/suggestions: 1. It looks like the cigar string is being generated with respect to the consensus sequence. That's fine, but it would also be really useful to be able to generate it with respect to the reference (first) sequence. Would that be easy to implement? Could you consider that as a feature request? 2. Is there any commonly accepted definition of CIGAR format? and/or has it changed in recent years? The definition I've seen is from the SAM format (http://samtools.sourceforge.net/SAM1.pd) and these cigar strings don't look like they're in that format. The SAM definition carries a lot of useful information that this cigar string doesn't. 3. the 100% threshold used for generating the consensus from which cigar strings are made is very stringent (and counter-intuitive to the biologist: when I hear "consensus" I don't think 100% conserved). Also different to the default for consensus_string. Any thoughts on changing that threshold, or maybe just making the documentation a little clearer on that? 4. 
deletions with respect to consensus sequence don't seem to be reported in the cigar string (see seq4 in my toy example script below). Is this a bug?

thanks for listening!

Janet

-------------------------------------------------------------------
Dr. Janet Young

Tapscott and Malik labs
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung ...at... fhcrc.org
-------------------------------------------------------------------

#!/usr/bin/perl

use warnings;
use strict;
use Bio::AlignIO;

# four-sequence toy alignment in FASTA format; seq4 carries a gap and two substitutions
my $alignString = ">seq1\nAGTGAGGTGATCGGTAGCTGATGCTAGTT\n"
                . ">seq2\nAGTGAGGTGATCGGTAGCTGATGCTAGTT\n"
                . ">seq3\nAGTGAGGTGATCGGTAGCTGATGCTAGTT\n"
                . ">seq4\nAG-GAGGAGATCGGTAGCTGTTGCTAGTT";

my $stringfh;
open($stringfh, "<", \$alignString);

my $in = Bio::AlignIO->new(-fh => $stringfh, -format => "fasta");

while (my $aln = $in->next_aln()) {
    my $consString3 = $aln->consensus_string(100);
    print "\nconsensus100 $consString3\n";

    my %cigars = $aln->cigar_line();
    foreach my $seqname (sort keys %cigars) {
        my $shortseqname = (split /\//, $seqname)[0];
        my $seq = $aln->get_seq_by_id($shortseqname)->seq();
        print "seqname $seqname seq $seq cigar1 $cigars{$seqname}\n";
    }
}

##### script output:
# consensus100 AG?GAGG?GATCGGTAGCTG?TGCTAGTT
# seqname seq1/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29
# seqname seq2/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29
# seqname seq3/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29
# seqname seq4/1-28 seq AG-GAGGAGATCGGTAGCTGTTGCTAGTT cigar1 1,6:8,19:21,28

From daisieh at zoology.ubc.ca Mon Aug 27 16:05:53 2012
From: daisieh at zoology.ubc.ca (Daisie Huang)
Date: Mon, 27 Aug 2012 13:05:53 -0700
Subject: [Bioperl-l] RFC: Refactoring the Hyphy module
Message-ID: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca>

I've been playing around in the guts of the Hyphy module in bioperl-run, and it strikes me
that a lot of the code in the modules is redundant and could be refactored to streamline things and fix some crashes I've been seeing in the test code. Generally, my coding philosophy is to mess with things as little as possible because of the possibility of unintentional side effects, but sometimes a refactor will be beneficial going forward, especially if it doesn't affect the API. What is the group policy on such things? Should I go ahead and attempt it on a branch, make the pull request, and see if anyone has a problem with the code? Thanks, Daisie ----------------------------------------- Daisie Huang, PhD Rm 318, Beaty Biodiversity Centre Department of Botany University of British Columbia http://cronklab.wikidot.com/daisie-huang From cjfields at illinois.edu Mon Aug 27 16:34:18 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 27 Aug 2012 20:34:18 +0000 Subject: [Bioperl-l] RFC: Refactoring the Hyphy module In-Reply-To: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca> References: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B895DF@CHIMBX5.ad.uillinois.edu> Yes, fixing things on a github fork and submitting a pull request is generally the best approach to this. If you have more substantial improvements over time we can add you as a developer on Github. chris On Aug 27, 2012, at 3:05 PM, Daisie Huang wrote: > I've been playing around in the guts of the Hyphy module in bioperl-run, and it strikes me that a lot of the code in the modules is redundant and could be refactored to streamline things and fix some crashes I've been seeing in the test code. Generally, my coding philosophy is to mess with things as little as possible because of the possibility of unintentional side effects, but sometimes a refactor will be beneficial going forward, especially if it doesn't affect the API. What is the group policy on such things? 
Should I go ahead and attempt it on a branch, make the pull request, and see if anyone has a problem with the code? > > Thanks, > Daisie > ----------------------------------------- > Daisie Huang, PhD > Rm 318, Beaty Biodiversity Centre > Department of Botany > University of British Columbia > http://cronklab.wikidot.com/daisie-huang > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jimhu at tamu.edu Tue Aug 28 14:09:48 2012 From: jimhu at tamu.edu (Jim Hu) Date: Tue, 28 Aug 2012 13:09:48 -0500 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning Message-ID: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> I noticed that NCBI recently changed the path to genomes in their web interface. I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc() --------------------- WARNING --------------------- MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 --------------------------------------------------- It still seems to work, though. Jim ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From cjfields at illinois.edu Tue Aug 28 16:02:11 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 28 Aug 2012 20:02:11 +0000 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning In-Reply-To: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> Is the BioProject DBSOURCE retained if you write the output back using Bio::SeqIO? chris On Aug 28, 2012, at 1:09 PM, Jim Hu wrote: > I noticed that NCBI recently changed the path to genomes in their web interface. 
I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc() > > --------------------- WARNING --------------------- > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 > > --------------------------------------------------- > > It still seems to work, though. > > Jim > ===================================== > Jim Hu > Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Tue Aug 28 17:19:42 2012 From: scott at scottcain.net (Scott Cain) Date: Tue, 28 Aug 2012 17:19:42 -0400 Subject: [Bioperl-l] Bug reporting help Message-ID: Hi, Can somebody with Redmine experience help me out? I have an account associated with the address scott+bioperl at scottcain.net. When I try to reset my password by following the link that is emailed to me, no matter what I enter, I'm told the login is invalid. Any idea what I can do? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Aug 28 17:56:28 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 28 Aug 2012 21:56:28 +0000 Subject: [Bioperl-l] Bug reporting help In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8B58F@CHIMBX5.ad.uillinois.edu> That's odd; just tried this with my account and had no problem. I can try changing it via the admin page and will send it to you. chris On Aug 28, 2012, at 4:19 PM, Scott Cain wrote: > Hi, > > Can somebody with Redmine experience help me out? I have an account > associated with the address scott+bioperl at scottcain.net. 
When I try > to reset my password by following the link that is emailed to me, no > matter what I enter, I'm told the login is invalid. Any idea what I > can do? > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mahakadry at aucegypt.edu Tue Aug 28 20:48:58 2012 From: mahakadry at aucegypt.edu (maha ahmed) Date: Wed, 29 Aug 2012 02:48:58 +0200 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> Message-ID: Hi guys, I was searching for a bioperl script I can use to retrieve fasta sequences from the Eggnog database I only found ones to get sequences from Genbank, swissprot or ensembl Did anyone pass by any such script or knows a bioperl module that can be used to retrieve sequences from an online database If not then does anyone know a unix command that can be used to retrieve a sequence from an online database like eggnog On Tue, Aug 28, 2012 at 10:02 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Is the BioProject DBSOURCE retained if you write the output back using > Bio::SeqIO? > > chris > > On Aug 28, 2012, at 1:09 PM, Jim Hu wrote: > > > I noticed that NCBI recently changed the path to genomes in their web > interface. 
I'm wondering if that's related to my getting this kind of > message when I use Bio::DB::GenBank->get_Seq_by_acc() > > > > --------------------- WARNING --------------------- > > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 > > > > --------------------------------------------------- > > > > It still seems to work, though. > > > > Jim > > ===================================== > > Jim Hu > > Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. > > College Station, TX 77843-2128 > > 979-862-4054 > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Tue Aug 28 21:20:56 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 29 Aug 2012 13:20:56 +1200 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning In-Reply-To: References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF60AEC8@exchsth.agresearch.co.nz> What sequence identifiers are you using and what exactly are you trying to get? Data is available via URL so a simple Perl script will retrieve that: Eg. http://eggnog.embl.de/version_3.0/cgi/groupview.py/?group=NOG285349&format=unaligned http://eggnog.embl.de/version_3.0/cgi/groupview.py/?group=NOG285349&format=newick I haven't tried it but will Bio::DB::EMBL work? --Russell -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of maha ahmed Sent: Wednesday, 29 August 2012 12:49 p.m. To: Fields, Christopher J Cc: Jim Hu; Subject: Re: [Bioperl-l] Bio::DB::GenBank new(?) 
warning Hi guys, I was searching for a bioperl script I can use to retrieve fasta sequences from the Eggnog database I only found ones to get sequences from Genbank, swissprot or ensembl Did anyone pass by any such script or knows a bioperl module that can be used to retrieve sequences from an online database If not then does anyone know a unix command that can be used to retrieve a sequence from an online database like eggnog On Tue, Aug 28, 2012 at 10:02 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Is the BioProject DBSOURCE retained if you write the output back using > Bio::SeqIO? > > chris > > On Aug 28, 2012, at 1:09 PM, Jim Hu wrote: > > > I noticed that NCBI recently changed the path to genomes in their > > web > interface. I'm wondering if that's related to my getting this kind of > message when I use Bio::DB::GenBank->get_Seq_by_acc() > > > > --------------------- WARNING --------------------- > > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 > > > > --------------------------------------------------- > > > > It still seems to work, though. > > > > Jim > > ===================================== > > Jim Hu > > Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. 
> > College Station, TX 77843-2128 > > 979-862-4054 > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Thu Aug 30 14:07:11 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 30 Aug 2012 14:07:11 -0400 Subject: [Bioperl-l] reverse complement of fastq Message-ID: HI, I have a fastq file with few million reads. I need to find reverse complement of the reads. I used 'revcom' method but its not working for fastq. I will really appreciate if anyone can help me out.
Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Thu Aug 30 14:54:14 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 30 Aug 2012 18:54:14 +0000 Subject: [Bioperl-l] reverse complement of fastq In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu> If you want something that gives you revcom *very quickly*, Bioperl is sadly not the way to go just yet. However, you can use something like seqtk, which is very fast: https://github.com/lh3/seqtk Something like this should work: $ seqtk seq -r orig.fq > rc.fq chris On Aug 30, 2012, at 1:07 PM, shalabh sharma wrote: > HI, > I have a fastq file with few million reads. I need to find reverse > complement of the reads. > I used 'revcom' method but its not working for fastq. > > I will really appreciate if anyone can help me out. > > Thanks > Shalabh > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Thu Aug 30 16:01:10 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 30 Aug 2012 16:01:10 -0400 Subject: [Bioperl-l] reverse complement of fastq In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu> Message-ID: Hey Chris, Thanks a lot it worked and it was really fast. 
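For anyone who would rather stay in Perl than install seqtk, the same operation can be sketched without any BioPerl calls. This is only a sketch under stated assumptions: it expects plain four-line FASTQ records (no wrapped sequence lines), and the sub name revcom_fastq_record is made up for illustration. Expect it to be considerably slower than seqtk on millions of reads.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): reverse-complement one FASTQ record.
# Takes the four chomped lines of a record and returns them with the sequence
# reverse-complemented and the quality string reversed so it stays in register.
sub revcom_fastq_record {
    my ($head, $seq, $plus, $qual) = @_;
    my $rc = reverse $seq;    # scalar assignment, so this reverses the string
    # Complement bases, including IUPAC ambiguity codes, preserving case.
    $rc =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/;
    my $rq = reverse $qual;
    return ($head, $rc, $plus, $rq);
}

# Only process input when a file name is given on the command line.
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    while (defined(my $head = <$in>)) {
        chomp $head;
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        die "Truncated FASTQ record near '$head'\n" unless defined $qual;
        chomp($seq, $plus, $qual);
        print join("\n", revcom_fastq_record($head, $seq, $plus, $qual)), "\n";
    }
    close($in);
}
```

Saved as, say, revcom_fastq.pl (the file name is arbitrary), it would be run as: perl revcom_fastq.pl orig.fq > rc.fq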
Thanks Shalabh On Thu, Aug 30, 2012 at 2:54 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > If you want something that gives you revcom *very quickly*, Bioperl is > sadly not the way to go just yet. However, you can use something like > seqtk, which is very fast: > > https://github.com/lh3/seqtk > > Something like this should work: > > $ seqtk seq -r orig.fq > rc.fq > > chris > > On Aug 30, 2012, at 1:07 PM, shalabh sharma > wrote: > > > HI, > > I have a fastq file with few million reads. I need to find reverse > > complement of the reads. > > I used 'revcom' method but its not working for fastq. > > > > I will really appreciate if anyone can help me out. > > > > Thanks > > Shalabh > > > > > > -- > > Shalabh Sharma > > Scientific Computing Professional Associate (Bioinformatics Specialist) > > Department of Marine Sciences > > University of Georgia > > Athens, GA 30602-3636 > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636
From shalabh.sharma7 at gmail.com Tue Aug 14 10:28:27 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 14 Aug 2012 10:28:27 -0400 Subject: [Bioperl-l] Protein GI to nucleotide GI Message-ID: HI All, I have thousands of protein GI/accession no. , is there any way i can get their corresponding nucleotide GIs. Thanks Shalabh From jason.stajich at gmail.com Tue Aug 14 14:48:19 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 14 Aug 2012 11:48:19 -0700 Subject: [Bioperl-l] Protein GI to nucleotide GI In-Reply-To: References: Message-ID: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com> Did you read the FAQ, this question is answered in there. http://bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F On Aug 14, 2012, at 7:28 AM, shalabh sharma wrote: > HI All, > I have thousands of protein GI/accession no. , is there any way i > can get their corresponding nucleotide GIs. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Wed Aug 15 11:50:06 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 15 Aug 2012 15:50:06 +0000 Subject: [Bioperl-l] Parsing output INFERNAL 1.0.2 In-Reply-To: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com> References: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B6C175@CHIMBX5.ad.uillinois.edu> Mark, no, the parser is Bio::SearchIO-based.
My guess is this is a legitimate bug, Infernal 1.0.2 is the latest release and it is very possible there was a format change that is breaking things. Estepi, can you send me an example file to test? I know Infernal was recently updated and is much faster, I want to make sure BioPerl parses it correctly. chris On Aug 7, 2012, at 7:16 PM, Mark Aquino wrote: > Try changing the use to Bio::SeqIO::Infernal and see if that works. > > Sent from my iPhone > > On Aug 7, 2012, at 3:30 PM, "Estefania" wrote: > >> Dear all, I have some problems parsing INFERNAL 1.0.2 ouput. >> Using this script (previously cited here), nothing is printed and I have no >> errror messages. >> #!/usr/bin/perl -w >> use strict;use Data::Dumper; >> use Bio::SearchIO; >> >> my $infile = $ARGV[0]; # infernal report >> my $parser = Bio::SearchIO->new(-format => 'infernal', >> -file => $infile); >> while( my $result = $parser->next_result ) { >> print $result->query_name . "\n"; >> } >> >> If I try to print other elements, the only ones I can print >> are:$parser->algorithm(), $parser->version(), >> and for: $result = $parser->next_result, it works just for $size = >> $result->database_letters() and $dbname = $result->database_name() (but >> displays wrong name) >> >> Is this a problem of the version of Infernal? How can I parse this output? >> I also have tabulated output. 
>> Thanks in advance >> estepi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From daisieh at gmail.com Thu Aug 16 14:38:25 2012 From: daisieh at gmail.com (Daisie Huang) Date: Thu, 16 Aug 2012 11:38:25 -0700 (PDT) Subject: [Bioperl-l] PAML problem In-Reply-To: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com> References: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com> Message-ID: <6b34fc2f-1163-47b6-b5ad-a94c5092a2a4@googlegroups.com> I'm not sure which PAML component caused this particular outcome, but the bugs and fixes I pushed to bioperl-live might fix this. When will those get pulled into the master? If those particular fixes don't help, I'd be happy to take a peek at the originator's code and see if it's a quick re-parsing fix. Daisie On Tuesday, June 26, 2012 6:37:55 PM UTC-7, Jason Stajich wrote: > > Peng - > > This module needs a person who's sole job is to keep tracking bugs and > updating it with new versions of the program. so far it has burned out > several developers on working on it since it not stable. > > I am not sure what the answer is to the problem, but often it depends on > the extra parameters used as this changes the order of the output making it > hard to parse. > > So I don't have a solution for you except that you'll have to post the bug > and the problem output mlc file to redmine and hope that we can entice some > developers to bang their head against this some more. 
> > Jason > On Jun 26, 2012, at 6:28 PM, Du, Peng wrote: > > > Hi everyone, > > > > I am using bioperl to parse paml output, and I saw this > > > > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > > MSG: Unknown format of PAML output did not see seqtype > > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368 > > STACK: Bio::Tools::Phylo::PAML::_parse_summary > > /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:461 > > STACK: Bio::Tools::Phylo::PAML::next_result > > /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:270 > > STACK: main::cal_dn_ds dn_ds.pl:131 > > STACK: dn_ds.pl:44 > > ---------------------------------------------------------------- > > > > I googled and found that, it was caused by PAML version > > incompatibility. I tried 3.13, 3.14, 4.1, 4.2, 4.5 and none of them > > worked. Could someone tell me which version is fine? > > > > My bioperl version is 1.006001. Thank you very much. > > > > -- > > > > Peng Du > > Graduate School of Information Science and Technology, Hokkaido > University > > Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814 > > Email: d... at ibio.jp Tel: +81 80 3268 9713 > > > > _______________________________________________ > > Bioperl-l mailing list > > Biop... at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.... at gmail.com > ja... at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Biop... 
at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hpm at ebi.ac.uk Fri Aug 17 06:39:43 2012 From: hpm at ebi.ac.uk (Hamish McWilliam) Date: Fri, 17 Aug 2012 11:39:43 +0100 Subject: [Bioperl-l] Programmatic Access To Biological Databases (Perl) Message-ID: <502E1F6F.7030205@ebi.ac.uk> *Date:* 1st-4th October 2012 *Venue:* EMBL-EBI, Hinxton, Nr Cambridge, CB10 1SD, UK *Registration Deadline:* 31st August 2012 This Perl based course in programmatic access to biological databases is ideal for bioinformaticians and biological researchers looking to develop data analysis pipelines, access data in an automated manner or to integrate web services into their own applications. What will it cover? - Overview of public domain biological databases at the EMBL-EBI. - Principles of Web Services, how they work and how to find them. - Integrating data from multiple sources. - Programmatic access to a variety of bioinformatic analysis tools. For a detailed programme and information about registration please see http://www.ebi.ac.uk/training/handson/course_121112_webservices.html All the best, Hamish -- ============================================================ Mr Hamish McWilliam European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton, Cambridge, CB10 1SD, UK URL: http://www.ebi.ac.uk/ ============================================================ From saladi1 at illinois.edu Fri Aug 17 22:03:51 2012 From: saladi1 at illinois.edu (Shyam Saladi) Date: Fri, 17 Aug 2012 19:03:51 -0700 Subject: [Bioperl-l] Protein GI to nucleotide GI In-Reply-To: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com> References: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com> Message-ID: Another way is through NCBI's E-utilities -- http://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Finding_Related_Data_Through_En On Tue, Aug 14, 2012 at 11:48 AM, Jason Stajich wrote: > Did you read the FAQ, this question is answered in there. 
> > http://bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F > > On Aug 14, 2012, at 7:28 AM, shalabh sharma > wrote: > > > Hi all, > > I have thousands of protein GI/accession numbers; is there any way > I > > can get their corresponding nucleotide GIs? > > > > Thanks > > Shalabh > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From florent.angly at gmail.com Tue Aug 21 11:32:17 2012 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 21 Aug 2012 17:32:17 +0200 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com> <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> <4F55945A.7070901@gmail.com> Message-ID: <5033AA01.8010801@gmail.com> Hi all, I have tested the code some more, made a couple of changes and put the branch in sync with master. This code looks ready to me. I am prepared either to merge the branch into master or to make a separate distro. Best, Florent On 06/03/12 02:08, Fields, Christopher J wrote: > I'll check it out. Want me to post test results here? (I have access to a few systems to test on.) > > chris > > On Mar 5, 2012, at 10:36 PM, Florent Angly wrote: > >> To all interested, >> the AmpliconSearch module is in a decent state.
If you want to test it or improve it, head to https://github.com/bioperl/bioperl-live/blob/amplicons/Bio/Tools/AmpliconSearch.pm >> Regards, >> Florent >> >> >> On 01/03/12 12:42, Fields, Christopher J wrote: >>> Florent, >>> >>> Just want to add, my previous response isn't meant as an admonishment, hope it didn't come across that way, but sometimes email makes it hard to discern the difference. I simply meant to demonstrate my opinion that I find releasing one's code is much simpler (e.g. you can decide the rules and dictate when the code is ready for release), and if we can make getting good code into users' hands easier, more flexible, and more consistent I think that is always a better path. >>> >>> chris >>> >>> On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote: >>> >>>> There are a number of very good reasons to separate out common code and create new repos for new code. The problem with adding new code into core is that it ties your code development to bioperl-live's release cycle and versioning. Also, what I (and others) would not like to see is any additional dependencies introduced, but a separate release allows you both to (1) add a dependency w/o affecting core, and (2) make it required, so no fiddling with checking for the module prior to running tests on it. >>>> >>>> As an example, I can easily see something like Bio::SearchIO::blastxml living on its own since it has a set of outside dependencies. >>>> >>>> BTW, separation of modules into separate distributions (even single modules) based on functionality above and beyond that defined in a core is very common in the perl world. Beyond the obvious example of anything non-core in perl (all installable via CPAN), Moose, Dist::Zilla, Catalyst, Dancer, etc all have separately installable dists that layer additional functionality and have a separate maintenance path. >>>> >>>> chris >>>> >>>> On Mar 1, 2012, at 6:12 PM, Florent Angly wrote: >>>> >>>>> Thanks for everybody's feedback.
>>>>> >>>>> I am looking at existing modules to hold template sequence, amplicon sequence and primer information. There is the Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the PrimedSeq object places Primer objects on the target sequence. I have been looking at refreshing these modules (they are quite old), add some sanity to them and make sure they are suitable for a generic implementation of PCR (or amplicon search, which I find a more suitable name since it is a far cry from simulating PCR cycles, etc). >>>>> >>>>> I will make a remote branch today to make it easier for interested parties to experiment and contribute. >>>>> >>>>> As you can see Chris, the amplicon search feature would use two existing bioperl-live modules and only add one, tentatively in the Bio::Tools::AmpliconSearch namespace. I am not convinced that this warrants a separate distro. >>>>> >>>>> Florent >>>>> >>>>> On 01/03/12 01:23, Fields, Christopher J wrote: >>>>>> Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag: >>>>>> >>>>>> https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a >>>>>> >>>>>> and it's not there. >>>>>> >>>>>> I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo. >>>>>> >>>>>> chris >>>>>> >>>>>> On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote: >>>>>> >>>>>>> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive: >>>>>>> http://www.salmonella.org/bioperl/primer3_v0.3.tgz >>>>>>> >>>>>>> (There's supposedly a more recent version here: >>>>>>> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz >>>>>>> but that file seems to be truncated). 
>>>>>>> >>>>>>> I have no idea how much would be salvageable. It seems to just use index to map the primers to the sequence; I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper. >>>>>>> >>>>>>> Cheers, >>>>>>> Roy. >>>>>>> >>>>>>> >>>>>>> On 27/02/2012 21:18, Fields, Christopher J wrote: >>>>>>>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly >>>>>>>>> it was added to Bioperl 0.3 and is also mentioned in the >>>>>>>>> Bio::PrimedSeq module. However, I cannot find it in the current >>>>>>>>> Bioperl codebase. Any idea where it went? >>>>>>>> No idea; I can't find it anywhere in the code base either, and the >>>>>>>> github repo contains history going back to the original CVS repo. >>>>>>>> You can try contacting the author, possibly. >>>>>>>> >>>>>>>>> The reason I am asking is that I have some code to do in silico PCR >>>>>>>>> using regular expressions. I wanted to modularize my code more and >>>>>>>>> make it into a module for Bioperl. Of course, if there is something >>>>>>>>> similar in Bioperl already, I need to have a look at it. If there >>>>>>>>> is nothing similar, what namespace do you suggest I use? >>>>>>>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? >>>>>>>>> Bio::Tools::InSilicoPCR? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Florent >>>>>>>> Maybe the last (InSilicoPCR).
>>>>>>>> >>>>>>>> chris >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ Bioperl-l mailing >>>>>>>> list Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l From BottomsC at missouri.edu Wed Aug 22 18:12:49 2012 From: BottomsC at missouri.edu (Bottoms, Christopher A) Date: Wed, 22 Aug 2012 22:12:49 +0000 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis Message-ID: Dear BioPerl community, I developed this application for a research lab here at the University of Missouri. I was wondering if this sounded okay and if it were okay to use the "Bio" namespace. Thank you for all you do. Sincerely, Christopher Bottoms -------------------------------------- SYNOPSIS perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run DESCRIPTION This module pipelines steps in the analysis of SELEX (Systematic Evolution of Ligands through EXponential enrichment) data. This main module creates scripts to do the following: (1) Cluster similar sequences based on edit distance. (2) Align sequences within each cluster (using mafft). (3) Calculate the secondary structure of the aligned sequences (using RNAalifold, from the Vienna RNA package) (4) Build covariance models using cmbuild from Infernal. The module Bio::App::SELEX::CovarianceSearch can also be used to create scripts for doing iterative refinements of covariance models. EXAMPLE USE perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run (The file 'simple.seqs' should only contain sequences, one per line.) This will cluster the sequences found in 'simple.seqs' and create a FASTA file for each one. The FASTA files will be grouped into batches (i.e. 
one per cpu requested) that will be placed in a separate directory for each batch, and processed within that directory. At the end of processing, for each cluster there will be a covariance model and PostScript illustration files. The batch script used to process each batch will be located in the respective batch directory. To produce the scripts without running them, simply exclude the --run flag from the command line. CONFIGURATION AND ENVIRONMENT As written, this code makes heavy use of UNIX utilities and is therefore only supported on UNIX-like environments (e.g. Linux, UNIX, Mac OS X). Install Infernal, MAFFT, and the RNA Vienna package ahead of time and add the directories containing their executables to your PATH, so that the first time you run RNAmotifAnalysis.pm a configuration file (cluster.cfg) will be generated for you with all of the correct parameters. Otherwise, you'll need to update your cluster.cfg file manually. After installing mafft, Infernal, and Vienna RNA packages, add the directories in which their executables reside to your PATH. For example, assuming that the mafft executable is located in the directory '/usr/local/myapps/bin/', you would want to add it to your PATH. To make sure this is done every time you open a terminal window, add this to your .bashrc file, thus: echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc Then, to make it effective immediately, you can source your .bashrc file: source ~/.bashrc INSTALLATION These installation instructions assume that you are able to open and use a terminal window on Linux. (0) Some systems need several dependencies installed ahead of time. You may be able to skip this step.
However, if subsequent steps don't work, then be sure that some basic libraries are installed, as shown below (or ask a system administrator to take care of it): For RedHat or CentOS 5.x systems (tested on CentOS 5.5) Open a terminal and then type the following command, answering all questions in the affirmative: sudo yum install gcc For RedHat or CentOS 6.x systems (tested on CentOS 6.3) Open a terminal and then type the following commands, answering all questions in the affirmative: sudo yum install gcc sudo yum install perl-devel For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu 12.04 LTS) Open a terminal and then type the following commands, answering all questions in the affirmative: sudo apt-get install gcc sudo apt-get install make (1) Install the non-Perl dependencies: (Versions shown are those that we've tested. Please contact us if newer versions do not work.) Infernal 1.0.2 (http://infernal.janelia.org/) MAFFT 6.849b (http://mafft.cbrc.jp/alignment/software/) RNA Vienna package 1.8.4 (http://www.tbi.univie.ac.at/~ivo/RNA/) (2) Either (a) download and run our installer or (b) use a CPAN client to install Bio::App::SELEX::RNAmotifAnalysis. Note that our installer creates the directory 'perl5' inside your home directory. This directory is for holding Perl modules, including this module and any Perl module dependencies not already included on your system. The installer also appends commands to your .bashrc file to make it easy for the Perl runtime to find these new modules (i.e. it includes your local 'perl5/lib/perl5' directory in the PERL5LIB environment variable). (a) Use the installer i. Download the installer (and name it "installer") curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/184 ii. Make it executable chmod u+x installer iii. Run it. In a few cases (e.g. CentOS 5.5) we've had to run the installer as many as three times to get all of the Perl modules installed.
Please contact us if this doesn't work after three attempts. ./installer (b) If you prefer using a CPAN client, then we recommend that you install Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of into the system perl, to avoid overwriting core Perl modules. If this doesn't make sense to you, then please be sure to use the installer as described in (a) above. INCOMPATIBILITIES None known. BUGS AND LIMITATIONS There are no known bugs in this module. Please report problems to molecules cpan org. Patches are welcome. RELATED PUBLICATIONS Ditzler et al. Manuscript currently in review. From l.m.timmermans at students.uu.nl Fri Aug 24 05:59:16 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Fri, 24 Aug 2012 11:59:16 +0200 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: Message-ID: On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A wrote: > I developed this application for a research lab here at the University of Missouri. I was wondering if this sounded okay and if it were okay to use the "Bio" namespace. That's perfectly fine. > -------------------------------------- > SYNOPSIS > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run That is a bit wrong. .pm files are modules, not scripts. You're better off adding a small script that uses your module. > EXAMPLE USE > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run > > (The file 'simple.seqs' should only contain sequences, one per line.) Why are you not using a proper sequence format? Bio::SeqIO will allow you to accept any common format. > Install Infernal, MAFFT, and the RNA Vienna package ahead of time and add > the directories containing their executables to your PATH, so that the > first time you run RNAmotifAnalysis.pm a configuration file (cluster.cfg) > will be generated for you with all of the correct parameters.
Otherwise, > you'll need to update your cluster.cfg file manually. > > After installing mafft, Infernal, and Vienna RNA packages, add the > directories in which their executables reside to your PATH. > > For example, assuming that the mafft executable is located in the directory > '/usr/local/myapps/bin/', you would want to add it to your PATH. To make > sure this is done every time you open a terminal window, add this to your > .bashrc file, thus: > > echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc > > Then, to make it effective immediately, you can source your .bashrc file: > > source ~/.bashrc If possible (perhaps it's not), you may want to create a so-called Alien package that installs those requirements itself. Not sure if that's possible, and probably not that urgent either. > INSTALLATION > For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu 12.04 LTS) > > Open a terminal and then type the following commands, answering > all questions in the affirmative: > > sudo apt-get install gcc > sudo apt-get install make The package you're looking for is called build-essential. > (2) Either (a) download and run our installer or (b) use a CPAN client > to install Bio::App::SELEX::RNAmotifAnalysis. Note that our installer > creates the directory 'perl5' inside your home directory. This > directory is for holding Perl modules, including this module and any > Perl module dependencies not already included on your system. The > installer also appends commands to your .bashrc file to make it easy > for the Perl runtime to find these new modules (i.e. it includes your > local 'perl5/lib/perl5' directory in the PERL5LIB environment > variable). > > (a) Use the installer > i. Download installer (and name it "installer") > > curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/184 That download doesn't work for me. > ii. Make it executable > > chmod u+x installer > > iii. Run it. In a few cases (e.g.
CentOS 5.5) we've had to run the > installer as many as three times to get all of the Perl > modules installed. Please contact us if this doesn't work > after three attempts. > > ./installer If it has that many issues, it's probably wrong. I'd strongly recommend going the CPAN way. > (b) If you prefer using a CPAN client, then we recommend that you install > Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to system > perl, to avoid overwriting core Perl modules. If this doesn't make > sense to you, then please be sure to use the installer as > described in (a) above. Installing locally is usually a good idea; I recommend local::lib in particular. This "overwriting core Perl modules" suggests to me you're doing something wrong anyway though. Leon From alexeymorozov1991 at gmail.com Fri Aug 24 10:21:37 2012 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Fri, 24 Aug 2012 22:21:37 +0800 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: Message-ID: 2012/8/24 Leon Timmermans > On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A > wrote: > > I developed this application for a research lab here at the University > of Missouri. I was wondering if this sounded okay and if it were okay to > use the "Bio" namespace. > > That's perfectly fine. > > > -------------------------------------- > > SYNOPSIS > > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile > simple.seqs --cpus 4 --run > > That is a bit wrong. .pm files are modules, not scripts. You're better > off adding a small script that uses your module. > > > EXAMPLE USE > > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile > simple.seqs --cpus 4 --run > > > > (The file 'simple.seqs' should only contain sequences, one per > line.) > > Why are you not using a proper sequence format? Bio::SeqIO will allow > you to accept any common format.
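The Bio::SeqIO suggestion quoted above could look roughly like the sketch below. It is only an illustration and not code from Bio::App::SELEX::RNAmotifAnalysis: the helper name read_sequences and its 'raw' default are assumptions made here. Bio::SeqIO's 'raw' format reads one sequence per line, so a file like 'simple.seqs' would still be accepted, while passing 'fasta' (or another format name that Bio::SeqIO supports) lets the same code read other common formats.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper (not part of the module under discussion): return the
# sequence strings found in $file, parsed with the Bio::SeqIO format named
# by the caller. 'raw' is Bio::SeqIO's one-sequence-per-line format and is
# used as the default here so that files like 'simple.seqs' keep working.
sub read_sequences {
    my ($file, $format) = @_;
    $format = 'raw' unless defined $format;

    my $in = Bio::SeqIO->new(-file => $file, -format => $format);

    my @sequences;
    while (my $seq = $in->next_seq) {
        push @sequences, $seq->seq;    # plain sequence string
    }
    return @sequences;
}

# Command-line use: perl read_seqs.pl simple.seqs [format]
if (@ARGV) {
    my ($file, $format) = @ARGV;
    print "$_\n" for read_sequences($file, $format);
}
```

Under these assumptions, "perl read_seqs.pl simple.seqs" would read the one-per-line file and "perl read_seqs.pl clusters.fa fasta" a FASTA file; the script name and both file names are placeholders.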
> > > Install Infernal, MAFFT, and the RNA Vienna package ahead of > time and add > > the directories containing their executables to your PATH, so > that the > > first time you run RNAmotifAnalysis.pm a configuration file > (cluster.cfg) > > will be generated for you with all of the correct parameters. > Otherwise, > > you'll need to update your cluster.cfg file manually. > > > > After installing mafft, Infernal, and Vienna RNA packages, add > the > > directories in which their executables reside in your PATH. > > > > For example, assuming that the mafft executable is located in > the directory > > '/usr/local/myapps/bin/', you would want to add it to your PATH. > To make > > sure this is done every time you open a terminal window, add > this to your > > .bashrc file, thus: > > > > echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc. > > > > Then, to make it effective immediately, you can source your > .bashrc file: > > > > source ~/.bashrc > > If possible (perhaps it's not), you may want to create a so called > Alien package that installs those requirements itself. Not sure if > that's possible, and probably not that urgent either. > > > INSTALLATION > > For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu > 12-04 LTS) > > > > Open a terminal and then type the following commands, > answering > > all questions in the afirmative: > > > > sudo apt-get install gcc > > sudo apt-get install make > > The package you're looking for is called build-essentials > > > (2) Either (a) download and run our installer or (b) use a CPAN > client > > to install Bio::App::SELEX::RNAmotifAnalysis. Note that our > installer > > creates the directory 'perl5' inside your home directory. > This > > directory is for holding Perl modules, including this module > and any > > Perl module dependencies not already included on your > system. The > > installer also appends commands to your .bashrc file to make > it easy > > for the Perl runtime to find these new modules (i.e. 
it > includes your > > local 'perl5/lib/perl5' directory in the PERL5LIB environment > > variable). > > > > (a) Use the installer > > i. Download installer (and name it "installer") > > > > curl -o installer -L > http://ircf.rnet.missouri.edu:8000/share.attachment/184 > > That download doesn't work for me. > > > ii. Make it executable > > > > chmod u+x installer > > > > iii. Run it. In a few cases (e.g. CentOS 5.5) we've had > to run the > > installer as many as three times to get all of the > Perl > > modules installed. Please contact us if this > doesn't work > > after three attempts. > > > > ./installer > > If it has that many issues, it's probably wrong. I'd strongly > recommend going the CPAN way. > > > (b) If you prefer using a CPAN client, then we recommend > that you install > > Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to > system > > perl, to avoid overwriting core Perl modules. If this > doesn't make > > sense to you, then please be sure to use the installer as > > described in (a) above. > > Installing locally is usually a good idea; I recommend local::lib in > particular. This "overwriting core Perl modules" suggests to me you're > doing something wrong anyway though. > > Leon > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > So one is free to do whatever he pleases and call it Bio::MyAwesomeStuff, right? What is required to get into the official BioPerl distribution? I think some of my code might eventually prove useful.
Alexey Morozov LIN SB RAS Irkutsk, Russia From cjfields at illinois.edu Fri Aug 24 13:39:32 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 24 Aug 2012 17:39:32 +0000 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B85703@CHIMBX5.ad.uillinois.edu> On Aug 24, 2012, at 9:21 AM, Alexey Morozov wrote: > 2012/8/24 Leon Timmermans > >> On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A >> wrote: >>> I developed this application for a research lab here at the University >> of Missouri. I was wondering if this sounded okay and if it were okay to >> use the "Bio" namespace. >> >> ... >> Installing locally is usually a good idea; I recommend local::lib in >> particular. This "overwriting core Perl modules" suggests to me you're >> doing something wrong anyway though. >> >> Leon >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > So one is free to do whatever he pleases and call it Bio::MyAwesomeStuff, > right? What is required to get into the official BioPerl distribution? I think > some of my code might eventually prove useful. > > Alexey Morozov > LIN SB RAS > Irkutsk, Russia Alexey, Any ideas on the name? We (BioPerl) don't technically own the primary Bio:: namespace, but we do have substantial real estate there :) Confusing namespaces are my only concern. Chris, Personally, using Bio::App doesn't seem right, mainly for the same reasons that Leon mentioned already, but if the modules are the basis for an application then I think the namespace makes sense (see the App::* namespace, for instance App::cpanminus/cpanm, App::perlbrew/perlbrew, etc). Everyone, It is good practice to ask for opinions here on module names and where they should go, though. Doing so here is completely acceptable. (well, Bio* specific ones...)
My thoughts: There are a number of CPAN Bio::* modules that don't use BioPerl, and I wouldn't want to discourage anyone from submitting code to something like Bio::Foo as long as the dependencies are noted. I really want to remove the artificial barrier to CPAN submission for any Bio*-related Perl code, where the BioPerl devs must bless a set of modules prior to submission; it slows down development on your code as well as BioPerl in general. I do highly suggest naming your modules in a way so they wouldn't be confused with BioPerl if possible, though, e.g. don't name something in a more specific namespace that BioPerl already occupies, such as Bio::Seq::MySeqFile, but feel free to ask if there are questions on this. Re: what to do with modules: please submit the modules/distributions independently to CPAN. I *DON'T* suggest asking us to include code within the main BioPerl distribution, unless it is something integral to the entire BioPerl distribution (e.g. core-like). The reasons are two-fold. First, CPAN is an integral part of Perl, and interactions and submission of code to it should be part of the learning curve (just as creating eggs for python or gems for ruby are parts of their respective communities). It's very easy to add BioPerl as a dependency and submit a module on one's own: https://metacpan.org/release/Bio-EUtilities https://metacpan.org/release/Bio-Tools-Primer3Redux There are lots of tutorials for doing so, and if you have multiple modules or plan on maintaining support I highly suggest looking into some of the modern approaches to distribution and release management, Dist::Zilla being the primary one. BTW, the nice side benefits of submitting to CPAN: you get basic issue tracking and cross-platform testing for free (RT, CPAN Reporters), and it's easy enough to support. Second, we have been bitten many times in the past with code that was added to the core distribution (BioPerl). 
These are generally cases where code was supposed to be supported by the submitting authors, but for one reason or another they disappear, and the rest of the Bioperl developers may be left 'holding the bag' so to speak. We can't easily maintain code we don't write, particularly with various coding styles, practices, etc (bioperl-live/run have around ~1000 modules). Submission to CPAN places the maintenance responsibility back where it should be, on the submitting author. Frankly, beyond any namespace issues, wouldn't you want the ability/freedom to do with your code what you want? chris From jayoung at fhcrc.org Fri Aug 24 20:56:04 2012 From: jayoung at fhcrc.org (Janet Young) Date: Fri, 24 Aug 2012 17:56:04 -0700 Subject: [Bioperl-l] cigar_line Message-ID: <3B92347B-8105-4614-AA87-0B0DC4BF101E@fhcrc.org> Hi there, I'm playing around with alignment formats, and saw the function cigar_line for SimpleAlign objects. I have a couple of questions/suggestions: 1. It looks like the cigar string is being generated with respect to the consensus sequence. That's fine, but it would also be really useful to be able to generate it with respect to the reference (first) sequence. Would that be easy to implement? Could you consider that as a feature request? 2. Is there any commonly accepted definition of CIGAR format? and/or has it changed in recent years? The definition I've seen is from the SAM format (http://samtools.sourceforge.net/SAM1.pd) and these cigar strings don't look like they're in that format. The SAM definition carries a lot of useful information that this cigar string doesn't. 3. the 100% threshold used for generating the consensus from which cigar strings are made is very stringent (and counter-intuitive to the biologist: when I hear "consensus" I don't think 100% conserved). Also different to the default for consensus_string. Any thoughts on changing that threshold, or maybe just making the documentation a little clearer on that? 4. 
deletions with respect to consensus sequence don't seem to be reported in the cigar string (see seq4 in my toy example script below). Is this a bug? thanks for listening! Janet ------------------------------------------------------------------- Dr. Janet Young Tapscott and Malik labs Fred Hutchinson Cancer Research Center 1100 Fairview Avenue N., C3-168, P.O. Box 19024, Seattle, WA 98109-1024, USA. tel: (206) 667 1471 fax: (206) 667 6524 email: jayoung ...at... fhcrc.org -------------------------------------------------------------------

#!/usr/bin/perl
use warnings;
use strict;
use Bio::AlignIO;

# Toy alignment: seq4 carries one gap and two substitutions.
my $alignString = ">seq1\nAGTGAGGTGATCGGTAGCTGATGCTAGTT\n"
                . ">seq2\nAGTGAGGTGATCGGTAGCTGATGCTAGTT\n"
                . ">seq3\nAGTGAGGTGATCGGTAGCTGATGCTAGTT\n"
                . ">seq4\nAG-GAGGAGATCGGTAGCTGTTGCTAGTT";

# Read the alignment from the in-memory string.
my $stringfh;
open($stringfh, "<", \$alignString);
my $in = Bio::AlignIO->new(-fh => $stringfh, -format => "fasta");

while (my $aln = $in->next_aln()) {
    my $consString3 = $aln->consensus_string(100);
    print "\nconsensus100 $consString3\n";
    my %cigars = $aln->cigar_line();
    foreach my $seqname (sort keys %cigars) {
        # Keys look like "seq1/1-29"; strip the range to look up the sequence.
        my $shortseqname = (split /\//, $seqname)[0];
        my $seq = $aln->get_seq_by_id($shortseqname)->seq();
        print "seqname $seqname seq $seq cigar1 $cigars{$seqname}\n";
    }
}

##### script output:
# consensus100 AG?GAGG?GATCGGTAGCTG?TGCTAGTT
# seqname seq1/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29
# seqname seq2/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29
# seqname seq3/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29
# seqname seq4/1-28 seq AG-GAGGAGATCGGTAGCTGTTGCTAGTT cigar1 1,6:8,19:21,28

From daisieh at zoology.ubc.ca Mon Aug 27 16:05:53 2012 From: daisieh at zoology.ubc.ca (Daisie Huang) Date: Mon, 27 Aug 2012 13:05:53 -0700 Subject: [Bioperl-l] RFC: Refactoring the Hyphy module Message-ID: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca> I've been playing around in the guts of the Hyphy module in bioperl-run, and it strikes me
that a lot of the code in the modules is redundant and could be refactored to streamline things and fix some crashes I've been seeing in the test code. Generally, my coding philosophy is to mess with things as little as possible because of the possibility of unintentional side effects, but sometimes a refactor will be beneficial going forward, especially if it doesn't affect the API. What is the group policy on such things? Should I go ahead and attempt it on a branch, make the pull request, and see if anyone has a problem with the code? Thanks, Daisie ----------------------------------------- Daisie Huang, PhD Rm 318, Beaty Biodiversity Centre Department of Botany University of British Columbia http://cronklab.wikidot.com/daisie-huang From cjfields at illinois.edu Mon Aug 27 16:34:18 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 27 Aug 2012 20:34:18 +0000 Subject: [Bioperl-l] RFC: Refactoring the Hyphy module In-Reply-To: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca> References: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B895DF@CHIMBX5.ad.uillinois.edu> Yes, fixing things on a github fork and submitting a pull request is generally the best approach to this. If you have more substantial improvements over time we can add you as a developer on Github. chris On Aug 27, 2012, at 3:05 PM, Daisie Huang wrote: > I've been playing around in the guts of the Hyphy module in bioperl-run, and it strikes me that a lot of the code in the modules is redundant and could be refactored to streamline things and fix some crashes I've been seeing in the test code. Generally, my coding philosophy is to mess with things as little as possible because of the possibility of unintentional side effects, but sometimes a refactor will be beneficial going forward, especially if it doesn't affect the API. What is the group policy on such things? 
Should I go ahead and attempt it on a branch, make the pull request, and see if anyone has a problem with the code? > > Thanks, > Daisie > ----------------------------------------- > Daisie Huang, PhD > Rm 318, Beaty Biodiversity Centre > Department of Botany > University of British Columbia > http://cronklab.wikidot.com/daisie-huang > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jimhu at tamu.edu Tue Aug 28 14:09:48 2012 From: jimhu at tamu.edu (Jim Hu) Date: Tue, 28 Aug 2012 13:09:48 -0500 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning Message-ID: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> I noticed that NCBI recently changed the path to genomes in their web interface. I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc() --------------------- WARNING --------------------- MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 --------------------------------------------------- It still seems to work, though. Jim ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From cjfields at illinois.edu Tue Aug 28 16:02:11 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 28 Aug 2012 20:02:11 +0000 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning In-Reply-To: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> Is the BioProject DBSOURCE retained if you write the output back using Bio::SeqIO? chris On Aug 28, 2012, at 1:09 PM, Jim Hu wrote: > I noticed that NCBI recently changed the path to genomes in their web interface. 
I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc() > > --------------------- WARNING --------------------- > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 > > --------------------------------------------------- > > It still seems to work, though. > > Jim > ===================================== > Jim Hu > Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Tue Aug 28 17:19:42 2012 From: scott at scottcain.net (Scott Cain) Date: Tue, 28 Aug 2012 17:19:42 -0400 Subject: [Bioperl-l] Bug reporting help Message-ID: Hi, Can somebody with Redmine experience help me out? I have an account associated with the address scott+bioperl at scottcain.net. When I try to reset my password by following the link that is emailed to me, no matter what I enter, I'm told the login is invalid. Any idea what I can do? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Aug 28 17:56:28 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 28 Aug 2012 21:56:28 +0000 Subject: [Bioperl-l] Bug reporting help In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8B58F@CHIMBX5.ad.uillinois.edu> That's odd; just tried this with my account and had no problem. I can try changing it via the admin page and will send it to you. chris On Aug 28, 2012, at 4:19 PM, Scott Cain wrote: > Hi, > > Can somebody with Redmine experience help me out? I have an account > associated with the address scott+bioperl at scottcain.net. 
When I try > to reset my password by following the link that is emailed to me, no > matter what I enter, I'm told the login is invalid. Any idea what I > can do? > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research From mahakadry at aucegypt.edu Tue Aug 28 20:48:58 2012 From: mahakadry at aucegypt.edu (maha ahmed) Date: Wed, 29 Aug 2012 02:48:58 +0200 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> Message-ID: Hi guys, I was searching for a BioPerl script I can use to retrieve FASTA sequences from the eggNOG database. I only found ones that get sequences from GenBank, Swiss-Prot or Ensembl. Has anyone come across such a script, or does anyone know of a BioPerl module that can be used to retrieve sequences from an online database? If not, does anyone know a Unix command that can be used to retrieve a sequence from an online database like eggNOG? On Tue, Aug 28, 2012 at 10:02 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Is the BioProject DBSOURCE retained if you write the output back using > Bio::SeqIO? > > chris > > On Aug 28, 2012, at 1:09 PM, Jim Hu wrote: > > > I noticed that NCBI recently changed the path to genomes in their web > interface.
I'm wondering if that's related to my getting this kind of > message when I use Bio::DB::GenBank->get_Seq_by_acc() > > > > --------------------- WARNING --------------------- > > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 > > > > --------------------------------------------------- > > > > It still seems to work, though. > > > > Jim > > ===================================== > > Jim Hu > > Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ. > > College Station, TX 77843-2128 > > 979-862-4054 > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Tue Aug 28 21:20:56 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 29 Aug 2012 13:20:56 +1200 Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning In-Reply-To: References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF60AEC8@exchsth.agresearch.co.nz> What sequence identifiers are you using and what exactly are you trying to get? Data is available via URL so a simple Perl script will retrieve that: Eg. http://eggnog.embl.de/version_3.0/cgi/groupview.py/?group=NOG285349&format=unaligned http://eggnog.embl.de/version_3.0/cgi/groupview.py/?group=NOG285349&format=newick I haven't tried it but will Bio::DB::EMBL work? --Russell -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of maha ahmed Sent: Wednesday, 29 August 2012 12:49 p.m. To: Fields, Christopher J Cc: Jim Hu; Subject: Re: [Bioperl-l] Bio::DB::GenBank new(?) 
warning Hi guys, I was searching for a BioPerl script I can use to retrieve FASTA sequences from the eggNOG database. I only found ones that get sequences from GenBank, Swiss-Prot or Ensembl. Has anyone come across such a script, or does anyone know of a BioPerl module that can be used to retrieve sequences from an online database? If not, does anyone know a Unix command that can be used to retrieve a sequence from an online database like eggNOG? On Tue, Aug 28, 2012 at 10:02 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > Is the BioProject DBSOURCE retained if you write the output back using > Bio::SeqIO? > > chris > > On Aug 28, 2012, at 1:09 PM, Jim Hu wrote: > > > I noticed that NCBI recently changed the path to genomes in their > > web > interface. I'm wondering if that's related to my getting this kind of > message when I use Bio::DB::GenBank->get_Seq_by_acc() > > > > --------------------- WARNING --------------------- > > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931 > > > > --------------------------------------------------- > > > > It still seems to work, though. > > > > Jim > > ===================================== > > Jim Hu > > Professor > > Dept. of Biochemistry and Biophysics > > 2128 TAMU > > Texas A&M Univ.
> > College Station, TX 77843-2128 > > 979-862-4054 From shalabh.sharma7 at gmail.com Thu Aug 30 14:07:11 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 30 Aug 2012 14:07:11 -0400 Subject: [Bioperl-l] reverse complement of fastq Message-ID: Hi, I have a FASTQ file with a few million reads. I need to find the reverse complement of the reads. I used the 'revcom' method but it's not working for FASTQ. I would really appreciate it if anyone can help me out.
Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Thu Aug 30 14:54:14 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 30 Aug 2012 18:54:14 +0000 Subject: [Bioperl-l] reverse complement of fastq In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu> If you want something that gives you revcom *very quickly*, BioPerl is sadly not the way to go just yet. However, you can use something like seqtk, which is very fast: https://github.com/lh3/seqtk Something like this should work: $ seqtk seq -r orig.fq > rc.fq chris On Aug 30, 2012, at 1:07 PM, shalabh sharma wrote: > Hi, > I have a FASTQ file with a few million reads. I need to find the reverse > complement of the reads. > I used the 'revcom' method but it's not working for FASTQ. > > I would really appreciate it if anyone can help me out. > > Thanks > Shalabh > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 From shalabh.sharma7 at gmail.com Thu Aug 30 16:01:10 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 30 Aug 2012 16:01:10 -0400 Subject: [Bioperl-l] reverse complement of fastq In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu> Message-ID: Hey Chris, Thanks a lot, it worked and it was really fast.
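For anyone who would rather stay in Perl than install seqtk, the same operation can be sketched without BioPerl. This is only an illustration under stated assumptions: records are strict four-line FASTQ (no wrapped sequence lines), bases are limited to A, C, G, T and N in either case (other IUPAC codes pass through unchanged), and the helper name revcom_fastq_record is made up for this example rather than taken from any library. The quality string is reversed along with the sequence so each score stays attached to its base.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement one FASTQ record passed as four strings
# (header, sequence, separator, quality).
# Hypothetical helper for illustration; not a BioPerl method.
sub revcom_fastq_record {
    my ($header, $seq, $sep, $qual) = @_;
    die "sequence and quality lengths differ for $header\n"
        unless length($seq) == length($qual);

    my $rc = scalar reverse $seq;          # reverse the bases
    $rc =~ tr/ACGTNacgtn/TGCANtgcan/;      # then complement them
    my $rq = scalar reverse $qual;         # keep each score with its base

    return ($header, $rc, $sep, $rq);
}

# Demonstration on an in-memory record; for real data, open a FASTQ
# file instead of the string reference.
my $fastq = "\@read1\nACGTN\n+\nIIHG#\n";
open(my $in, '<', \$fastq) or die "cannot open in-memory FASTQ: $!";

while (defined(my $header = <$in>)) {
    my ($seq, $sep, $qual) = map { scalar <$in> } 1 .. 3;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $sep, $qual);
    print join("\n", revcom_fastq_record($header, $seq, $sep, $qual)), "\n";
}
close($in);
```

A plain loop like this reads one record at a time, so memory use stays flat even for millions of reads, though it will still be slower than a compiled tool such as seqtk.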
Thanks Shalabh On Thu, Aug 30, 2012 at 2:54 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > If you want something that gives you revcom *very quickly*, BioPerl is > sadly not the way to go just yet. However, you can use something like > seqtk, which is very fast: > > https://github.com/lh3/seqtk > > Something like this should work: > > $ seqtk seq -r orig.fq > rc.fq > > chris > > On Aug 30, 2012, at 1:07 PM, shalabh sharma > wrote: > > > Hi, > > I have a FASTQ file with a few million reads. I need to find the reverse > > complement of the reads. > > I used the 'revcom' method but it's not working for FASTQ. > > > > I would really appreciate it if anyone can help me out. > > > > Thanks > > Shalabh > > > > > > -- > > Shalabh Sharma > > Scientific Computing Professional Associate (Bioinformatics Specialist) > > Department of Marine Sciences > > University of Georgia > > Athens, GA 30602-3636 > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From alexeymorozov1991 at gmail.com Fri Aug 3 03:54:02 2012 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Fri, 3 Aug 2012 12:54:02 +0900 Subject: [Bioperl-l] Random trees generation Message-ID: Is it true that to generate random trees with integer branch lengths I have to write my own generator? It seems Tree::RandomFactory can only produce trees with very small real-valued branch lengths (and even then not on all branches). Is there no other good module for that around? Alexey.
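If writing a generator by hand turns out to be the route taken, it does not need much code. The following is a minimal sketch in plain Perl, not a BioPerl or Bio::Phylo API: the function name random_newick_tree, its parameters, and the tip labels t1..tN are all invented for this example. It builds a rooted binary topology by repeatedly joining two randomly chosen subtrees and gives every branch an integer length drawn uniformly from 1 to a chosen maximum. It makes no attempt to follow a particular evolutionary model, so it is only a starting point if a coalescent or birth-death process is what is actually needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a random rooted binary tree as a Newick string in which every
# branch length is an integer between 1 and $max_len.
# Hypothetical helper for illustration; not part of BioPerl.
sub random_newick_tree {
    my ($num_tips, $max_len) = @_;
    die "need at least two tips\n"      if $num_tips < 2;
    die "max_len must be at least 1\n"  if $max_len < 1;

    # Start with one single-tip subtree per taxon.
    my @subtrees = map { "t$_" } 1 .. $num_tips;

    # Join two randomly chosen subtrees until one tree remains.
    while (@subtrees > 1) {
        my $left  = splice(@subtrees, int(rand(scalar @subtrees)), 1);
        my $right = splice(@subtrees, int(rand(scalar @subtrees)), 1);
        my $left_len  = 1 + int(rand($max_len));   # integer in 1..$max_len
        my $right_len = 1 + int(rand($max_len));
        push @subtrees, "($left:$left_len,$right:$right_len)";
    }
    return "$subtrees[0];";
}

srand(42);   # fixed seed so repeated runs give the same tree
print random_newick_tree(6, 10), "\n";
```

The output is ordinary Newick, so it should be readable by Bio::TreeIO with -format => 'newick' if a tree object is needed afterwards.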
From jason.stajich at gmail.com Fri Aug 3 15:52:48 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 3 Aug 2012 08:52:48 -0700 Subject: [Bioperl-l] Random trees generation In-Reply-To: References: Message-ID: The current BioPerl random tree factory is for use with the coalescent, which I needed for my research -- it may or may not be suitable for your purposes. The module documentation echoes a call for more contributions to the implementations. Rutger's Bio::Phylo can generate random trees; you can try it out too. http://search.cpan.org/~rvosa/Bio-Phylo-0.50/lib/Bio/Phylo/Generator.pm It really depends on what model you are trying to simulate. There are many tree simulators out there that may suit your needs better. http://evolution.genetics.washington.edu/phylip/software.html#Simulation Jason On Aug 2, 2012, at 8:54 PM, Alexey Morozov wrote: > Is it true that for generating random trees with integer branch lenghts I > have to write my own generator? Seems like Tree::RandomFactory is only able > to produce one with very small real values (and even that not at all > branches). Is there no other good module fo that around? > > Alexey. Jason Stajich jason at bioperl.org From soyestepi at gmail.com Tue Aug 7 19:28:58 2012 From: soyestepi at gmail.com (Estefania) Date: Tue, 7 Aug 2012 16:28:58 -0300 Subject: [Bioperl-l] Parsing output INFERNAL 1.0.2 Message-ID: Dear all, I have some problems parsing INFERNAL 1.0.2 output. Using this script (previously cited here), nothing is printed and I get no error messages.

#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::SearchIO;

my $infile = $ARGV[0]; # infernal report
my $parser = Bio::SearchIO->new(-format => 'infernal',
                                -file   => $infile);
while( my $result = $parser->next_result ) {
    print $result->query_name . "\n";
}

If I try to print other elements, the only ones I can print are $parser->algorithm() and $parser->version(). For $result = $parser->next_result, it works only for $size = $result->database_letters() and $dbname = $result->database_name() (but the latter displays the wrong name). Is this a problem with the version of Infernal? How can I parse this output? I also have tabulated output. Thanks in advance estepi From maquino at knome.com Wed Aug 8 00:16:56 2012 From: maquino at knome.com (Mark Aquino) Date: Wed, 8 Aug 2012 00:16:56 +0000 Subject: [Bioperl-l] Parsing output INFERNAL 1.0.2 In-Reply-To: References: Message-ID: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com> Try changing the use to Bio::SeqIO::Infernal and see if that works. Sent from my iPhone On Aug 7, 2012, at 3:30 PM, "Estefania" wrote: > Dear all, I have some problems parsing INFERNAL 1.0.2 ouput. > Using this script (previously cited here), nothing is printed and I have no > errror messages. > #!/usr/bin/perl -w > use strict;use Data::Dumper; > use Bio::SearchIO; > > my $infile = $ARGV[0]; # infernal report > my $parser = Bio::SearchIO->new(-format => 'infernal', > -file => $infile); > while( my $result = $parser->next_result ) { > print $result->query_name . "\n"; > } > > If I try to print other elements, the only ones I can print > are:$parser->algorithm(), $parser->version(), > and for: $result = $parser->next_result, it works just for $size = > $result->database_letters() and $dbname = $result->database_name() (but > displays wrong name) > > Is this a problem of the version of Infernal? How can I parse this output? > I also have tabulated output.
> Thanks in advance > estepi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shalabh.sharma7 at gmail.com Tue Aug 14 14:28:27 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 14 Aug 2012 10:28:27 -0400 Subject: [Bioperl-l] Protein GI to nucleotide GI Message-ID: HI All, I have thousands of protein GI/accession no. , is there any way i can get their corresponding nucleotide GIs. Thanks Shalabh From jason.stajich at gmail.com Tue Aug 14 18:48:19 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 14 Aug 2012 11:48:19 -0700 Subject: [Bioperl-l] Protein GI to nucleotide GI In-Reply-To: References: Message-ID: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com> Did you read the FAQ, this question is answered in there. http://bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F On Aug 14, 2012, at 7:28 AM, shalabh sharma wrote: > HI All, > I have thousands of protein GI/accession no. , is there any way i > can get their corresponding nucleotide GIs. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Wed Aug 15 15:50:06 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 15 Aug 2012 15:50:06 +0000 Subject: [Bioperl-l] Parsing output INFERNAL 1.0.2 In-Reply-To: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com> References: <1D78B70F-F473-49BF-88B2-8ED844C2359F@knome.com> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B6C175@CHIMBX5.ad.uillinois.edu> Mark, no, the parser is Bio::SearchIO-based. 
My guess is this is a legitimate bug, Infernal 1.0.2 is the latest release and it is very possible there was a format change that is breaking things. Estepi, can you send me an example file to test? I know Infernal was recently updated and is much faster, I want to make sure BioPerl parses it correctly. chris On Aug 7, 2012, at 7:16 PM, Mark Aquino wrote: > Try changing the use to Bio::SeqIO::Infernal and see if that works. > > Sent from my iPhone > > On Aug 7, 2012, at 3:30 PM, "Estefania" wrote: > >> Dear all, I have some problems parsing INFERNAL 1.0.2 ouput. >> Using this script (previously cited here), nothing is printed and I have no >> errror messages. >> #!/usr/bin/perl -w >> use strict;use Data::Dumper; >> use Bio::SearchIO; >> >> my $infile = $ARGV[0]; # infernal report >> my $parser = Bio::SearchIO->new(-format => 'infernal', >> -file => $infile); >> while( my $result = $parser->next_result ) { >> print $result->query_name . "\n"; >> } >> >> If I try to print other elements, the only ones I can print >> are:$parser->algorithm(), $parser->version(), >> and for: $result = $parser->next_result, it works just for $size = >> $result->database_letters() and $dbname = $result->database_name() (but >> displays wrong name) >> >> Is this a problem of the version of Infernal? How can I parse this output? >> I also have tabulated output. 
>> Thanks in advance >> estepi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From daisieh at gmail.com Thu Aug 16 18:38:25 2012 From: daisieh at gmail.com (Daisie Huang) Date: Thu, 16 Aug 2012 11:38:25 -0700 (PDT) Subject: [Bioperl-l] PAML problem In-Reply-To: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com> References: <75365EA6-399D-4E89-A8BA-C0E1ED8871E8@gmail.com> Message-ID: <6b34fc2f-1163-47b6-b5ad-a94c5092a2a4@googlegroups.com> I'm not sure which PAML component caused this particular outcome, but the bugs and fixes I pushed to bioperl-live might fix this. When will those get pulled into the master? If those particular fixes don't help, I'd be happy to take a peek at the originator's code and see if it's a quick re-parsing fix. Daisie On Tuesday, June 26, 2012 6:37:55 PM UTC-7, Jason Stajich wrote: > > Peng - > > This module needs a person who's sole job is to keep tracking bugs and > updating it with new versions of the program. so far it has burned out > several developers on working on it since it not stable. > > I am not sure what the answer is to the problem, but often it depends on > the extra parameters used as this changes the order of the output making it > hard to parse. > > So I don't have a solution for you except that you'll have to post the bug > and the problem output mlc file to redmine and hope that we can entice some > developers to bang their head against this some more. 
> > Jason > On Jun 26, 2012, at 6:28 PM, Du, Peng wrote: > > > Hi everyone, > > > > I am using bioperl to parse paml output, and I saw this > > > > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > > MSG: Unknown format of PAML output did not see seqtype > > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368 > > STACK: Bio::Tools::Phylo::PAML::_parse_summary > > /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:461 > > STACK: Bio::Tools::Phylo::PAML::next_result > > /usr/local/share/perl/5.10.1/Bio/Tools/Phylo/PAML.pm:270 > > STACK: main::cal_dn_ds dn_ds.pl:131 > > STACK: dn_ds.pl:44 > > ---------------------------------------------------------------- > > > > I googled and found that, it was caused by PAML version > > incompatibility. I tried 3.13, 3.14, 4.1, 4.2, 4.5 and none of them > > worked. Could someone tell me which version is fine? > > > > My bioperl version is 1.006001. Thank you very much. > > > > -- > > > > Peng Du > > Graduate School of Information Science and Technology, Hokkaido > University > > Kita 14 Nishi 9 Kita-ku, Sapporo, Japan 060-0814 > > Email: d... at ibio.jp Tel: +81 80 3268 9713 > > > > _______________________________________________ > > Bioperl-l mailing list > > Biop... at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.... at gmail.com > ja... at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Biop... 
at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hpm at ebi.ac.uk Fri Aug 17 10:39:43 2012 From: hpm at ebi.ac.uk (Hamish McWilliam) Date: Fri, 17 Aug 2012 11:39:43 +0100 Subject: [Bioperl-l] Programmatic Access To Biological Databases (Perl) Message-ID: <502E1F6F.7030205@ebi.ac.uk> *Date:* 1st-4th October 2012 *Venue:* EMBL-EBI, Hinxton, Nr Cambridge, CB10 1SD, UK *Registration Deadline:* 31st August 2012 This Perl based course in programmatic access to biological databases is ideal for bioinformaticians and biological researchers looking to develop data analysis pipelines, access data in an automated manner or to integrate web services into their own applications. What will it cover? - Overview of public domain biological databases at the EMBL-EBI. - Principles of Web Services, how they work and how to find them. - Integrating data from multiple sources. - Programmatic access to a variety of bioinformatic analysis tools. For a detailed programme and information about registration please see http://www.ebi.ac.uk/training/handson/course_121112_webservices.html All the best, Hamish -- ============================================================ Mr Hamish McWilliam European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton, Cambridge, CB10 1SD, UK URL: http://www.ebi.ac.uk/ ============================================================ From saladi1 at illinois.edu Sat Aug 18 02:03:51 2012 From: saladi1 at illinois.edu (Shyam Saladi) Date: Fri, 17 Aug 2012 19:03:51 -0700 Subject: [Bioperl-l] Protein GI to nucleotide GI In-Reply-To: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com> References: <7AC7F5C3-DD69-47DC-A974-70DBD6CAA5EB@gmail.com> Message-ID: Another way is through NCBI's E-utilities -- http://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Finding_Related_Data_Through_En On Tue, Aug 14, 2012 at 11:48 AM, Jason Stajich wrote: > Did you read the FAQ, this question is answered in there. 
> > http://bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F > > On Aug 14, 2012, at 7:28 AM, shalabh sharma > wrote: > > > HI All, > > I have thousands of protein GI/accession no. , is there any way > i > > can get their corresponding nucleotide GIs. > > > > Thanks > > Shalabh > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From florent.angly at gmail.com Tue Aug 21 15:32:17 2012 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 21 Aug 2012 17:32:17 +0200 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com> <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> <4F55945A.7070901@gmail.com> Message-ID: <5033AA01.8010801@gmail.com> Hi all, I have tested the code some more, made a couple of changes and put the branch in sync with master. This codes looks ready to me. I am prepared to either merge the branch in master or making a separate distro. Best, Florent On 06/03/12 02:08, Fields, Christopher J wrote: > I'll check it out. Want me to post test results here (I have access to a few systems to test on). > > chris > > On Mar 5, 2012, at 10:36 PM, Florent Angly wrote: > >> To all interested, >> the AmpliconSearch module is in a decent state. 
If you want to test it or improve it, head to https://github.com/bioperl/bioperl-live/blob/amplicons/Bio/Tools/AmpliconSearch.pm >> Regards, >> Florent >> >> >> On 01/03/12 12:42, Fields, Christopher J wrote: >>> Florent, >>> >>> Just want to add, my previous response isn't meant as an admonishment, hope it didn't come across that way, but sometimes email makes it hard to discern the difference. I simply meant to demonstrate my opinion that I find releasing one's code is much simpler (e.g. you can decide the rules and dictate when the code is ready for release), and if we can make getting good code into user's hands easier, more flexible, and more consistent I think that is always a better path. >>> >>> chris >>> >>> On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote: >>> >>>> There are a number of very good reasons to separate out common code and create new repos for new code. The problem about adding new code into core is it ties your code development to bioperl-live's release cycle and versioning. Also, what I (and others) would not like to see is any additional dependencies introduced, but a separate release allows you to (1) both add a dependency w/o affecting core, and (2) make it required, so no fiddling with checking for the module prior to running tests on it. >>>> >>>> As an example, I can easily see something like Bio::SearchIO::blastxml living on it's own since it has a set of outside dependencies. >>>> >>>> BTW, separation of modules into separate distributions (even single modules) based on functionality above and beyond that defined in a core is very common in the perl world. Beyond the obvious example of anything non-core in perl (all installable via CPAN), Moose, Dist::Zilla, Catalyst, Dancer, etc all have separately installable dists that layer additional functionality and have a separate maintenance path. >>>> >>>> chris >>>> >>>> On Mar 1, 2012, at 6:12 PM, Florent Angly wrote: >>>> >>>>> Thanks for everybody's feedback. 
>>>>> >>>>> I am looking at existing modules to hold template sequence, amplicon sequence and primer information. There is the Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the PrimedSeq object places Primer objects on the target sequence. I have been looking at refreshing these modules (they are quite old), add some sanity to them and make sure they are suitable for a generic implementation of PCR (or amplicon search, which I find a more suitable name since it is a far cry from simulating PCR cycles, etc). >>>>> >>>>> I will make a remote branch today to make it easier for interested parties to experiment and contribute. >>>>> >>>>> As you can see Chris, the amplicon search feature would use two existing bioperl-live modules and only add one, tentatively in the Bio::Tools::AmpliconSearch namespace. I am not convinced that this warrants a separate distro. >>>>> >>>>> Florent >>>>> >>>>> On 01/03/12 01:23, Fields, Christopher J wrote: >>>>>> Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag: >>>>>> >>>>>> https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a >>>>>> >>>>>> and it's not there. >>>>>> >>>>>> I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo. >>>>>> >>>>>> chris >>>>>> >>>>>> On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote: >>>>>> >>>>>>> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive: >>>>>>> http://www.salmonella.org/bioperl/primer3_v0.3.tgz >>>>>>> >>>>>>> (There's supposedly a more recent version here: >>>>>>> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz >>>>>>> but that file seems to be truncated). 
>>>>>>> >>>>>>> I have no idea how much would be salvagable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper. >>>>>>> >>>>>>> Cheers, >>>>>>> Roy. >>>>>>> >>>>>>> >>>>>>> On 27/02/2012 21:18, Fields, Christopher J wrote: >>>>>>>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly >>>>>>>>> it was added to Bioperl 0.3 and is also mentionned in the >>>>>>>>> Bio::PrimedSeq module. However, I cannot find in the current >>>>>>>>> Bioperl codebase. Any idea where it went? >>>>>>>> No idea; I can't find it anywhere in the code base either, and the >>>>>>>> github repo contains history going back to the original CVS repo. >>>>>>>> You can try contacting the author, possibly. >>>>>>>> >>>>>>>>> The reason I am asking is because I have some code to do silico PCR >>>>>>>>> using regular expressions. I wanted to modularize my code more and >>>>>>>>> make it into a module for Bioperl. Of course, if there is something >>>>>>>>> similar in Bioperl already, I need to have a look at it. If there >>>>>>>>> is nothing similar, what namespace do you suggest to use? >>>>>>>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? >>>>>>>>> Bio::Tools::InSilicoPCR? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Florent >>>>>>>> Maybe the last (InSilicoPCR). 
>>>>>>>> >>>>>>>> chris >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ Bioperl-l mailing >>>>>>>> list Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l From BottomsC at missouri.edu Wed Aug 22 22:12:49 2012 From: BottomsC at missouri.edu (Bottoms, Christopher A) Date: Wed, 22 Aug 2012 22:12:49 +0000 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis Message-ID: Dear BioPerl community, I developed this application for a research lab here at the University of Missouri. I was wondering if this sounded okay and if it were okay to use the "Bio" namespace. Thank you for all you do. Sincerely, Christopher Bottoms -------------------------------------- SYNOPSIS perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run DESCRIPTION This module pipelines steps in the analysis of SELEX (Systematic Evolution of Ligands through EXponential enrichment) data. This main module creates scripts to do the following: (1) Cluster similar sequences based on edit distance. (2) Align sequences within each cluster (using mafft). (3) Calculate the secondary structure of the aligned sequences (using RNAalifold, from the Vienna RNA package) (4) Build covariance models using cmbuild from Infernal. The module Bio::App::SELEX::CovarianceSearch can also be used to create scripts for doing iterative refinements of covariance models. EXAMPLE USE perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run (The file 'simple.seqs' should only contain sequences, one per line.) This will cluster the sequences found in 'simple.seqs' and create a FASTA file for each one. The FASTA files will be grouped into batches (i.e. 
one per cpu requested) that will be placed in a separate directory for each batch, and processed within that directory. At the end of processing, for each cluster there will be a covariance model and PostScript illustration files. The batch script used to process each batch will be located in the respective batch directory. To produce the scripts without running them, simply exclude the --run flag from the command line.

CONFIGURATION AND ENVIRONMENT

As written, this code makes heavy use of UNIX utilities and is therefore only supported on UNIX-like environments (e.g. Linux, UNIX, Mac OS X).

Install Infernal, MAFFT, and the RNA Vienna package ahead of time and add the directories containing their executables to your PATH, so that the first time you run RNAmotifAnalysis.pm a configuration file (cluster.cfg) will be generated for you with all of the correct parameters. Otherwise, you'll need to update your cluster.cfg file manually.

After installing mafft, Infernal, and Vienna RNA packages, add the directories in which their executables reside to your PATH.

For example, assuming that the mafft executable is located in the directory '/usr/local/myapps/bin/', you would want to add it to your PATH. To make sure this is done every time you open a terminal window, add this to your .bashrc file, thus:

    echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc

Then, to make it effective immediately, you can source your .bashrc file:

    source ~/.bashrc

INSTALLATION

These installation instructions assume that you can open and use a terminal window on Linux.

(0) Some systems need several dependencies installed ahead of time. You may be able to skip this step.
However, if subsequent steps don't work, then be sure that some basic libraries are installed, as shown below (or ask a system administrator to take care of it):

For RedHat or CentOS 5.x systems (tested on CentOS 5.5)

Open a terminal and then type the following command, answering all questions in the affirmative:

    sudo yum install gcc

For RedHat or CentOS 6.x systems (tested on CentOS 6.3)

Open a terminal and then type the following commands, answering all questions in the affirmative:

    sudo yum install gcc
    sudo yum install perl-devel

For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu 12.04 LTS)

Open a terminal and then type the following commands, answering all questions in the affirmative:

    sudo apt-get install gcc
    sudo apt-get install make

(1) Install the non-Perl dependencies: (Versions shown are those that we've tested. Please contact us if newer versions do not work.)

    Infernal 1.0.2 (http://infernal.janelia.org/)
    MAFFT 6.849b (http://mafft.cbrc.jp/alignment/software/)
    RNA Vienna package 1.8.4 (http://www.tbi.univie.ac.at/~ivo/RNA/)

(2) Either (a) download and run our installer or (b) use a CPAN client to install Bio::App::SELEX::RNAmotifAnalysis. Note that our installer creates the directory 'perl5' inside your home directory. This directory is for holding Perl modules, including this module and any Perl module dependencies not already included on your system. The installer also appends commands to your .bashrc file to make it easy for the Perl runtime to find these new modules (i.e. it includes your local 'perl5/lib/perl5' directory in the PERL5LIB environment variable).

(a) Use the installer

i. Download installer (and name it "installer")

    curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/184

ii. Make it executable

    chmod u+x installer

iii. Run it. In a few cases (e.g. CentOS 5.5) we've had to run the installer as many as three times to get all of the Perl modules installed.
Please contact us if this doesn't work after three attempts.

    ./installer

(b) If you prefer using a CPAN client, then we recommend that you install Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to system perl, to avoid overwriting core Perl modules. If this doesn't make sense to you, then please be sure to use the installer as described in (a) above.

INCOMPATIBILITIES

None known

BUGS AND LIMITATIONS

There are no known bugs in this module. Please report problems to molecules cpan org

Patches are welcome.

RELATED PUBLICATIONS

Ditzler et al. Manuscript currently in review.

From l.m.timmermans at students.uu.nl Fri Aug 24 09:59:16 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Fri, 24 Aug 2012 11:59:16 +0200 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: Message-ID: On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A wrote: > I developed this application for a research lab here at the University of Missouri. I was wondering if this sounded okay and if it were okay to use the "Bio" namespace. That's perfectly fine. > -------------------------------------- > SYNOPSIS > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run That is a bit wrong. .pm files are modules, not scripts. You're better off adding a small script that uses your module. > EXAMPLE USE > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile simple.seqs --cpus 4 --run > > (The file 'simple.seqs' should only contain sequences, one per line.) Why are you not using a proper sequence format? Bio::SeqIO will allow you to accept any common format. > Install Infernal, MAFFT, and the RNA Vienna package ahead of time and add > the directories containing their executables to your PATH, so that the > first time you run RNAmotifAnalysis.pm a configuration file (cluster.cfg) > will be generated for you with all of the correct parameters.
Otherwise, > you'll need to update your cluster.cfg file manually. > > After installing mafft, Infernal, and Vienna RNA packages, add the > directories in which their executables reside in your PATH. > > For example, assuming that the mafft executable is located in the directory > '/usr/local/myapps/bin/', you would want to add it to your PATH. To make > sure this is done every time you open a terminal window, add this to your > .bashrc file, thus: > > echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc > > Then, to make it effective immediately, you can source your .bashrc file: > > source ~/.bashrc If possible (perhaps it's not), you may want to create a so-called Alien package that installs those requirements itself. Not sure if that's possible, and probably not that urgent either. > INSTALLATION > For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu 12-04 LTS) > > Open a terminal and then type the following commands, answering > all questions in the affirmative: > > sudo apt-get install gcc > sudo apt-get install make The package you're looking for is called build-essential. > (2) Either (a) download and run our installer or (b) use a CPAN client > to install Bio::App::SELEX::RNAmotifAnalysis. Note that our installer > creates the directory 'perl5' inside your home directory. This > directory is for holding Perl modules, including this module and any > Perl module dependencies not already included on your system. The > installer also appends commands to your .bashrc file to make it easy > for the Perl runtime to find these new modules (i.e. it includes your > local 'perl5/lib/perl5' directory in the PERL5LIB environment > variable). > > (a) Use the installer > i. Download installer (and name it "installer") > > curl -o installer -L http://ircf.rnet.missouri.edu:8000/share.attachment/184 That download doesn't work for me. > ii. Make it executable > > chmod u+x installer > > iii. Run it. In a few cases (e.g.
CentOS 5.5) we've had to run the > installer as many as three times to get all of the Perl > modules installed. Please contact us if this doesn't work > after three attempts. > > ./installer If it has that many issues, it's probably wrong. I'd strongly recommend going the CPAN way. > (b) If you prefer using a CPAN client, then we recommend that you install > Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to system > perl, to avoid overwriting core Perl modules. If this doesn't make > sense to you, then please be sure to use the installer as > described in (a) above. Installing locally is usually a good idea; I recommend local::lib in particular. This "overwriting core Perl modules" suggests to me you're doing something wrong anyway though. Leon From alexeymorozov1991 at gmail.com Fri Aug 24 14:21:37 2012 From: alexeymorozov1991 at gmail.com (Alexey Morozov) Date: Fri, 24 Aug 2012 22:21:37 +0800 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: Message-ID: 2012/8/24 Leon Timmermans > On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A > wrote: > > I developed this application for a research lab here at the University > of Missouri. I was wondering if this sounded okay and if it were okay to > use the "Bio" namespace. > > That's perfectly fine. > > > -------------------------------------- > > SYNOPSIS > > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile > simple.seqs --cpus 4 --run > > That is a bit wrong. .pm files are modules, not scripts. You're better > off adding a small script that uses your module. > > > EXAMPLE USE > > perl perl5/lib/perl5/Bio/App/SELEX/RNAmotifAnalysis.pm --infile > simple.seqs --cpus 4 --run > > > > (The file 'simple.seqs' should only contain sequences, one per > line.) > > Why are you not using a proper sequence format? Bio::SeqIO will allow > you to accept any common format.
> > > Install Infernal, MAFFT, and the RNA Vienna package ahead of > time and add > > the directories containing their executables to your PATH, so > that the > > first time you run RNAmotifAnalysis.pm a configuration file > (cluster.cfg) > > will be generated for you with all of the correct parameters. > Otherwise, > > you'll need to update your cluster.cfg file manually. > > > > After installing mafft, Infernal, and Vienna RNA packages, add > the > > directories in which their executables reside in your PATH. > > > > For example, assuming that the mafft executable is located in > the directory > > '/usr/local/myapps/bin/', you would want to add it to your PATH. > To make > > sure this is done every time you open a terminal window, add > this to your > > .bashrc file, thus: > > > > echo 'export PATH=/usr/local/myapps/bin:$PATH' >> ~/.bashrc. > > > > Then, to make it effective immediately, you can source your > .bashrc file: > > > > source ~/.bashrc > > If possible (perhaps it's not), you may want to create a so called > Alien package that installs those requirements itself. Not sure if > that's possible, and probably not that urgent either. > > > INSTALLATION > > For Debian or Ubuntu systems (tested on Debian 5.06, Ubuntu > 12-04 LTS) > > > > Open a terminal and then type the following commands, > answering > > all questions in the afirmative: > > > > sudo apt-get install gcc > > sudo apt-get install make > > The package you're looking for is called build-essentials > > > (2) Either (a) download and run our installer or (b) use a CPAN > client > > to install Bio::App::SELEX::RNAmotifAnalysis. Note that our > installer > > creates the directory 'perl5' inside your home directory. > This > > directory is for holding Perl modules, including this module > and any > > Perl module dependencies not already included on your > system. The > > installer also appends commands to your .bashrc file to make > it easy > > for the Perl runtime to find these new modules (i.e. 
it > includes your > > local 'perl5/lib/perl5' directory in the PERL5LIB environment > > variable). > > > > (a) Use the installer > > i. Download installer (and name it "installer") > > > > curl -o installer -L > http://ircf.rnet.missouri.edu:8000/share.attachment/184 > > That download doesn't work for me. > > > ii. Make it executable > > > > chmod u+x installer > > > > iii. Run it. In a few cases (e.g. CentOS 5.5) we've had > to run the > > installer as many as three times to get all of the > Perl > > modules installed. Please contact us if this > doesn't work > > after three attempts. > > > > ./installer > > If it has that many issues, it's probably wrong. I'd strongly > recommend going the CPAN way. > > > (b) If you prefer using a CPAN client, then we recommend > that you install > > Bio::App::SELEX::RNAmotifAnalysis 'locally' instead of to > system > > perl, to avoid overwriting core Perl modules. If this > doesn't make > > sense to you, then please be sure to use the installer as > > described in (a) above. > > Installing locally is usually a good idea; I recommend local::lib in > particular. This "overwriting core Perl modules" suggests to me you're > doing something wrong anyway though. > > Leon > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > So one is free to do whatever he pleases and call it Bio::MyAwesomeStuff, right? What is required to get into the official BioPerl distribution? I think some of my code might eventually prove useful.
Alexey Morozov LIN SB RAS Irkutsk, Russia From cjfields at illinois.edu Fri Aug 24 17:39:32 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 24 Aug 2012 17:39:32 +0000 Subject: [Bioperl-l] RFC: Bio::App::SELEX::RNAmotifAnalysis In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B85703@CHIMBX5.ad.uillinois.edu> On Aug 24, 2012, at 9:21 AM, Alexey Morozov wrote: > 2012/8/24 Leon Timmermans > >> On Thu, Aug 23, 2012 at 12:12 AM, Bottoms, Christopher A >> wrote: >>> I developed this application for a research lab here at the University >> of Missouri. I was wondering if this sounded okay and if it were okay to >> use the "Bio" namespace. >> >> ... >> Installing locally is usually a good idea; I recommend local::lib in >> particular. This "overwriting core Perl modules" suggests to me you're >> doing something wrong anyway though. >> >> Leon >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > So one is free to do whatever he pleases and call it Bio::MyAwesomeStuff, > right? What is required to get into the official BioPerl distribution? I think > some of my code might eventually prove useful. > > Alexey Morozov > LIN SB RAS > Irkutsk, Russia Alexey, Any ideas on the name? We (BioPerl) don't technically own the primary Bio:: namespace, but we do have substantial real estate there :) Confusing namespaces are my only concern. Chris, Personally, using Bio::App doesn't seem right, mainly for the same reasons that Leon mentioned already, but if the modules are the basis for an application then I think the namespace makes sense (see the App::* namespace, for instance App::cpanminus/cpanm, App::perlbrew/perlbrew, etc). Everyone, It is good practice to ask for opinions on module names here and where they should go, though. Doing so here is completely acceptable. (well, Bio* specific ones...)
My thoughts: There are a number of CPAN Bio::* modules that don't use BioPerl, and I wouldn't want to discourage anyone from submitting code to something like Bio::Foo as long as the dependencies are noted. I really want to remove the artificial barrier to CPAN submission for any Bio*-related Perl code, where the BioPerl devs must bless a set of modules prior to submission; it slows down development on your code as well as BioPerl in general. I do highly suggest naming your modules in a way so they wouldn't be confused with BioPerl if possible, though, e.g. don't name something in a more specific namespace that BioPerl already occupies, such as Bio::Seq::MySeqFile, but feel free to ask if there are questions on this. Re: what to do with modules: please submit the modules/distributions independently to CPAN. I *DON'T* suggest asking us to include code within the main BioPerl distribution, unless it is something integral to the entire BioPerl distribution (e.g. core-like). The reasons are two-fold. First, CPAN is an integral part of Perl, and interactions and submission of code to it should be part of the learning curve (just as creating eggs for python or gems for ruby are parts of their respective communities). It's very easy to add BioPerl as a dependency and submit a module on one's own: https://metacpan.org/release/Bio-EUtilities https://metacpan.org/release/Bio-Tools-Primer3Redux There are lots of tutorials for doing so, and if you have multiple modules or plan on maintaining support I highly suggest looking into some of the modern approaches to distribution and release management, Dist::Zilla being the primary one. BTW, the nice side benefits of submitting to CPAN: you get basic issue tracking and cross-platform testing for free (RT, CPAN Reporters), and it's easy enough to support. Second, we have been bitten many times in the past with code that was added to the core distribution (BioPerl). 
These are generally cases where code was supposed to be supported by the submitting authors, but for one reason or another they disappear, and the rest of the Bioperl developers may be left 'holding the bag' so to speak. We can't easily maintain code we don't write, particularly with various coding styles, practices, etc (bioperl-live/run have around ~1000 modules). Submission to CPAN places the maintenance responsibility back where it should be, on the submitting author. Frankly, beyond any namespace issues, wouldn't you want the ability/freedom to do with your code what you want? chris From jayoung at fhcrc.org Sat Aug 25 00:56:04 2012 From: jayoung at fhcrc.org (Janet Young) Date: Fri, 24 Aug 2012 17:56:04 -0700 Subject: [Bioperl-l] cigar_line Message-ID: <3B92347B-8105-4614-AA87-0B0DC4BF101E@fhcrc.org> Hi there, I'm playing around with alignment formats, and saw the function cigar_line for SimpleAlign objects. I have a couple of questions/suggestions: 1. It looks like the cigar string is being generated with respect to the consensus sequence. That's fine, but it would also be really useful to be able to generate it with respect to the reference (first) sequence. Would that be easy to implement? Could you consider that as a feature request? 2. Is there any commonly accepted definition of CIGAR format? and/or has it changed in recent years? The definition I've seen is from the SAM format (http://samtools.sourceforge.net/SAM1.pd) and these cigar strings don't look like they're in that format. The SAM definition carries a lot of useful information that this cigar string doesn't. 3. the 100% threshold used for generating the consensus from which cigar strings are made is very stringent (and counter-intuitive to the biologist: when I hear "consensus" I don't think 100% conserved). Also different to the default for consensus_string. Any thoughts on changing that threshold, or maybe just making the documentation a little clearer on that? 4. 
deletions with respect to consensus sequence don't seem to be reported in the cigar string (see seq4 in my toy example script below). Is this a bug? thanks for listening! Janet ------------------------------------------------------------------- Dr. Janet Young Tapscott and Malik labs Fred Hutchinson Cancer Research Center 1100 Fairview Avenue N., C3-168, P.O. Box 19024, Seattle, WA 98109-1024, USA. tel: (206) 667 1471 fax: (206) 667 6524 email: jayoung ...at... fhcrc.org ------------------------------------------------------------------- #!/usr/bin/perl use warnings; use strict; use Bio::AlignIO; my $alignString = ">seq1\n AGTGAGGTGATCGGTAGCTGATGCTAGTT\n >seq2\n AGTGAGGTGATCGGTAGCTGATGCTAGTT\n >seq3\n AGTGAGGTGATCGGTAGCTGATGCTAGTT\n >seq4\n AG-GAGGAGATCGGTAGCTGTTGCTAGTT"; my $stringfh; open($stringfh, "<", \$alignString); my $in = Bio::AlignIO->new(-fh => $stringfh, -format => "fasta"); while (my $aln = $in->next_aln()) { my $consString3 = $aln->consensus_string(100); print "\nconsensus100 $consString3\n"; my %cigars = $aln->cigar_line(); foreach my $seqname (sort keys %cigars) { my $shortseqname = (split /\//, $seqname)[0]; my $seq = $aln->get_seq_by_id($shortseqname)->seq(); print "seqname $seqname seq $seq cigar1 $cigars{$seqname}\n"; } } ##### script output: # consensus100 AG?GAGG?GATCGGTAGCTG?TGCTAGTT # seqname seq1/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29 # seqname seq2/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29 # seqname seq3/1-29 seq AGTGAGGTGATCGGTAGCTGATGCTAGTT cigar1 1,2:4,7:9,20:22,29 # seqname seq4/1-28 seq AG-GAGGAGATCGGTAGCTGTTGCTAGTT cigar1 1,6:8,19:21,28 From daisieh at zoology.ubc.ca Mon Aug 27 20:05:53 2012 From: daisieh at zoology.ubc.ca (Daisie Huang) Date: Mon, 27 Aug 2012 13:05:53 -0700 Subject: [Bioperl-l] RFC: Refactoring the Hyphy module Message-ID: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca> I've been playing around in the guts of the Hyphy module in bioperl-run, and it strikes me 
that a lot of the code in the modules is redundant and could be refactored to streamline things and fix some crashes I've been seeing in the test code. Generally, my coding philosophy is to mess with things as little as possible because of the possibility of unintentional side effects, but sometimes a refactor will be beneficial going forward, especially if it doesn't affect the API. What is the group policy on such things? Should I go ahead and attempt it on a branch, make the pull request, and see if anyone has a problem with the code?

Thanks,
Daisie
-----------------------------------------
Daisie Huang, PhD
Rm 318, Beaty Biodiversity Centre
Department of Botany
University of British Columbia
http://cronklab.wikidot.com/daisie-huang

From cjfields at illinois.edu Mon Aug 27 20:34:18 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 27 Aug 2012 20:34:18 +0000
Subject: [Bioperl-l] RFC: Refactoring the Hyphy module
In-Reply-To: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca>
References: <6314EB81-165C-4BF0-9844-2D5F90D30F70@zoology.ubc.ca>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B895DF@CHIMBX5.ad.uillinois.edu>

Yes, fixing things on a github fork and submitting a pull request is generally the best approach to this. If you have more substantial improvements over time we can add you as a developer on Github.

chris

On Aug 27, 2012, at 3:05 PM, Daisie Huang wrote:

> I've been playing around in the guts of the Hyphy module in bioperl-run, and it strikes me that a lot of the code in the modules is redundant and could be refactored to streamline things and fix some crashes I've been seeing in the test code. Generally, my coding philosophy is to mess with things as little as possible because of the possibility of unintentional side effects, but sometimes a refactor will be beneficial going forward, especially if it doesn't affect the API. What is the group policy on such things?
> Should I go ahead and attempt it on a branch, make the pull request, and see if anyone has a problem with the code?
>
> Thanks,
> Daisie
> -----------------------------------------
> Daisie Huang, PhD
> Rm 318, Beaty Biodiversity Centre
> Department of Botany
> University of British Columbia
> http://cronklab.wikidot.com/daisie-huang

From jimhu at tamu.edu Tue Aug 28 18:09:48 2012
From: jimhu at tamu.edu (Jim Hu)
Date: Tue, 28 Aug 2012 13:09:48 -0500
Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning
Message-ID: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu>

I noticed that NCBI recently changed the path to genomes in their web interface. I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc()

--------------------- WARNING ---------------------
MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931

---------------------------------------------------

It still seems to work, though.

Jim
=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054

From cjfields at illinois.edu Tue Aug 28 20:02:11 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 28 Aug 2012 20:02:11 +0000
Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning
In-Reply-To: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu>
References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu>

Is the BioProject DBSOURCE retained if you write the output back using Bio::SeqIO?

chris

On Aug 28, 2012, at 1:09 PM, Jim Hu wrote:

> I noticed that NCBI recently changed the path to genomes in their web interface.
> I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc()
>
> --------------------- WARNING ---------------------
> MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931
>
> ---------------------------------------------------
>
> It still seems to work, though.
>
> Jim
> =====================================
> Jim Hu
> Professor
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4054

From scott at scottcain.net Tue Aug 28 21:19:42 2012
From: scott at scottcain.net (Scott Cain)
Date: Tue, 28 Aug 2012 17:19:42 -0400
Subject: [Bioperl-l] Bug reporting help
Message-ID:

Hi,

Can somebody with Redmine experience help me out? I have an account associated with the address scott+bioperl at scottcain.net. When I try to reset my password by following the link that is emailed to me, no matter what I enter, I'm told the login is invalid. Any idea what I can do?

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Tue Aug 28 21:56:28 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 28 Aug 2012 21:56:28 +0000
Subject: [Bioperl-l] Bug reporting help
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8B58F@CHIMBX5.ad.uillinois.edu>

That's odd; just tried this with my account and had no problem. I can try changing it via the admin page and will send it to you.

chris

On Aug 28, 2012, at 4:19 PM, Scott Cain wrote:

> Hi,
>
> Can somebody with Redmine experience help me out? I have an account
> associated with the address scott+bioperl at scottcain.net.
> When I try to reset my password by following the link that is emailed to me, no
> matter what I enter, I'm told the login is invalid. Any idea what I
> can do?
>
> Thanks,
> Scott
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research

From mahakadry at aucegypt.edu Wed Aug 29 00:48:58 2012
From: mahakadry at aucegypt.edu (maha ahmed)
Date: Wed, 29 Aug 2012 02:48:58 +0200
Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu>
References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu>
Message-ID:

Hi guys,

I was searching for a bioperl script I can use to retrieve fasta sequences from the Eggnog database. I only found ones to get sequences from Genbank, swissprot or ensembl.

Has anyone come across such a script, or does anyone know of a bioperl module that can be used to retrieve sequences from an online database?

If not, does anyone know a unix command that can be used to retrieve a sequence from an online database like eggnog?

On Tue, Aug 28, 2012 at 10:02 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:

> Is the BioProject DBSOURCE retained if you write the output back using
> Bio::SeqIO?
>
> chris
>
> On Aug 28, 2012, at 1:09 PM, Jim Hu wrote:
>
> > I noticed that NCBI recently changed the path to genomes in their web interface.
> > I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc()
> >
> > --------------------- WARNING ---------------------
> > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931
> >
> > ---------------------------------------------------
> >
> > It still seems to work, though.
> >
> > Jim
> > =====================================
> > Jim Hu
> > Professor
> > Dept. of Biochemistry and Biophysics
> > 2128 TAMU
> > Texas A&M Univ.
> > College Station, TX 77843-2128
> > 979-862-4054

From Russell.Smithies at agresearch.co.nz Wed Aug 29 01:20:56 2012
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 29 Aug 2012 13:20:56 +1200
Subject: [Bioperl-l] Bio::DB::GenBank new(?) warning
In-Reply-To:
References: <93C0096F-541A-4DAD-8C74-10CDCEEEDDA1@tamu.edu> <118F034CF4C3EF48A96F86CE585B94BF33B8AE92@CHIMBX5.ad.uillinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF60AEC8@exchsth.agresearch.co.nz>

What sequence identifiers are you using and what exactly are you trying to get?

Data is available via URL, so a simple Perl script will retrieve that, e.g.:
http://eggnog.embl.de/version_3.0/cgi/groupview.py/?group=NOG285349&format=unaligned
http://eggnog.embl.de/version_3.0/cgi/groupview.py/?group=NOG285349&format=newick

I haven't tried it, but will Bio::DB::EMBL work?

--Russell

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of maha ahmed
Sent: Wednesday, 29 August 2012 12:49 p.m.
To: Fields, Christopher J
Cc: Jim Hu;
Subject: Re: [Bioperl-l] Bio::DB::GenBank new(?)
warning

Hi guys,

I was searching for a bioperl script I can use to retrieve fasta sequences from the Eggnog database. I only found ones to get sequences from Genbank, swissprot or ensembl.

Has anyone come across such a script, or does anyone know of a bioperl module that can be used to retrieve sequences from an online database?

If not, does anyone know a unix command that can be used to retrieve a sequence from an online database like eggnog?

On Tue, Aug 28, 2012 at 10:02 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:

> Is the BioProject DBSOURCE retained if you write the output back using
> Bio::SeqIO?
>
> chris
>
> On Aug 28, 2012, at 1:09 PM, Jim Hu wrote:
>
> > I noticed that NCBI recently changed the path to genomes in their web interface. I'm wondering if that's related to my getting this kind of message when I use Bio::DB::GenBank->get_Seq_by_acc()
> >
> > --------------------- WARNING ---------------------
> > MSG: Unrecognized DBSOURCE data: BioProject: PRJNA161931
> >
> > ---------------------------------------------------
> >
> > It still seems to work, though.
> >
> > Jim
> > =====================================
> > Jim Hu
> > Professor
> > Dept. of Biochemistry and Biophysics
> > 2128 TAMU
> > Texas A&M Univ.
> > College Station, TX 77843-2128
> > 979-862-4054

From shalabh.sharma7 at gmail.com Thu Aug 30 18:07:11 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 30 Aug 2012 14:07:11 -0400
Subject: [Bioperl-l] reverse complement of fastq
Message-ID:

Hi,
I have a fastq file with a few million reads. I need to find the reverse complement of the reads.
I used the 'revcom' method but it's not working for fastq.

I would really appreciate it if anyone can help me out.

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From cjfields at illinois.edu Thu Aug 30 18:54:14 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 30 Aug 2012 18:54:14 +0000
Subject: [Bioperl-l] reverse complement of fastq
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu>

If you want something that gives you revcom *very quickly*, Bioperl is sadly not the way to go just yet. However, you can use something like seqtk, which is very fast:

https://github.com/lh3/seqtk

Something like this should work:

$ seqtk seq -r orig.fq > rc.fq

chris

On Aug 30, 2012, at 1:07 PM, shalabh sharma wrote:

> Hi,
> I have a fastq file with a few million reads. I need to find the reverse
> complement of the reads.
> I used the 'revcom' method but it's not working for fastq.
>
> I would really appreciate it if anyone can help me out.
>
> Thanks
> Shalabh
>
>
> --
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636

From shalabh.sharma7 at gmail.com Thu Aug 30 20:01:10 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 30 Aug 2012 16:01:10 -0400
Subject: [Bioperl-l] reverse complement of fastq
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF33B8E21B@CHIMBX5.ad.uillinois.edu>
Message-ID:

Hey Chris,
Thanks a lot, it worked and it was really fast.

Thanks
Shalabh

On Thu, Aug 30, 2012 at 2:54 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:

> If you want something that gives you revcom *very quickly*, Bioperl is
> sadly not the way to go just yet. However, you can use something like
> seqtk, which is very fast:
>
> https://github.com/lh3/seqtk
>
> Something like this should work:
>
> $ seqtk seq -r orig.fq > rc.fq
>
> chris
>
> On Aug 30, 2012, at 1:07 PM, shalabh sharma wrote:
>
> > Hi,
> > I have a fastq file with a few million reads. I need to find the reverse
> > complement of the reads.
> > I used the 'revcom' method but it's not working for fastq.
> >
> > I would really appreciate it if anyone can help me out.
> >
> > Thanks
> > Shalabh
> >
> >
> > --
> > Shalabh Sharma
> > Scientific Computing Professional Associate (Bioinformatics Specialist)
> > Department of Marine Sciences
> > University of Georgia
> > Athens, GA 30602-3636

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636
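
For readers who cannot install seqtk, the reverse-complement step discussed in this thread can be sketched in plain Perl without BioPerl. This is only a minimal sketch, not how seqtk or Bio::SeqIO does it: it assumes four-line FASTQ records (no wrapped sequence or quality lines), and the helper name revcom_fastq_record is made up for illustration. The quality string is reversed along with the sequence so that each score stays with its base.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or seqtk): reverse-complement one
# FASTQ record. Takes the sequence and quality strings and returns the
# reverse-complemented sequence plus the reversed quality string.
sub revcom_fastq_record {
    my ($seq, $qual) = @_;
    my $rc = reverse $seq;    # scalar context: reverses the string
    # Complement IUPAC nucleotide codes, preserving case.
    $rc =~ tr/ACGTUMRWSYKVHDBNacgtumrwsykvhdbn/TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn/;
    return ($rc, scalar reverse $qual);
}

# Run as a script: read FASTQ from STDIN (or files named on the command
# line) and write the reverse-complemented records to STDOUT.
# Assumption: every record is exactly four lines.
unless (caller) {
    while (defined(my $header = <>)) {
        my $seq  = <>;
        my $plus = <>;
        my $qual = <>;
        last unless defined $qual;    # stop on a truncated final record
        chomp($seq, $qual);
        my ($rc, $rq) = revcom_fastq_record($seq, $qual);
        print $header, $rc, "\n", $plus, $rq, "\n";
    }
}
```

Saved under a name of your choosing (revcom_fastq.pl here is just a placeholder), it would be run as: perl revcom_fastq.pl orig.fq > rc.fq. For files with millions of reads, seqtk as suggested above will likely still be the faster option.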